id,category,category_misc,category_other,readme,task,task_other 2251,Computer Vision,Computer Vision,Computer Vision,"TensorFlow YOLO object detection on Android Source project android yolo is the first implementation of YOLO for TensorFlow on an Android device. It is compatible with Android Studio and usable out of the box. It can detect the 20 classes of objects in the Pascal VOC dataset: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and tv/monitor. The network only outputs one predicted bounding box at a time for now. The code can and will be extended in the future to output several predictions. To use this demo first clone the repository. Download the TensorFlow YOLO model and put it in android yolo/app/src/main/assets. Then open the project on Android Studio. Once the project is open you can run the project on your Android device using the Run 'app' command and selecting your device. NEW : The standalone APK has been released and you can find it here . Just open your browser on your Android device and download the APK file. When the file has been downloaded it should begin installing on your device after you grant the required permissions. GPUs are not currently supported by TensorFlow on Android. If you have a decent Android device you will have around two frames per second of processed images. Here is a video showing a small demo of the app. Nataniel Ruiz School of Interactive Computing Georgia Institute of Technology Credits: App launch icon made by Freepik from Flaticon is licensed by Creative Commons BY 3.0 . Disclaimer: The app is hardcoded for 20 classes and for the tiny yolo network final output layer. You can check the following code if you want to change this: The code describes the interpretation of the output. The code for the network inference pass is written in C++ and the output is passed to Java. The output of the network is in the form of a String which is converted to a StringTokenizer and is then converted into an array of Floats in line 87 of TensorflowClassifier.java You can work from there and read the papers to transform the new yolo model output into something that makes sense. (I did it only for one bounding box and also obtained the confidence of this bounding box). This part of the code is commented by me so you can understand what I did. Also read the paper here:",Object Detection,Object Detection 2258,Computer Vision,Computer Vision,Computer Vision,"Receptive Field Block Net for Accurate and Fast Object Detection By Songtao Liu, Di Huang, Yunhong Wang Introduction Inspired by the structure of Receptive Fields (RFs) in human visual systems, we propose a novel RF Block (RFB) module, which takes the relationship between the size and eccentricity of RFs into account, to enhance the discriminability and robustness of features. We further assemble the RFB module to the top of SSD with a lightweight CNN model, constructing the RFB Net detector. You can use the code to train/evaluate the RFB Net for object detection. For more details, please refer to our ECCV paper . 
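To make the module description above concrete, here is a heavily simplified PyTorch sketch of a multi-branch block in that spirit (different kernel sizes paired with different dilation rates, concatenated, fused, and added to a shortcut). It is an illustration only and does not reproduce the exact RFB layer configuration from the paper or this repository:

```python
import torch
import torch.nn as nn

class SimplifiedRFB(nn.Module):
    """Illustrative multi-branch block: each branch pairs a larger kernel with a
    larger dilation rate, loosely mimicking the size/eccentricity relationship
    of receptive fields. Not the authors' exact RFB module."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        inter = out_ch // 4
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, inter, 1),
            nn.Conv2d(inter, inter, 3, padding=1))
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, inter, 1),
            nn.Conv2d(inter, inter, 3, padding=1),
            nn.Conv2d(inter, inter, 3, padding=3, dilation=3))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, inter, 1),
            nn.Conv2d(inter, inter, 5, padding=2),
            nn.Conv2d(inter, inter, 3, padding=5, dilation=5))
        self.fuse = nn.Conv2d(3 * inter, out_ch, 1)   # merge the concatenated branches
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)   # residual-style shortcut

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return torch.relu(self.fuse(out) + self.shortcut(x))
```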
VOC2007 Test System mAP FPS (Titan X Maxwell) : : : : : Faster R CNN (VGG16) 73.2 7 YOLOv2 (Darknet 19) 78.6 40 R FCN (ResNet 101) 80.5 9 SSD300 (VGG16) 77.2 46 SSD512 (VGG16) 79.8 19 RFBNet300 (VGG16) 80.7 83 RFBNet512 (VGG16) 82.2 38 COCO System test dev mAP Time (Titan X Maxwell) : : : : : Faster R CNN++ (ResNet 101) 34.9 3.36s YOLOv2 (Darknet 19) 21.6 25ms SSD300 (VGG16) 25.1 22ms SSD512 (VGG16) 28.8 53ms RetinaNet500 (ResNet 101 FPN) 34.4 90ms RFBNet300 (VGG16) 30.3 15ms RFBNet512 (VGG16) 33.8 30ms RFBNet512 E (VGG16) 34.4 33ms MobileNet System COCO minival mAP \ parameters : : : : : SSD MobileNet 19.3 6.8M RFB MobileNet 20.7 7.4M Citing RFB Net Please cite our paper in your publications if it helps your research: @InProceedings{Liu_2018_ECCV, author {Liu, Songtao and Huang, Di and Wang, andYunhong}, title {Receptive Field Block Net for Accurate and Fast Object Detection}, booktitle {The European Conference on Computer Vision (ECCV)}, month {September}, year {2018} } Contents 1. Installation ( installation) 2. Datasets ( datasets) 3. Training ( training) 4. Evaluation ( evaluation) 5. Models ( models) Installation Install PyTorch 0.4.0 by selecting your environment on the website and running the appropriate command. Clone this repository. This repository is mainly based on ssd.pytorch and Chainer ssd , a huge thank to them. Note: We currently only support PyTorch 0.4.0 and Python 3+. Compile the nms and coco tools: Shell ./make.sh Note : Check you GPU architecture support in utils/build.py, line 131. Default is: 'nvcc': ' arch sm_52', Then download the dataset by following the instructions ( download voc2007 trainval test) below and install opencv. Shell conda install opencv Note: For training, we currently support VOC and COCO . Datasets To make things easy, we provide simple VOC and COCO dataset loader that inherits torch.utils.data.Dataset making it fully compatible with the torchvision.datasets API . VOC Dataset Download VOC2007 trainval & test Shell specify a directory for dataset to be downloaded into, else default is /data/ sh data/scripts/VOC2007.sh Download VOC2012 trainval Shell specify a directory for dataset to be downloaded into, else default is /data/ sh data/scripts/VOC2012.sh COCO Dataset Install the MS COCO dataset at /path/to/coco from official website , default is /data/COCO. Following the instructions to prepare minival2014 and valminusminival2014 annotations. All label files (.json) should be under the COCO/annotations/ folder. It should have this basic structure Shell $COCO/ $COCO/cache/ $COCO/annotations/ $COCO/images/ $COCO/images/test2015/ $COCO/images/train2014/ $COCO/images/val2014/ UPDATE : The current COCO dataset has released new train2017 and val2017 sets which are just new splits of the same image sets. Training First download the fc reduced VGG 16 PyTorch base network weights at: or from our BaiduYun Driver MobileNet pre trained basenet is ported from MobileNet Caffe , which achieves slightly better accuracy rates than the original one reported in the paper , weight file is available at: or BaiduYun Driver . By default, we assume you have downloaded the file in the RFBNet/weights dir: Shell mkdir weights cd weights wget To train RFBNet using the train script simply specify the parameters listed in train_RFB.py as a flag or manually change them. Shell python train_RFB.py d VOC v RFB_vgg s 300 Note: d: choose datasets, VOC or COCO. v: choose backbone version, RFB_VGG, RFB_E_VGG or RFB_mobile. s: image size, 300 or 512. 
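The VOC and COCO loaders mentioned in the Datasets section above follow the standard torch.utils.data.Dataset contract; a minimal, hypothetical loader in that style (the directory layout, label format and class handling here are assumptions for illustration, not the repository's actual implementation) could look like this:

```python
import os
import torch
from PIL import Image
from torch.utils.data import Dataset

class ToyDetectionDataset(Dataset):
    """Hypothetical loader: images/NAME.jpg paired with labels/NAME.txt,
    where each label line is 'class x1 y1 x2 y2' in pixel coordinates."""
    def __init__(self, root, transform=None):
        self.root = root
        self.ids = sorted(f[:-4] for f in os.listdir(os.path.join(root, "images"))
                          if f.endswith(".jpg"))
        self.transform = transform

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        name = self.ids[idx]
        img = Image.open(os.path.join(self.root, "images", name + ".jpg")).convert("RGB")
        targets = []
        with open(os.path.join(self.root, "labels", name + ".txt")) as f:
            for line in f:
                cls, x1, y1, x2, y2 = map(float, line.split())
                targets.append([x1, y1, x2, y2, cls])
        if self.transform is not None:
            img = self.transform(img)
        return img, torch.tensor(targets)
```

Because it is a plain Dataset, it can be wrapped in a torch.utils.data.DataLoader (with a custom collate function for the variable-length target lists) just like the torchvision datasets.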
You can pick up training from a checkpoint by specifying the path as one of the training parameters (again, see train_RFB.py for options). If you want to reproduce the results in the paper, the VOC model should be trained for about 240 epochs, while the COCO version needs 130 epochs. Evaluation To evaluate a trained network: Shell python test_RFB.py d VOC v RFB_vgg s 300 trained_model /path/to/model/weights By default, it will directly output the mAP results on VOC2007 test or COCO minival2014 . For VOC2012 test and COCO test dev results, you can manually change the datasets in the test_RFB.py file, then save the detection results and submit them to the server. Models 07+12 RFB_Net300 , BaiduYun Driver COCO RFB_Net300 COCO RFB_Net512_E , BaiduYun Driver COCO RFB_Mobile Net300 , BaiduYun Driver",Object Detection,Object Detection 2262,Computer Vision,Computer Vision,Computer Vision,"Important notice: If you used the master branch before Sep. 26 2017 and its corresponding pretrained model, PLEASE PAY ATTENTION : The old master branch is now under old_master; you can still run the code and download the pretrained model, but the pretrained model for that old master is not compatible with the current master! The main differences between the new and old master branches are in these two commits: 9d4c24e , c899ce7 The change is related to this issue ; master now matches all the details in tf faster rcnn so that we can now convert pretrained tf models to pytorch models. pytorch faster rcnn A pytorch implementation of the faster RCNN detection framework based on Xinlei Chen's tf faster rcnn . Xinlei Chen's repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on the plain VGG16 and Resnet101 architectures. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, and no extra input is used. The only data augmentation technique is left right flipping during training, following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 71.22 (from scratch) 70.75 (converted) ( 70.8 for tf faster rcnn). Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.33 (from scratch) 75.27 (converted) ( 75.7 for tf faster rcnn). Train on COCO 2014 trainval35k and test on minival (900k/1190k), 29.2 (from scratch) 30.1 (converted) ( 30.2 for tf faster rcnn). With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.29 (from scratch) 75.76 (converted) ( 75.7 for tf faster rcnn). Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.26 (from scratch) 79.78 (converted) ( 79.8 for tf faster rcnn).
Train on COCO 2014 trainval35k and test on minival (800k/1190k), 35.1 (from scratch) 35.4 (converted) ( 35.4 for tf faster rcnn). More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.4 (from scratch), 21.9 (converted) ( 21.8 for tf faster rcnn). Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 (converted) ( 32.4 for tf faster rcnn). Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.7 (converted) ( 36.1 for tf faster rcnn). Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to Xinlei's experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\ Another server here . Google drive here . (Optional) Instead of downloading my pretrained or converted model, you can also convert from tf faster rcnn model. You can download the tensorflow pretrained model from tf faster rcnn . Then run: Shell python tools/convert_from_tensorflow.py tensorflow_model resnet_model.ckpt python tools/convert_from_tensorflow_vgg.py tensorflow_model vgg_model.ckpt This script will create a .pth file with the same name in the same folder as the tensorflow model. 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by pytorch vgg and pytorch resnet (the ones with caffe in the name), you can download the pre trained models and set them in the data/imagenet_weights folder. 
For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights python open python in terminal and run the following Python code Python import torch from torch.utils.model_zoo import load_url from torchvision import models sd load_url( sd 'classifier.0.weight' sd 'classifier.1.weight' sd 'classifier.0.bias' sd 'classifier.1.bias' del sd 'classifier.1.weight' del sd 'classifier.1.bias' sd 'classifier.3.weight' sd 'classifier.4.weight' sd 'classifier.3.bias' sd 'classifier.4.bias' del sd 'classifier.4.weight' del sd 'classifier.4.bias' torch.save(sd, vgg16.pth ) Shell cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights download from my gdrive (link in pytorch resnet) mv resnet101 caffe.pth res101.pth cd ../.. For Mobilenet V1, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights download from my gdrive mv mobilenet_v1_1.0_224.pth.pth mobile.pth cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however Xinlei finds it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. 
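For reference, the VGG 16 weight preparation snippet in the VGG16 setup step above amounts to the following; the checkpoint URL is left as a placeholder here, so substitute the link given on the pytorch vgg page:

```python
import torch
from torch.utils.model_zoo import load_url

# Placeholder URL: use the Caffe-style VGG-16 checkpoint link from the pytorch-vgg page.
sd = load_url("https://example.com/vgg16-caffe.pth")

# Shift the fully connected weights from classifier indices 1/4 down to 0/3
# and drop the old keys, as the original snippet does.
sd['classifier.0.weight'] = sd['classifier.1.weight']
sd['classifier.0.bias'] = sd['classifier.1.bias']
del sd['classifier.1.weight']
del sd['classifier.1.bias']
sd['classifier.3.weight'] = sd['classifier.4.weight']
sd['classifier.3.bias'] = sd['classifier.4.bias']
del sd['classifier.4.weight']
del sd['classifier.4.bias']

torch.save(sd, "vgg16.pth")
```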
Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Detailed numbers from COCO server (not supported) All the models are trained on COCO 2014 trainval35k . VGG16 COCO 2015 test dev (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.297 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.504 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.128 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.325 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.421 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.272 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.399 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.187 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.451 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.591 VGG16 COCO 2015 test std (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.295 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.501 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.119 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.327 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.418 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.273 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.400 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.179 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.455 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.586",Object Detection,Object Detection 2271,Computer Vision,Computer Vision,Computer Vision,"Build Status A PyTorch implementation of a YOLO v1 Object Detector Implementation of YOLO v1 object detector in PyTorch. Full tutorial can be found here in korean. Tested under Python 3.6, PyTorch 0.4.1 on Ubuntu 16.04, Windows10. Requirements See requirements (./requirements.txt) for details. NOTICE: different versions of PyTorch package have different memory usages. How to use Training on PASCAL VOC (20 classes) main.py mode train data_path where/your/dataset/is class_path ./names/VOC.names num_class 20 use_augmentation True use_visdom True Test on PASCAL VOC (20 classes) main.py mode test data_path where/your/dataset/is class_path ./names/VOC.names num_class 20 checkpoint_path your_checkpoint.pth.tar pre built weights file python python3 utilities/download_checkpoint.py pre build weights donwload Supported Datasets Only Pascal VOC datasets are supported for now. 
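The class_path file used in the commands above ( ./names/VOC.names ) is assumed here to follow the usual one-class-name-per-line convention for .names files; for the 20 PASCAL VOC categories it would contain something like the following (exact spellings may differ in the repository's copy):

```
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
```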
Configuration Options

| argument | type | description | default |
| --- | --- | --- | --- |
| mode | str | train or test | train |
| dataset | str | only VOC is supported for now | voc |
| data_path | str | data path | |
| class_path | str | filenames text file path | |
| input_height | int | input height | 448 |
| input_width | int | input width | 448 |
| batch_size | int | batch size | 16 |
| num_epochs | int | number of epochs | 16000 |
| learning_rate | float | initial learning rate | 1e-3 |
| dropout | float | dropout probability | 0.5 |
| num_gpus | int | number of GPUs for training | 1 |
| checkpoint_path | str | checkpoint path | ./ |
| use_augmentation | bool | image augmentation | True |
| use_visdom | bool | visdom | False |
| use_wandb | bool | wandb | False |
| use_summary | bool | describe model summary | True |
| use_gtcheck | bool | gt check flag | False |
| use_githash | bool | use githash | False |
| num_class | int | number of classes | 5 |

Train Log ! train_log Results ! image ! image ! image ! image Authorship This project is an equal contribution of Chanhee Jeong , Donghyeon Hwang , and Jaewon Lee . Copyright See LICENSE (./LICENSE) for details. REFERENCES 1 Redmon, Joseph, et al. You only look once: Unified, real time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.",Object Detection,Object Detection 2272,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounding boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection).
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) require 4 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) require 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) require 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) require 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) require 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) require 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights 194 MB COCO model image: darknet.exe detector test data/coco.data yolo.cfg yolo.weights i 0 thresh 0.2 Alternative method 194 MB COCO model image: darknet.exe detect yolo.cfg yolo.weights i 0 thresh 0.2 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB COCO model video: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB COCO model save result to the file res.avi : darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 out_filename res.avi 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 60 MB VOC model for video: darknet.exe detector demo data/voc.data tiny yolo voc.cfg tiny yolo voc.weights test.mp4 i 0 194 MB COCO model for net videocam Smart WebCam: darknet.exe detector demo 
data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights 186 MB Yolo9000 video: darknet.exe detector demo cfg/combine9k.data yolo9000.cfg yolo9000.weights test.mp4 Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show result.txt You can comment this line so that each image does not require pressing the button ESC: For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release , and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. 
If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) 
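For reference, the box conversion that voc_label.py (step 4 above) essentially performs maps Pascal VOC pixel boxes to the normalized center/size values Darknet expects; a minimal Python sketch (the example numbers are made up):

```python
def voc_to_yolo(size, box):
    """size = (image_width, image_height); box = (xmin, xmax, ymin, ymax) in pixels.
    Returns normalized (x_center, y_center, width, height)."""
    dw, dh = 1.0 / size[0], 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return x * dw, y * dh, w * dw, h * dh

# A 300x200-pixel box centered at (500, 400) in a 1000x800 image:
print(voc_to_yolo((1000, 800), (350, 650, 300, 500)))  # (0.5, 0.5, 0.3, 0.25)
```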
If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values in some lines then training goes well, but if nan are in all lines then training goes wrong. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): Training Yolo v3 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. 
After training is complete, get the resulting yolo obj_final.weights from the path build\darknet\x64\backup\ Every 1000 iterations you can stop and later resume training from that point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and continue training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights You can also get a usable result earlier than the full 45000 iterations. Note: If during training you see nan values in some lines then training is going well, but if nan appears in all lines then training has gone wrong. How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model described above, with these exceptions: Download the default weights file for yolov2 tiny voc: Get the pre trained weights yolov2 tiny voc.conv.13 using the command: darknet.exe partial cfg/yolov2 tiny voc.cfg yolov2 tiny voc.weights yolov2 tiny voc.conv.13 13 Make your custom model yolov2 tiny obj.cfg based on cfg/yolov2 tiny voc.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov2 tiny obj.cfg yolov2 tiny voc.conv.13 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as shown in this file: If you made your own custom model that isn't based on other models, then you can train it without pre trained weights; random initial weights will be used. When should I stop training: Usually 2000 iterations per class (object) are sufficient. But for a more precise answer to when you should stop training, use the following guide: 1. During training, you will see varying indicators of error, and you should stop when the 0.XXXXXXX avg value no longer decreases: > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (batch number) 0.060730 avg average loss (error) the lower, the better When you see that the average loss 0.xxxxxx avg no longer decreases over many iterations then you should stop training. 2. Once training is stopped, you should take some of the last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result may come from one of the earlier weights files (7000, 8000, 9000). This can happen due to overfitting. Overfitting is the case where you can detect objects in images from the training dataset, but can't detect objects in any other images. You should get weights from the Early Stopping Point : ! Overfitting To get weights from the Early Stopping Point: 2.1. First, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you don't have validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of the previous weights use these commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...)
darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights Then compare the last output lines for each weights file (7000, 8000, 9000): Choose the weights file with the highest IoU (intersection over union) and mAP (mean average precision). For example, if yolo obj_8000.weights gives the highest IoU, then use these weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersection over union) the average intersection over union of objects and detections for a certain threshold 0.24 mAP (mean average precision) the mean of the average precisions for each class, where average precision is the average value of 11 points on the PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is the default precision metric in the PascalVOC competition; it is the same as the AP50 metric in the MS COCO competition. In Wikipedia's terms, the Precision and Recall indicators have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on the PascalVOC 2007 test set: Download the PascalVOC dataset, install Python 3.x and get the file 2007_test.txt as described here: Then download the file to the dir build\darknet\x64\data\ and run voc_label_difficult.py to get the file difficult_2007_test.txt Remove the comment symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for the yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for the yolo voc.cfg model, mAP 75.8% (The article specifies a value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get lower values perhaps because the model was trained with slightly different source code than the code used for detection) if you want to get mAP for the tiny yolo voc.cfg model, then un comment the line for tiny yolo voc.cfg and comment the line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use the Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1.
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 heigh 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box for training with a large number of objects in each image, add the parameter max 200 or higher value in the last layer region in your cfg file to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 in one of the penultimate convolutional layers before the 1 st yolo layer, for example here: 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif }; mydarknet",Object Detection,Object Detection 2274,Computer Vision,Computer Vision,Computer Vision," Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows Contributors: More details: CircleCI TravisCI AppveyorCI Requirements (and how to install dependecies) ( requirements) Pre trained models ( pre trained models) Explanations in issues Yolo v3 in other frameworks (TensorFlow, OpenVINO, OpenCV dnn, ...) ( yolo v3 in other frameworks) 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use on the command line) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows Using vcpkg ( how to compile on windows using vcpkg) Legacy way ( how to compile on windows legacy way) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train with multi GPU: ( how to train with multi gpu) 6. How to train (to detect your custom objects) ( how to train to detect your custom objects) 7. How to train tiny yolo (to detect your custom objects) ( how to train tiny yolo to detect your custom objects) 8. When should I stop training ( when should i stop training) 9. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 10. How to improve object detection ( how to improve object detection) 11. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 12. How to use Yolo as DLL and SO libraries ( how to use yolo as dll and so libraries) ! Darknet Logo ! map_time mAP@0.5 (AP50) YOLOv3 spp better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): Requirements Windows or Linux CMake > 3.8 for modern CUDA support: CUDA 10.0 : (on Linux do Post installation Actions ) OpenCV 7.0 for CUDA 10.0 (on Linux copy cudnn.h , libcudnn.so ... 
as described here , on Windows copy cudnn.h , cudnn64_7.dll , cudnn64_7.lib as described here ) GPU with CC > 3.0 : on Linux GCC or Clang , on Windows MSVS 2017 (v15) Compiling on Windows by using Cmake GUI as on this IMAGE : Configure > Optional platform for generator (Set: x64) > Finish > Generate > Open Project > x64 & Release > Build > Build solution Compiling on Linux by using the command make (or alternatively by using the command: cmake . && make ) Pre trained models There are weight files for different cfg files (smaller size > faster speed & lower accuracy): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put the weights near the compiled darknet.exe You can get the cfg files from the path: darknet/cfg/ Yolo v3 in other frameworks Convert the yolov3.weights / cfg model to TensorFlow : by using the mystic123 or jinyu121 projects, and TensorFlow lite To use the Yolo v3 model in Intel OpenVINO (Myriad X / USB Neural Compute Stick / Arria FPGA): read this manual OpenCV dnn is a very fast DNN implementation on CPU (x86/ARM Android); use yolov3.weights / cfg with: C++ example , Python example Examples of results Yolo v3 Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x , Training 2x on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF is defined in the Makefile or darknet.sln improved performance 1.2x on FullHD, 2x on 4K, for detection on video (file/stream) using darknet detector demo ... improved data augmentation performance 3.5x for training (using OpenCV SSE/AVX functions instead of hand written functions) removes a bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of the reorg layer optimized memory allocation during network resizing when random 1 optimized GPU initialization for detection we use batch 1 initially instead of re initializing with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using the command darknet detector map ... added drawing of a chart of average Loss and accuracy mAP ( map flag) during training run ./darknet detector demo ... json_port 8070 mjpeg_port 8090 as a JSON and MJPEG server to get results online over the network from your software or a Web browser added calculation of anchors for training added an example of Detection and Tracking objects: fixed code for using a Web cam with OpenCV 3.x run time tips and warnings if you use an incorrect cfg file or dataset many other fixes of code...
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use on the command line On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights On Linux find executable file ./darknet in the root directory, while on Windows find it in the directory \build\darknet\x64 Yolo v3 COCO image : darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 Output coordinates of objects: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights ext_output dog.jpg Yolo v3 COCO video : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights ext_output test.mp4 Yolo v3 COCO WebCam 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights c 0 Yolo v3 COCO for net videocam Smart WebCam: darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights Yolo v3 save result videofile res.avi : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights test.mp4 out_filename res.avi Yolo v3 Tiny COCO video: darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights test.mp4 JSON and MJPEG server that allows multiple connections from your soft or Web browser ip address:8070 and 8090: ./darknet detector demo ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights test50.mp4 json_port 8070 mjpeg_port 8090 ext_output Yolo v3 Tiny on GPU 1 : darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights i 1 test.mp4 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Train on Amazon EC2 , to see mAP & Loss chart using URL like: in the Chrome/Firefox ( Darknet should be compiled with OpenCV ): ./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.json file use: darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights ext_output dont_show out result.json result.txt Pseudo lableing to process a list of images data/new_train.txt and save results of detection in Yolo training format for each image as label .txt (in this way you can increase the amount of training data) use: darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 dont_show save_labels Optional platform for generator (Set: x64) > Finish > Generate > Open Project > x64 & Release > Build > Build solution Otherwise: 1. Install or update Visual Studio to at least version 2017, making sure to have it fully patched (run again the installer if not sure to automatically update to latest version). If you need to install from scratch, download VS from here: Visual Studio 2017 Community 2. Install CUDA and cuDNN 3. Install git and cmake . Make sure they are on the Path at least for the current account 4. Install vcpkg and try to install a test library to make sure everything is working, for example vcpkg install opengl 5. Define an environment variables, VCPKG_ROOT , pointing to the install path of vcpkg 6. 
Define another environment variable, with name VCPKG_DEFAULT_TRIPLET and value x64 windows 7. Open a Powershell (as a standard user) and type (the last command requires a confirmation and is used to clean up unnecessary files) PowerShell PS \> cd $env:VCPKG_ROOT PS Code\vcpkg> .\vcpkg install pthreads opencv ffmpeg replace with opencv cuda,ffmpeg in case you want to use cuda accelerated openCV 8. necessary only with CUDA Customize the build.ps1 script enabling the appropriate my_cuda_compute_model line. If not manually defined, CMake toolchain will automatically use the very low 3.0 CUDA compute model 9. Build with the Powershell script build.ps1 . If you want to use Visual Studio, you will find a custom solution created for you by CMake after the build containing all the appropriate config flags for your system. How to compile on Windows (legacy way) 1. If you have MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. Also add Windows system variable CUDNN with path to CUDNN: NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN v7.4.1 for CUDA 10.0 : add Windows system variable CUDNN with path to CUDNN: copy file cudnn64_7.dll to the folder \build\darknet\x64 near with darknet.exe 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 10.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 10.0 and change it to your CUDA version. Then open \darknet.sln > (right click on project) > properties > CUDA C/C++ > Device and remove there ;compute_75,sm_75 . Then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change paths after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. 
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(CUDNN)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project: all .c files all .cu files file from \src directory file darknet.h from \include directory (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)\lib\$(PlatformName);$(CUDNN)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change paths in the file build\darknet\cfg\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 Only for small datasets sometimes better to decrease learning rate, for 4 GPUs set learning_rate 0.00025 (i.e. 
learning_rate 0.001 / GPUs). In this case also increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . Same goes for steps if policy steps is set. How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line max_batches to ( classes 2000 ), f.e. max_batches 6000 if you train for 3 classes change line steps to 80% and 90% of max_batches, f.e. steps 4800,5400 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 To train on Linux use command: ./darknet detector train data/obj.data yolo obj.cfg darknet53.conv.74 (just use ./darknet instead of darknet.exe ) (file yolo obj_last.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (file yolo obj_xxxx.weights will be saved to the build\darknet\x64\backup\ for each 1000 iterations) (to disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazon EC2) (to see the mAP & Loss chart during training on remote server without GUI, use command darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map then open URL in Chrome/Firefox browser) 8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set valid valid.txt or train.txt in obj.data file) and run: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just start training using: darknet.exe detector train data/obj.data yolo obj.cfg backup\yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. The final avgerage loss can be from 0.05 (for a small model and easy dataset) to 3.0 (for a big model and a difficult dataset). 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest mAP (mean average precision) or IoU (intersect over union) For example, bigger mAP gives weights yolo obj_8000.weights then use this weights for detection . Or just train with map flag: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map So you will see mAP chart (red line) in the Loss chart Window. mAP will be calculated for each 4 Epochs using valid valid.txt file that is specified in obj.data file ( 1 Epoch images_in_train_txt / batch iterations) (to change the max x axis value change max_batches parameter to 2000 classes , f.e. max_batches 6000 for 3 classes) ! loss_chart_map_chart Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect over union) average instersect over union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. 
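To make these definitions concrete, here is a small sketch (purely illustrative, not part of Darknet) that computes IoU for a pair of boxes, Precision TP/(TP+FP), Recall TP/(TP+FN), and the 11 point average precision exactly as described above:
python
# Illustrative only: metric helpers following the definitions above.
def iou(box_a, box_b):
    # boxes are (x_min, y_min, x_max, y_max); IoU = intersection / union
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

def ap_11_point(recalls, precisions):
    # mean of the best precision reached at recall >= 0.0, 0.1, ..., 1.0
    total = 0.0
    for t in [i / 10.0 for i in range(11)]:
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~ 0.143
print(precision_recall(8, 2, 4))            # (0.8, 0.666...)
mAP is then simply the mean of these per class AP values.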
In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision check that each object is mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: for each object which you want to detect there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. 
So desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects (smaller than 16x16 after the image is resized to 416x416) set layers 1, 11 instead of and set stride 4 instead of for training for both small and large objects use modified models: Full model: 5 yolo layers: Tiny model: 3 yolo layers: Spatial full model: 3 yolo layers: If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height I.e. for each object from Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size: object width in percent from Training dataset object width in percent from Test dataset That is, if only objects that occupied 80 90% of the image were present in the training set, then the trained network will not be able to detect objects that occupy 1 10% of the image. to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: then do this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 will be created file yolov3.conv.81 , then train by using weights file yolov3.conv.81 instead of darknet53.conv.74 each: model of object, side, illimination, scale, each 30 grad of the turn and inclination angles these are different objects from an internal perspective of the neural network. So the more different objects you want to detect, the more complex network model should be used. recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file. But you should change indexes of anchors masks for each yolo layer, so that 1st yolo layer has anchors larger than 60x60, 2nd larger than 30x30, 3rd remaining. Also you should change the filters (classes + 5) before each yolo layer. If many of the calculated anchors do not fit under the appropriate layers then just try using all the default anchors. 2. 
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights data/dog.jpg yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL and SO libraries on Linux set LIBSO 1 in the Makefile and do make on Windows compile build\darknet\yolo_cpp_dll.sln or build\darknet\yolo_cpp_dll_no_gpu.sln solution There are 2 APIs: C API: Python examples using the C API:: C++ API: C++ example that uses C++ API: 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 10.0 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link struct bbox_t { unsigned int x, y, w, h; // (x,y) top left corner, (w, h) width & height of bounded box float prob; // confidence probability that the object was found correctly unsigned int obj_id; // class of object from range 0, classes 1 unsigned int track_id; // tracking id for video (0 untracked, 1 inf tracked object) unsigned int frames_counter;// counter of frames on which the object was detected }; class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); std::shared_ptr mat_to_image_resize(cv::Mat mat) const; endif };",Object Detection,Object Detection 2278,Computer Vision,Computer Vision,Computer Vision,"Faster R CNN and Mask R CNN in PyTorch 1.0 This project aims at providing the necessary building blocks for easily creating detection and segmentation models using PyTorch 1.0. ! alt text (demo/demo_e2e_mask_rcnn_X_101_32x8d_FPN_1x.png from Highlights PyTorch 1.0: RPN, Faster R CNN and Mask R CNN implementations that matches or exceeds Detectron accuracies Very fast : up to 2x faster than Detectron and 30% faster than mmdetection during training. See MODEL_ZOO.md (MODEL_ZOO.md) for more details. Memory efficient: uses roughly 500MB less GPU memory than mmdetection during training Multi GPU training and inference Batched inference: can perform inference using multiple images per batch per GPU CPU support for inference: runs on CPU in inference time. See our webcam demo (demo) for an example Provides pre trained models for almost all reference Mask R CNN and Faster R CNN configurations with 1x schedule. Webcam and Jupyter notebook demo We provide a simple webcam demo that illustrates how you can use maskrcnn_benchmark for inference: bash cd demo by default, it runs on the GPU for best results, use min image size 800 python webcam.py min image size 800 can also run it on the CPU python webcam.py min image size 300 MODEL.DEVICE cpu or change the model that you want to use python webcam.py config file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu in order to see the probability heatmaps, pass show mask heatmaps python webcam.py min image size 300 show mask heatmaps MODEL.DEVICE cpu for the keypoint demo python webcam.py config file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu A notebook with the demo can be found in demo/Mask_R CNN_demo.ipynb (demo/Mask_R CNN_demo.ipynb). 
Installation Check INSTALL.md (INSTALL.md) for installation instructions. Model Zoo and Baselines Pre trained models, baselines and comparison with Detectron and mmdetection can be found in MODEL_ZOO.md (MODEL_ZOO.md) Inference in a few lines We provide a helper class to simplify writing inference pipelines using pre trained models. Here is how we would do it. Run this from the demo folder: python from maskrcnn_benchmark.config import cfg from predictor import COCODemo config_file ../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml update the config options with the config file cfg.merge_from_file(config_file) manual override some options cfg.merge_from_list( MODEL.DEVICE , cpu ) coco_demo COCODemo( cfg, min_image_size 800, confidence_threshold 0.7, ) load image and then run prediction image ... predictions coco_demo.run_on_opencv_image(image) Perform training on COCO dataset For the following examples to work, you need to first install maskrcnn_benchmark . You will also need to download the COCO dataset. We recommend to symlink the path to the coco dataset to datasets/ as follows We use minival and valminusminival sets from Detectron bash symlink the coco dataset cd /github/maskrcnn benchmark mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 or use COCO 2017 version ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2017 datasets/coco/train2017 ln s /path_to_coco_dataset/test2017 datasets/coco/test2017 ln s /path_to_coco_dataset/val2017 datasets/coco/val2017 for pascal voc dataset: ln s /path_to_VOCdevkit_dir datasets/voc P.S. COCO_2017_train COCO_2014_train + valminusminival , COCO_2017_val minival You can also configure your own paths to the datasets. For that, all you need to do is to modify maskrcnn_benchmark/config/paths_catalog.py to point to the location where your dataset is stored. You can also create a new paths_catalog.py file which implements the same two classes, and pass it as a config argument PATHS_CATALOG during training. Single GPU training Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities: 1. Run the following without modifications bash python /path_to_maskrcnn_benchmark/tools/train_net.py config file /path/to/config/file.yaml This should work out of the box and is very similar to what we should do for multi GPU training. But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 8x larger, which might lead to out of memory errors. If you have a lot of memory available, this is the easiest solution. 2. Modify the cfg parameters If you experience out of memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule. 
Here is an example for Mask R CNN R 50 FPN with the 1x schedule: bash python tools/train_net.py config file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x. We also changed the batch size during testing, but that is generally not necessary because testing requires much less memory than training. Multi GPU training We use internally torch.distributed.launch in order to launch multi gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py config file path/to/config/file.yaml Abstractions For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md (ABSTRACTIONS.md). Adding your own dataset This implementation adds support for COCO style datasets. But adding support for training on a new dataset can be done as follows: python from maskrcnn_benchmark.structures.bounding_box import BoxList class MyDataset(object): def __init__(self, ...): as you would do normally def __getitem__(self, idx): load the image as a PIL Image image ... load the bounding boxes as a list of list of boxes in this case, for illustrative purposes, we use x1, y1, x2, y2 order. boxes 0, 0, 10, 10 , 10, 20, 50, 50 and labels labels torch.tensor( 10, 20 ) create a BoxList from the boxes boxlist BoxList(boxes, image.size, mode xyxy ) add the labels to the boxlist boxlist.add_field( labels , labels) if self.transforms: image, boxlist self.transforms(image, boxlist) return the image, the boxlist and the idx in your dataset return image, boxlist, idx def get_img_info(self, idx): get img_height and img_width. This is used if we want to split the batches according to the aspect ratio of the image, as it can be more efficient than loading the image from disk return { height : img_height, width : img_width} That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask ), or even your own instance type. For a full example of how the COCODataset is implemented, check maskrcnn_benchmark/data/datasets/coco.py (maskrcnn_benchmark/data/datasets/coco.py). Note: While the aforementioned example should work for training, we leverage the cocoApi for computing the accuracies during testing. Thus, test datasets should currently follow the cocoApi for now. Finetuning from Detectron weights on custom datasets Create a script tools/trim_detectron_model.py like here . You can decide which keys to be removed and which keys to be kept by modifying the script. Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT . For further information, please refer to 15 . Troubleshooting If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md (TROUBLESHOOTING.md). If your issue is not present there, please feel free to open a new issue. Citations Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package. 
@misc{massa2018mrcnn, author {Massa, Francisco and Girshick, Ross}, title {{maskrnn benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch}}, year {2018}, howpublished {\url{ note {Accessed: Insert date here } } Projects using maskrcnn benchmark RetinaMask: Learning to predict masks improves state of the art single shot detection for free . Cheng Yang Fu, Mykhailo Shvets, and Alexander C. Berg. Tech report, arXiv,1901.03353. License maskrcnn benchmark is released under the MIT license. See LICENSE (LICENSE) for additional details.",Object Detection,Object Detection 2279,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg 
yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. 
If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. 
Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. 
Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). 
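One way to make that comparison less tedious is sketched below: a small script (not part of Darknet; the cfg, data and weights names follow the yolo obj example used in this section and may need adjusting to your actual file names) that shells out to the darknet detector map command for several saved checkpoints so their reported results can be compared side by side.
python
import subprocess

# Illustrative sketch: run the map calculation for several checkpoints
# saved in backup/ and print each report for manual comparison.
# File names follow the yolo obj example from this README (use
# darknet.exe instead of ./darknet on Windows); adjust as needed.
CFG = 'yolo-obj.cfg'
DATA = 'data/obj.data'
for iteration in (7000, 8000, 9000):
    weights = 'backup/yolo-obj_%d.weights' % iteration
    print('=== %s ===' % weights)
    subprocess.run(['./darknet', 'detector', 'map', DATA, CFG, weights], check=True)
Often one of the earlier checkpoints scores best here.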
It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that every object is labeled in your dataset no object in your dataset should be left without a label. Most training issues come from wrong labels in the dataset (labels produced by some conversion script, marked with a third party tool, ...). Always check your dataset by using: it is desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train for 2000 × classes iterations or more it is desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or a higher value in the last region layer in your cfg file for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speed up training (at the cost of some detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2.
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2290,Computer Vision,Computer Vision,Computer Vision,"Python version: 3.6 Clone the repository and install the dependancies from requirements.txt file. Make sure to place the data in data directory under train and test folder or change the paths in the main.py file. Link to VGG 16 pretrained weights file: I have used vgg 16 as base feature extractor, I will implement ResNet50 and 100 in future. 
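A trivial sanity check sketch for the layout described above (the directory names come from this description and main.py is the script mentioned above; the snippet itself is not part of the repository):
python
import os

# Verify the expected layout before training: data under data/train and
# data/test (or whatever paths main.py was edited to use).
for path in ('data/train', 'data/test', 'main.py'):
    print(path, 'ok' if os.path.exists(path) else 'MISSING')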
This reposiort contains a simple implementation of Faster RCNN presented by Ross Girshick I have tested this on 3 Classes extracted from Open Images v 4. data set. In order to download and extract data from open images, run the extract_data.py script. By default, it will download data for 3 classes 'Person', 'Mobile phone', 'Car' . You can edit the list to get other classes as well. This will also convert and save the data which is required by the model. Inspired by this article: https://towardsdatascience.com/faster r cnn object detection implemented by keras for custom data from googles open images 125f62b9141a",Object Detection,Object Detection 2291,Computer Vision,Computer Vision,Computer Vision,"Cifar 10 Attached notebooks are implementation of ResNet proposed by Microsoft Research in and self proposed model. Summary of Contents: 1) cifar_10.ipynb : A 34 layer Deep Convolutional Neural Network based on Resnet Architecture. 2) cifar_110_layers.ipynb : 110 layer implementation of Deep Convolutional Neural Network based on Resnet Architecture. 3) cifar_10_87.ipynb : self proposed model Dataset used : Link of dataset : The deep residual network proposed in the paper was tested on various datasets like MS COCO,Cifar 10 etc. Keeping in view the time and computational requirements I have used Cifar 10 dataset composed of 60,000 images (50,000 training & 10,000 testing) divided into 10 classes. The models are trained on GoogleColab and using the same hyperparameters and optimizer as given in the paper. Data augmentation methods like horizontal flipping has been used as discussed in the paper. Results Using my proposed architecture inspired from the stacking concept of vgg , highest validation accuracy achieved is 87.68% Using Cifar 34 layer architecture, highest validation accuracy achieved is 85.86%. Using Cifar 110 layer architecture, highest validation accuracy achieved is 70.02%.",Object Detection,Object Detection 2298,Computer Vision,Computer Vision,Computer Vision,いろんなモデルの実装 セグメンテーション FCN 論文 参考にした実装、記事 U Net 論文 参考にした実装、記事 ResNet(resnet.py) 論文 参考にした実装、記事,Object Detection,Object Detection 2299,Computer Vision,Computer Vision,Computer Vision,"Car detection using YOLOv2 You Only Look Once (YOLO) is the state of the art, real time object detection system. I have used the pre trained model available at The original paper for YOLOv2 by Joseph Redmon and Ali Farhadi can be found in arXiv Find more about YOLOv2 in their project webpage . This implementation has taken significant inspiration from Allan Zelener's GitHub repository, YAD2K: Yet Another Darknet 2 Keras .",Object Detection,Object Detection 2301,Computer Vision,Computer Vision,Computer Vision,"Faster R CNN and Mask R CNN in PyTorch 1.0 This project aims at providing the necessary building blocks for easily creating detection and segmentation models using PyTorch 1.0. ! alt text (demo/demo_e2e_mask_rcnn_X_101_32x8d_FPN_1x.png from Highlights PyTorch 1.0: RPN, Faster R CNN and Mask R CNN implementations that matches or exceeds Detectron accuracies Very fast : up to 2x faster than Detectron and 30% faster than mmdetection during training. See MODEL_ZOO.md (MODEL_ZOO.md) for more details. Memory efficient: uses roughly 500MB less GPU memory than mmdetection during training Multi GPU training and inference Batched inference: can perform inference using multiple images per batch per GPU CPU support for inference: runs on CPU in inference time. 
See our webcam demo (demo) for an example Provides pre trained models for almost all reference Mask R CNN and Faster R CNN configurations with 1x schedule. Webcam and Jupyter notebook demo We provide a simple webcam demo that illustrates how you can use maskrcnn_benchmark for inference: bash cd demo by default, it runs on the GPU for best results, use min image size 800 python webcam.py min image size 800 can also run it on the CPU python webcam.py min image size 300 MODEL.DEVICE cpu or change the model that you want to use python webcam.py config file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu in order to see the probability heatmaps, pass show mask heatmaps python webcam.py min image size 300 show mask heatmaps MODEL.DEVICE cpu for the keypoint demo python webcam.py config file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu A notebook with the demo can be found in demo/Mask_R CNN_demo.ipynb (demo/Mask_R CNN_demo.ipynb). Installation Check INSTALL.md (INSTALL.md) for installation instructions. Model Zoo and Baselines Pre trained models, baselines and comparison with Detectron and mmdetection can be found in MODEL_ZOO.md (MODEL_ZOO.md) Inference in a few lines We provide a helper class to simplify writing inference pipelines using pre trained models. Here is how we would do it. Run this from the demo folder: python from maskrcnn_benchmark.config import cfg from predictor import COCODemo config_file ../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml update the config options with the config file cfg.merge_from_file(config_file) manual override some options cfg.merge_from_list( MODEL.DEVICE , cpu ) coco_demo COCODemo( cfg, min_image_size 800, confidence_threshold 0.7, ) load image and then run prediction image ... predictions coco_demo.run_on_opencv_image(image) Perform training on COCO dataset For the following examples to work, you need to first install maskrcnn_benchmark . You will also need to download the COCO dataset. We recommend to symlink the path to the coco dataset to datasets/ as follows We use minival and valminusminival sets from Detectron bash symlink the coco dataset cd /github/maskrcnn benchmark mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 or use COCO 2017 version ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2017 datasets/coco/train2017 ln s /path_to_coco_dataset/test2017 datasets/coco/test2017 ln s /path_to_coco_dataset/val2017 datasets/coco/val2017 for pascal voc dataset: ln s /path_to_VOCdevkit_dir datasets/voc P.S. COCO_2017_train COCO_2014_train + valminusminival , COCO_2017_val minival You can also configure your own paths to the datasets. For that, all you need to do is to modify maskrcnn_benchmark/config/paths_catalog.py to point to the location where your dataset is stored. You can also create a new paths_catalog.py file which implements the same two classes, and pass it as a config argument PATHS_CATALOG during training. Single GPU training Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities: 1. 
Run the following without modifications bash python /path_to_maskrcnn_benchmark/tools/train_net.py config file /path/to/config/file.yaml This should work out of the box and is very similar to what we should do for multi GPU training. But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 8x larger, which might lead to out of memory errors. If you have a lot of memory available, this is the easiest solution. 2. Modify the cfg parameters If you experience out of memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule. Here is an example for Mask R CNN R 50 FPN with the 1x schedule: bash python tools/train_net.py config file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x. We also changed the batch size during testing, but that is generally not necessary because testing requires much less memory than training. Multi GPU training We use internally torch.distributed.launch in order to launch multi gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py config file path/to/config/file.yaml Abstractions For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md (ABSTRACTIONS.md). Adding your own dataset This implementation adds support for COCO style datasets. But adding support for training on a new dataset can be done as follows: python from maskrcnn_benchmark.structures.bounding_box import BoxList class MyDataset(object): def __init__(self, ...): as you would do normally def __getitem__(self, idx): load the image as a PIL Image image ... load the bounding boxes as a list of list of boxes in this case, for illustrative purposes, we use x1, y1, x2, y2 order. boxes 0, 0, 10, 10 , 10, 20, 50, 50 and labels labels torch.tensor( 10, 20 ) create a BoxList from the boxes boxlist BoxList(boxes, image.size, mode xyxy ) add the labels to the boxlist boxlist.add_field( labels , labels) if self.transforms: image, boxlist self.transforms(image, boxlist) return the image, the boxlist and the idx in your dataset return image, boxlist, idx def get_img_info(self, idx): get img_height and img_width. This is used if we want to split the batches according to the aspect ratio of the image, as it can be more efficient than loading the image from disk return { height : img_height, width : img_width} That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask ), or even your own instance type. For a full example of how the COCODataset is implemented, check maskrcnn_benchmark/data/datasets/coco.py (maskrcnn_benchmark/data/datasets/coco.py). 
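Returning to the batch-size scaling described under Single GPU training above, the arithmetic is simple enough to capture in a small helper. This is a hypothetical convenience function, not part of maskrcnn_benchmark; the 8-GPU defaults are inferred from the 1x-schedule overrides shown earlier.
Python
# Hypothetical helper (not part of maskrcnn_benchmark): applies the linear scaling
# rule used above when shrinking the global batch size, e.g. from 16 images on
# 8 GPUs down to 2 images on a single GPU.
def scale_schedule(new_ims_per_batch,
                   base_ims_per_batch=16, base_lr=0.02,
                   max_iter=90000, steps=(60000, 80000)):
    factor = base_ims_per_batch / new_ims_per_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_ims_per_batch,
        "SOLVER.BASE_LR": base_lr / factor,                     # 0.0025 for factor 8
        "SOLVER.MAX_ITER": int(max_iter * factor),              # 720000
        "SOLVER.STEPS": tuple(int(s * factor) for s in steps),  # (480000, 640000)
    }

print(scale_schedule(2))  # matches the overrides passed to tools/train_net.py above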
Once you have created your dataset, it needs to be added in a couple of places: maskrcnn_benchmark/data/datasets/__init__.py (maskrcnn_benchmark/data/datasets/__init__.py): add it to __all__ maskrcnn_benchmark/config/paths_catalog.py (maskrcnn_benchmark/config/paths_catalog.py): DatasetCatalog.DATASETS and corresponding if clause in DatasetCatalog.get() Testing While the aforementioned example should work for training, we leverage the cocoApi for computing the accuracies during testing. Thus, test datasets should currently follow the cocoApi for now. To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py (maskrcnn_benchmark/data/datasets/evaluation/__init__.py): python if isinstance(dataset, datasets.MyDataset): return coco_evaluation( args) Finetuning from Detectron weights on custom datasets Create a script tools/trim_detectron_model.py like here . You can decide which keys to be removed and which keys to be kept by modifying the script. Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT . For further information, please refer to 15 . Troubleshooting If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md (TROUBLESHOOTING.md). If your issue is not present there, please feel free to open a new issue. Citations Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package. @misc{massa2018mrcnn, author {Massa, Francisco and Girshick, Ross}, title {{maskrcnn benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch}}, year {2018}, howpublished {\url{ note {Accessed: Insert date here } } Projects using maskrcnn benchmark RetinaMask: Learning to predict masks improves state of the art single shot detection for free . Cheng Yang Fu, Mykhailo Shvets, and Alexander C. Berg. Tech report, arXiv,1901.03353. FCOS: Fully Convolutional One Stage Object Detection . Zhi Tian, Chunhua Shen, Hao Chen and Tong He. Tech report, arXiv,1904.01355. code License maskrcnn benchmark is released under the MIT license. See LICENSE (LICENSE) for additional details.",Object Detection,Object Detection 2304,Computer Vision,Computer Vision,Computer Vision,"Tackling Background Differently presented at ICVSS 2015 This work proposes classifying background using a learned threshold. It branches off from the research presented in Prototypical priors, Jetley et.al . The sub field of zero shot recognition relies heavily upon the definition of an attribute based continuous embedding space that maps through to the category labels to allow test time classification of images to unseen categories with known attributes. In particular for the case above, the attributes are the prototypical templates of objects such as traffic lights, logos, etc. For recognition in real world, soa CNN based detection models trivially introduce a background class which subsumes all real world visuals that do not belong to any of the pre determined object categories of interest, see Fast RCNN , Faster R CNN . However, the background class is ill defined and has a changing definition based on the foreground object categories considered. Establishing an mapping for the background class in the attribute space is ambiguous and unclear. 
Thus, existing deep convolutional recognition pipelines need to be modified to allow bypassing an attribute mapping for the background. We propose one such modification in the current work. The proposed architecture affords the incorporation of an attribute-based embedding space over the non-background category labels in classification models, while bypassing the background label through a learned threshold that preemptively filters out non-object samples. More details here: License
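The exact formulation of the learned threshold is in the paper linked above; purely as an illustration of the idea (the threshold value, score shapes, and function name are made up for this sketch), rejecting a region as background when its best non-background score falls below the learned threshold can look like this:
Python
import numpy as np

# Illustrative only: scores has shape (num_regions, num_foreground_classes),
# e.g. similarities in the attribute embedding space; tau is the learned threshold.
def classify_with_background_threshold(scores, tau):
    best = scores.max(axis=1)        # best non-background score per region
    labels = scores.argmax(axis=1)   # best foreground class per region
    labels[best < tau] = -1          # -1 marks background: no attribute mapping needed
    return labels

scores = np.array([[0.9, 0.1, 0.3], [0.2, 0.25, 0.15]])
print(classify_with_background_threshold(scores, tau=0.5))
# the second region falls below tau and is labelled background (-1)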
Hence, consecutive videos concerning the same object must be merged for easier watchability. This reduces the number of videos that need to be inspected by 90%, although the overall length of video to be inspected stays the same. Merged videos are searched by the YOLOv2 algorithm to find those containing vehicles, and frame by frame results are recorded in individual CSVs for each video. The CSVs for each video are searched in Python to find those containing vehicle bounding boxes in the requisite positions indicating possible business traffic. A list of videos to be manually inspected is presented. Module Descriptions: RealTimeVidDetect.py: Detects objects in real time for an input video as a demonstration of the YOLOv2 algorithm. MergeVideos.py: Merges consecutive videos for easier inspection (both automated and manual) and easier recording of possibly meaningful videos. VidInputDetect.py: Creates a new CSV for each video detailing the positions of the bounding boxes for each vehicle (car/truck) object detected in each frame. ListOfYVids.py: Searches each CSV for vehicles in the requisite locations indicating possible business traffic and creates a list of candidate videos. Process.py: Combines the previous 3 modules to fully process a batch of input videos and produce details of the videos to be manually inspected. To run the processing yourself: Download the sample videos from Open the folder and place the 3 folders into the same folder as your cloned repository (like this: Run Process.py and select Y for each of the 3 options. The sample videos contain (once merged) one false positive, two true negatives, and two true positives, for illustration of each case. The cost of false negatives was deemed to be much higher than that of false positives, so the pixel values for vehicle detection are adjusted so as to effectively remove the possibility of false negatives. Instructions for real time object recognition on individual videos are included in RealTimeVidDetect. Results: The number of videos to be manually inspected decreased by 98.5%. This is composed of a 90% decrease through merging consecutive videos and a subsequent 85% decrease through retaining only videos containing vehicles in positions indicating possible business traffic. Hence, 50 videos must be manually inspected per day, of which around half contain business traffic. FYI: The council sought and gained assurances from the neighbour on levels of traffic and he ceased pursuing the case. Why YOLOv2? YOLO is an effective and fast, state of the art, real time object detection system, making it the most applicable choice given the volume of video to be processed. Future Improvements: Mask R CNN yields more detailed results allowing for semantic segmentation of different object instances, leading to theoretically perfectly accurate automated video inspection for this application (as the vehicle orientation indicated by the produced masks can be used to highlight those pulling into the neighbour's), but it requires significantly more processing time and was deemed inapplicable given the volume of video to be processed per day. The camera position could be moved from the ground floor to the first floor, which would improve the accuracy of business-traffic position recognition (although the pixel values indicating business traffic would have to be manually modified). YOLOv2's accuracy is severely limited by the dark in winter months, although project extensions making use of CNNs for improving low light images/videos may greatly decrease this.
Vehicles in frames are detected on a frame by frame basis and so, instances are not linked to themselves over changing frames. A robust method of creating instances of each vehicle detected may be to link those with similar bounding box characteristics (size, position, vehicle type) in consecutive frames and train a subsequent learning model based on features of these vehicle instances.",Object Detection,Object Detection 2328,Computer Vision,Computer Vision,Computer Vision,"Dogs and cats image classification CNN Binary Image Classification using CNN w/ residual layers (Dogs & Cats) (Tensorflow GPU, TFLearn, OpenCV) Modules converted from Sentdex' tutorial . Now includes: Improved accuracy through CNN architecture and hyperparameter modification featuring inclusion of residual blocks (Ref: Data Augmentation Separated to callable functions for easier hyperparameter optimisation, debugging, and more readable code Added custom image input function Added commands while running to eliminate repeated image processing/model training Model attains 90% accuracy on validation data and a log loss of 0.32 on Kaggle's test data. Results analysed with tensorboard. Data available at: Modules: Main.py: calls preprocessing.py and CNN_model.py to load and preprocess image data, set model parameters, augment the image collection, and train, test, and produce classification metrics for the model on the validation set. preprocessing.py: Functions for creation of training and test data in .npy files. CNN_model.py: Functions for CNN model creation. Model architecture and more complex hyperparameters can be modified here. modelresults_inspection.py: loads model and test data and outputs images and predicted labels for user inspection. custominput.py: predicts categories and displays images for images located in CustomImageInput directory (will need to be modified to wherever user creates this folder when used). kaggle_submission: loads model to create kaggle submission file (.csv).",Object Detection,Object Detection 2329,Computer Vision,Computer Vision,Computer Vision,"FlowersRecognition_keras Multi class Image Classification using CNN (Flower Types) (Keras, Tensorflow GPU, OpenCV) Dataset available at: Includes: Keras CNN implemented using keras.models.Sequential Keras CNN incorporating a residual block/shortcut connection Hyperparameter tuning using grid search Learning rate annealing using keras.optimizers.adadelta Dropout in fully connected layer to reduce overfit Normalisation of image data Modules: prepocessing: Function for creation of training and test data in .npy files. main_keras: Calls preprocessing.py to load and preprocess image data, sets model parameters, loads and normalises data, and trains, tests, and produces classification metrics for the convolutional neural network model on the validation set using a sequential model. hyperparamsearch: Calls preprocessing.py to load and preprocess image data, sets model parameters, loads and normalises data, and trains, tests, and produces classification metrics for the convolutional neural network model on the validation set using a sequential model. Performs a grid search of hyperparameter options and records accuracy and epoch number of max accuracy in a separate csv. 
res_hyperparamsearch: Calls preprocessing.py to load and preprocess image data, sets model parameters, loads and normalises data, and trains, tests, and produces classification metrics for the convolutional neural network model with residual block/shortcut connection on the validation set using a sequential model. Performs a grid search of hyperparameter options and records accuracy and epoch number of max accuracy in a separate csv. Info on residual layers: ResNet 50: Deep Residual Learning for Image Recognition Python 3.6.7, Tensorflow 1.12.0, Keras 2.2.4",Object Detection,Object Detection 2338,Computer Vision,Computer Vision,Computer Vision,"OverFeat OverFeat is a Convolutional Network based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participated in the ImageNet 2013 competition. This package allows researchers to use OverFeat to recognize images and extract features. A library with C++ source code is provided for running the OverFeat convolutional network, together with wrappers in various scripting languages (Python, Lua, Matlab coming soon). OverFeat was trained with the Torch7 package ( ). The OverFeat package provides tools to run the network in a standalone fashion. The training code is not distributed at this time. CREDITS, LICENSE, CITATION OverFeat is Copyright NYU 2013. Authors of the present package are Michael Mathieu, Pierre Sermanet, and Yann LeCun. The OverFeat system is by Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. Please refer to the LICENSE file in the same directory as the present file for licensing information. If you use OverFeat in your research, please cite the following paper: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun INSTALLATION: Download the archive from Extract the files: tar xvf overfeat vXX.tgz cd overfeat Overfeat uses external weight files. Since these files are large and do not change often, they are not included in the archive. We provide a script to automatically download the weights : ./download_weights.py The weight files should be in the folder data/default in the overfeat directory. Overfeat can run without BLAS, however it would be very slow. We strongly advice you to install openblas on linux (on MacOS, Accelerate should be available without any installation). On Ubuntu/Debian you should compile it (it might take a while, but it is worth it) : sudo apt get install build essential gcc g++ gfortran git libgfortran3 cd /tmp git clone cd OpenBLAS make NO_AFFINITY 1 USE_OPENMP 1 sudo make install For some reason, on 32 bits Ubuntu, libgfortran doesn't create the correct symlink. If you have issues linking with libgfortran, locate where libgfortran is installed (for instance /usr/lib/i386 linux gnu) and create the correct symlink : cd sudo ln sf libgfortran.so.3 libgfortran.so The precompiled binaries use BLAS. If you don't want to (or can't, for some reason) use BLAS, you must recompile overfeat. RUNNING THE PRE COMPILED BINARIES Pre compiled binaries are provided for Ubuntu Linux (32 bits and 64 bits) and Mac OS. The pre requisites are python and imagemagick, which are installed by default on most popular Linux distros. Important note: OverFeat compiled from source on your computer will run faster than the pre compiled binaries. 
Example of image classification, printing the 6 highest scoring categories: bin/YOUR_OS/overfeat n 6 samples/bee.jpg where YOUR_OS can be either linux_64, linux_32, or macos. Running the webcam demo: bin/YOUR_OS/webcam GPU PRE COMPILED BINARIES (EXPERIMENTAL) We are providing precompiled binaries to run overfeat on GPU. Because the code is not released yet, we do not provide the source for now. The GPU release is experimental and for now only runs on linux 64bits. It requires a Nvidia GPU with CUDA architecture > 2.0 (that covers all recent GPUs from Nvidia). You will need openblas to run the GPU binaries. The binaries are located in bin/linux_64/cuda And work the same way as the CPU versions. You can include the static library the same way as the CPU version. COMPILING FROM SOURCE Install dependencies : python, imagemagick, git, gcc, cmake (pkg config and opencv required for the webcam demo). On Ubuntu/Debian : apt get install g++ git python imagemagick cmake For the webcam demo : apt get install pkg config libopencv dev libopencv highgui dev Here are the instructions to build the OverFeat library and tools: Go to the src folder : cd src Build the tensor library (TH), OverFeat and the command line tools: make all Build the webcam demo (OpenCV required) : make cam On Mac OS, the default gcc doesn't support OpenMP. We strongly recommend to install a gcc version with OpenMP support. With MacPort : sudo port install gcc48 Which will provide g++ mp 48 . If you don't install this version, you will have to change the two corresponding lines in the Makefile. UPDATING A git repository is provided with the archive. You can update by typing git pull from the overfeat directory. HIGH LEVEL INTERFACE: The feature extractor requires a weight file, containing the weights of the network. We provide a weight file located in data/default/net_weight . The software we provide should be able to locate it automatically. In case it doesn't, the option d can be used to manually provide a path. Overfeat can use two sizes of network. By default, it uses the smaller one. For more accuracy, the option l can be used to use a larger, but slower, network. CLASSIFICATION: In order to get the top (by default, 5) classes from a number of images : bin/linux_64/overfeat n d l path_to_image1 path_to_image2 path_to_image3 ... To use overfeat online (feeding an image stream), feed its stdin stream with a sequence of ppm images (ended by end of file ('\0') character). In this case, please use option p. For instance : convert image1.jpg image2.jpg resize 231x231 ppm: ./overfeat n d l p Please note that to get the classes from an image, the image size should be 231x231. The image will be cropped if one dimension is larger than 231, and the network won't be able to work if both dimension are larger. For feature extraction without classification, it can be any size greater or equal to 231x231 for the small network, and 221x221 for the large network . FEATURE EXTRACTION: In order to extract the features instead of classifying, use f option. For instance : bin/linux_64/overfeat d l f image1.png image2.jpg It is compatible with option p. The option L (overrides f) can be used to return the output of any layer. For instance bin/linux_64/overfeat d l L 12 image1.png returns the output of layer 12. The option f corresponds to layer 19 for the small layer and 22 for the large one. It writes the features on stdout as a sequence. 
Each feature starts with three integers separated by spaces, the first is the number of features (n), the second is the number of rows (h) and the last is the number of columns (w). It is followed by a end of line ('\n') character. Then follows n h w floating point numbers (written in ascii) separated by spaces. The feature is the first dimension (so that to obtain the next feature, you must add w h to your index), followed by the row (to obtain the next row, add w to your index). That means that if you want the features corresponding to the top left window, you need to read pixels i h w for i 0..4095 . The output is going to be a 3D tensor. The first dimension correspond to the features, while dimensions 2 and 3 are spatial (y and x respectively). The spatial dimension is reduced at each layer, and with the default network, using option f, the output has size nFeatures h w where for the small network, nFeatures 4096 h ((H 11)/4 + 1)/8 6 w ((W 11)/4 + 1)/8 6 for the large network, nFeatures 4096 h ((H 7)/2 + 1)/18 5 w ((W 7)/2 + 1)/18 5 if the input has size 3 H W . Each pixel in the feature map corresponds to a localized window in the input. With the small network, the windows are 231x231 pixels, overlapping so that the i th window begins at pixel 32 i, while for the large network, the windows are 221x221, and the i th window begins at pixel 36 i. WEBCAM: We provide a live classifier based on the webcam. It reads images from the webcam, and displays the most likely classes along with the probabilities. It can be run with bin/linux_64/webcam d l w BATCH: We also provide an easy way to process a whole folder : ./bin/linux_64/overfeat_batch d l i o It process each image in the input folder and produces a corresponding file in the output directory, containing the features,in the same format as before. EXAMPLES: Classify image samples/bee.jpg, getting the 3 most likely classes : bin/linux_64/overfeat n 3 samples/bee.jpg Extract features from samples/pliers.jpg with the large network : bin/linux_64/overfeat f l samples/pliers.jpg Extract the features from all files in samples : ./bin/linux_64/overfeat_batch i samples o samples_features Run the webcam demo with the large network : bin/linux_64/webcam l ADVANCED: The true program is actually overfeatcmd, where overfeat is only a python script calling overfeatcmd. overfeatcmd is not designed to be used by itself, but can be if necessary. It taked three arguments : bin/linux_64/overfeatcmd If is positive, it is, as before, the number of top classes to display. If is nonpositive, the features are going to be the output. The option specifies from which layer the features are obtained (by default, 16, corresponding to the last layer before the classifier). corresponds to the size of the network : 0 for small, 1 for large. APIs: C++: The library is written in C++. It consists of one static library named liboverfeat.a . The corresponding header is overfeat.hpp . It uses the low level torch tensor library (TH). Sample code can be found in overfeatcmd.cpp and webcam.cpp. The library provides several functions in the namespace overfeat : void init(const std::string & weight_file_path, int net_idx) : This function must be called once before using the feature extractor. It reads the weights and must be passed a path to the weight files. It must also be passed the size of the network (net_idx), which should be 0, or 1, respectively for small or large networks. Note that the weight file must correspond to the size of the network. 
void free() : This function releases the ressources and should be called when the feature extractor is no longer used. THTensor fprop(THTensor input) : This is the main function. It takes an image stored in a THTensor and runs the network on it. It returns a pointer to a THTensor containing the output of the classifier. If the input is 3 H W, the output is going to be nClasses h w, where for the small network : nClasses 1000 h ((H 11)/4 + 1)/8 6 w ((W 11)/4 + 1)/8 6 for the large network : nClasses 1000 h ((H 7)/2 + 1)/18 5 w ((W 7)/2 + 1)/18 5 Each pixel of the output corresponds to a 231x231 window on the input for the small network, and 221x221 for the large network. The windows overlap in the same way as described earlier for the feature extraction. Each class gets a score, but they are not probabilities (they are not normalized). THTensor get_output(int i) : Once fprop has been computed, this function returns the output of any layer. For instance, in the default network, layer 16 corresponds to the final features before the classifier. int get_n_layers() : Returns the total number of layers of the network. void soft_max(THTensor input, THTensor output) : This function converts the output to probabilities. It only works if h w 1 (only one output pixel). std::string get_class_name(int i) : This function returns the string corresponding to the i th class. std::vector > get_top_classes(THTensor probas, int n) : Given a vector with nClasses elements containing scores or probabilities, this function returns the names of the top n classes, along with their score/probabilities. When compiling code using liboverfeat.a, the code must also be linked against libTH.a, the tensor library. The file libTH.a will have been produced when compiling torch. Torch7: We have bindings for torch, in the directory API/torch. The file API/torch/README contains more details. Python: The bindings for python are in API/python. See API/python/README .",Object Detection,Object Detection 2342,Computer Vision,Computer Vision,Computer Vision,"Single Shot Refinement Neural Network for Object Detection License (LICENSE) By Shifeng Zhang , Longyin Wen , Xiao Bian , Zhen Lei , Stan Z. Li . Introduction We propose a novel single shot based detector, called RefineDet, that achieves better accuracy than two stage methods and maintains comparable efficiency of one stage methods. You can use the code to train/evaluate the RefineDet method for object detection. For more details, please refer to our paper . System VOC2007 test mAP FPS (Titan X) Number of Boxes Input resolution : : : : : : : : : Faster R CNN (VGG16) 73.2 7 6000 1000 x 600 YOLO (GoogLeNet) 63.4 45 98 448 x 448 YOLOv2 (Darknet 19) 78.6 40 1445 544 x 544 SSD300 (VGG16) 77.2 46 8732 300 x 300 SSD512 (VGG16) 79.8 19 24564 512 x 512 RefineDet320 (VGG16) 80.0 40 6375 320 x 320 RefineDet512 (VGG16) 81.8 24 16320 512 x 512 _Note: RefineDet300+ and RefineDet512+ are evaluated with the multi scale testing strategy. The code of the multi scale testing has also been released in this repository._ Citing RefineDet Please cite our paper in your publications if it helps your research: @inproceedings{zhang2018single, title {Single Shot Refinement Neural Network for Object Detection}, author {Zhang, Shifeng and Wen, Longyin and Bian, Xiao and Lei, Zhen and Li, Stan Z.}, booktitle {CVPR}, year {2018} } Contents 1. Installation ( installation) 2. Preparation ( preparation) 3. Training ( training) 4. Evaluation ( evaluation) 5. Models ( models) Installation 1. Get the code. 
We will call the cloned directory as $RefineDet_ROOT . Shell git clone 2. Build the code. Please follow Caffe instruction to install all necessary packages and build it. Shell cd $RefineDet_ROOT Modify Makefile.config according to your Caffe installation. Make sure to include $RefineDet_ROOT/python to your PYTHONPATH. cp Makefile.config.example Makefile.config make all j && make py Preparation 1. Download fully convolutional reduced (atrous) VGGNet . By default, we assume the model is stored in $RefineDet_ROOT/models/VGGNet/ . 2. Download ResNet 101 . By default, we assume the model is stored in $RefineDet_ROOT/models/ResNet/ . 3. Follow the data/VOC0712/README.md to download VOC2007 and VOC2012 dataset and create the LMDB file for the VOC2007 training and testing. 4. Follow the data/VOC0712Plus/README.md to download VOC2007 and VOC2012 dataset and create the LMDB file for the VOC2012 training and testing. 5. Follow the data/coco/README.md to download MS COCO dataset and create the LMDB file for the COCO training and testing. Training 1. Train your model on PASCAL VOC. Shell It will create model definition files and save snapshot models in: $RefineDet_ROOT/models/VGGNet/VOC0712{Plus}/refinedet_vgg16_{size}x{size}/ and job file, log file, and the python script in: $RefineDet_ROOT/jobs/VGGNet/VOC0712{Plus}/refinedet_vgg16_{size}x{size}/ python examples/refinedet/VGG16_VOC2007_320.py python examples/refinedet/VGG16_VOC2007_512.py python examples/refinedet/VGG16_VOC2012_320.py python examples/refinedet/VGG16_VOC2012_512.py 2. Train your model on COCO. Shell It will create model definition files and save snapshot models in: $RefineDet_ROOT/models/{Network}/coco/refinedet_{network}_{size}x{size}/ and job file, log file, and the python script in: $RefineDet_ROOT/jobs/{Network}/coco/refinedet_{network}_{size}x{size}/ python examples/refinedet/VGG16_COCO_320.py python examples/refinedet/VGG16_COCO_512.py python examples/refinedet/ResNet101_COCO_320.py python examples/refinedet/ResNet101_COCO_512.py 3. Train your model form COOC to VOC (Based on VGG16). Shell It will extract a VOC model from a pretrained COCO model. ipython notebook convert_model_320.ipynb ipython notebook convert_model_512.ipynb It will create model definition files and save snapshot models in: $RefineDet_ROOT/models/VGGNet/VOC0712{Plus}/refinedet_vgg16_{size}x{size}_ft/ and job file, log file, and the python script in: $RefineDet_ROOT/jobs/VGGNet/VOC0712{Plus}/refinedet_vgg16_{size}x{size}_ft/ python examples/refinedet/finetune_VGG16_VOC2007_320.py python examples/refinedet/finetune_VGG16_VOC2007_512.py python examples/refinedet/finetune_VGG16_VOC2012_320.py python examples/refinedet/finetune_VGG16_VOC2012_512.py Evaluation 1. Build the Cython modules. Shell cd $RefineDet_ROOT/test/lib make j 2. Change the ‘self._devkit_path’ in test/lib/datasets/pascal_voc.py to yours. 3. Change the ‘self._data_path’ in test/lib/datasets/coco.py to yours. 4. Check out test/refinedet_demo.py on how to detect objects using the RefineDet model and how to plot detection results. Shell For GPU users python test/refinedet_demo.py For CPU users python test/refinedet_demo.py gpu_id 1 5. Evaluate the trained models via test/refinedet_test.py . Shell You can modify the parameters in refinedet_test.py for different types of evaluation: single_scale: True is single scale testing, False is multi_scale_testing. test_set: 'voc_2007_test', 'voc_2012_test', 'coco_2014_minival', 'coco_2015_test dev'. voc_path: where the trained voc caffemodel. 
coco_path: where the trained voc caffemodel. For 'voc_2007_test' and 'coco_2014_minival', it will directly output the mAP results. For 'voc_2012_test' and 'coco_2015_test dev', it will save the detections and you should submitted it to the evaluation server to get the mAP results. python test/refinedet_test.py Models We have provided the models that are trained from different datasets. To help reproduce the results in Table 1, Table 2, Table 4 , most models contain a pretrained .caffemodel file, many .prototxt files, and python scripts. 1. PASCAL VOC models (VGG 16): 07+12: RefineDet320 , RefineDet512 07++12: RefineDet320 , RefineDet512 COCO: RefineDet320 , RefineDet512 07+12+COCO: RefineDet320 , RefineDet512 07++12+COCO: RefineDet320 , RefineDet512 2. COCO models: trainval35k (VGG 16): RefineDet320 , RefineDet512 trainval35k (ResNet101): RefineDet320 , RefineDet512 _Note: If you can not download pre trained models through the above links, you can download them through BaiduYun ._",Object Detection,Object Detection 2370,Computer Vision,Computer Vision,Computer Vision,"Cross layer pooling algorithm This repository provides a basic implementation of Cross layer pooling algorithm (Liu et al. 2015) in C++ (using OpenCV and Caffe) and Python (using TensorFlow). The code uses pretrained ResNet 152 network for its initialization. Refer to the paper for more details on ResNet (He et al. 2015) and for details on the cross pooling method. For training the network on your own dataset, CrossLayerPoolingClassifier class to train a linear SVM on top of features computed through cross layer pooling strategy or use a pre trained SVM for predictions. Code Dependencies (C++): OpenCV 3.2 Caffe Boost Pretrained ResNet 152 model.caffemodel Code Dependencies (TensorFlow): TensorFlow (v1.10 or newer) TODOs: Add support for new classifiers along with SVM Add support for larger region sizes using PCA Add optimization Add code profiling Cite @article{doi:10.1093/icesjms/fsx109, author {Siddiqui, Shoaib Ahmed and Salman, Ahmad and Malik, Muhammad Imran and Shafait, Faisal and Mian, Ajmal and Shortis, Mark R and Harvey, Euan S and Handling editor: Howard Browman}, title {Automatic fish species classification in underwater videos: exploiting pre trained deep neural network models to compensate for limited labelled data}, journal {ICES Journal of Marine Science}, volume {75}, number {1}, pages {374 389}, year {2018}, doi {10.1093/icesjms/fsx109}, URL { eprint {/oup/backfile/content_public/journal/icesjms/75/1/10.1093_icesjms_fsx109/1/fsx109.pdf} } License: MIT Issues/Feedback: In case of any issues, feel free to drop me an email or open an issue on the repository. Email: shoaib_ahmed.siddiqui@dfki.de",Object Detection,Object Detection 2372,Computer Vision,Computer Vision,Computer Vision,"概要 iOS11のVisionFrameworkのサンプル ライセンス このサンプルコード自体のライセンスは LICENSE をご確認ください。 またこのサンプルアプリは、MITライセンスで配布されている、以下の学習済みモデルを含みます。 Model Name: ResNet50 Description: Detects the dominant objects present in an image from a set of 1000 categories such as trees, animals, food, vehicles, people, and more. The top 5 error from the original publication is 7.8%. Core ML Model Size: 102.6 MB Source Link Download Link Script downloads weights, constructs model and saves out a .h5 Keras model. Project Page Authors Original Paper: Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun Keras Implementation: François Chollet Citations Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. 
Paper @misc{chollet2015keras, title {Keras}, author {Chollet, Fran\c{c}ois and others}, year {2015}, publisher {GitHub}, howpublished {\url{ } Labels Imagenet Labels from License MIT License Copyright (c) 2016 François Chollet Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software ), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE",Object Detection,Object Detection 2378,Computer Vision,Computer Vision,Computer Vision,"Chainer\_Mask\_R CNN Chainer implementation of Mask R CNN the multi task network for object detection, object classification, and instance segmentation. 日本語版 README What's New Training result for R 50 C4 model has been evaluated! COCO box AP 0.346 using our trainer (0.355 with official boxes) COCO mask AP 0.287 using our trainer (0.314 with official boxes) Examples to be updated Requirements Chainer Chainercv Cupy (operable if your environment can run chainer > v3 with cuda and cudnn.) (verified as operable: chainer 3.1.0, chainercv 0.7.0, cupy 1.0.3) $ pip install chainer $ pip install chainercv $ pip install cupy Python 3.0+ NumPy Matplotlib OpenCV TODOs x Precision Evaluator (bbox, COCO metric) x Detectron Model Parser x Modify ROIAlign x Mask inference using refined ROIs x Precision Evaluator (mask, COCO metric) Improve segmentation AP for R 50 C4 model Feature Pyramid Network (R 50 FPN) Keypoint Detection (R 50 FPN, Keypoints) Benchmark Results Box AP 50:95 Segm AP 50:95 Ours (1 GPU) 0.346 0.287 Detectron model 0.350 0.295 Detectron caffe2 0.355 0.314 Inference with Pretrained Models Download the pretrained model from the Model Zoo ( model link of R 50 C4 Mask at End to End Faster & Mask R CNN Baselines ) Make modelfiles directory and put the downloaded file model_final.pkl in it Execute: python utils/detectron_parser.py And the converted model file is saved in modelfiles Run the demo: python demo.py bn2affine modelfile modelfiles/e2e_mask_rcnn_R 50 C4_1x_d2c.npz image Prerequisites for training Download 'ResNet 50 model.caffemodel' from the OneDrive download of ResNet pretrained models for model initialization and place it in /.chainer/dataset/pfnet/chainer/models/ COCO 2017 dataset : the COCO dataset can be downloaded and unzipped by: bash getcoco.sh Setup the COCO API: git clone cd coco/PythonAPI/ make python setup.py install cd ../../ note: the official coco repository is not python3 compatible. Use the repository above in order to run our evaluation. 
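The repository's own evaluator is not reproduced here; as a minimal sketch of how the COCO API installed above is typically used to score detections (the annotation and result file names are placeholders), box AP and mask AP can be computed like this:
Python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: ground-truth annotations and detections in COCO result format.
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("detections.json")

for iou_type in ("bbox", "segm"):      # box AP and mask AP, as in the table above
    coco_eval = COCOeval(coco_gt, coco_dt, iou_type)
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()              # prints AP@[.50:.95] among other metrics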
Train python train.py arguments and the default conditions are defined as follows: ' dataset', choices ('coco2017'), default 'coco2017' ' extractor', choices ('resnet50','resnet101'), default 'resnet50', help 'extractor network' ' gpu', ' g', type int, default 0 ' lr', ' l', type float, default 1e 4 ' batchsize', ' b', type int, default 8 ' freeze_bn', action 'store_true', default False, help 'freeze batchnorm gamma/beta' ' bn2affine', action 'store_true', default False, help 'batchnorm to affine' ' out', ' o', default 'result', help 'output directory' ' seed', ' s', type int, default 0 ' roialign', action 'store_true', default True, help 'True: ROIAlign, False: ROIpooling' ' step_size', ' ss', type int, default 400000 ' lr_step', ' ls', type int, default 480000 ' lr_initialchange', ' li', type int, default 800 ' pretrained', ' p', type str, default 'imagenet' ' snapshot', type int, default 4000 ' validation', type int, default 30000 ' resume', type str ' iteration', ' i', type int, default 800000 ' roi_size', ' r', type int, default 14, help 'ROI size for mask head input' ' gamma', type float, default 1, help 'mask loss balancing factor' note that we use a subdivision based updater to enable training with large batch size. Demo Segment the objects in the input image by executing: python demo.py image modelfile result/snapshot_model.npz contour Evaluation Evaluate the trained model with COCO metric (bounding box, segmentation) : python train.py lr 0 iteration 1 validation 1 resume Citation Please cite the original paper in your publications if it helps your research: @article{DBLP:journals/corr/HeGDG17, author {Kaiming He and Georgia Gkioxari and Piotr Doll{\'{a}}r and Ross B. Girshick}, title {Mask {R CNN}}, journal {CoRR}, volume {abs/1703.06870}, year {2017}, url { archivePrefix {arXiv}, eprint {1703.06870}, timestamp {Wed, 07 Jun 2017 14:42:32 +0200}, biburl { bibsource {dblp computer science bibliography, }",Object Detection,Object Detection 2399,Computer Vision,Computer Vision,Computer Vision,"AtlasProteinChallenge files : 'models': sub repo with h5 files containing weights of each trained network. 'results': plots/ tabs to summarize results and exploration other files (.py / .ipynb) are different versions giving submissions for the competition. See the exploration below for more details. Kaggle competition : data can be found here Competition description : In this competition, Kagglers will develop models capable of classifying mixed patterns of proteins in microscope images. Proteins are “the doers” in the human cell, executing many functions that together enable life. Historically, classification of proteins has been limited to single patterns in one or a few cell types, but in order to fully understand the complexity of the human cell, models must classify mixed patterns across a range of different human cells. Images visualizing proteins in cells are commonly used for biomedical research, and these cells could hold the key for the next breakthrough in medicine. However, thanks to advances in high throughput microscopy, these images are generated at a far greater pace than what can be manually evaluated. Therefore, the need is greater than ever for automating biomedical image analysis to accelerate the understanding of human cells and disease. Credits to Allunia for the jupyter notebook which give a great description of this Computer Vision challenge. 
(protein atlas exploration and baseline.ipynb) / Credits to NikitPatel for the medical explanations (Atlas_medical_explanations.ipynb) Problem Description : Considering 30172 images(different size provided 512 512 & 2048 2048) with multilabels taking values in {0;1}^28, we aim to accordingly classify 11702 images(29% of test set provided for competition first stage) whose labels are only known by Kaggle organizers. For each image, 4 separed channels (green,red,bleu,yellow) showing different proteins are given. Our goal is to maximize the 'macro F1 score' : F1 computed separately for each class over all images, then averaged. My exploration Baseline : As I started late the competition, 3 or 4 architectures used to solve proteins classification were already found, so I used most recent one with interesting properties ( relatively small number of parameters, input flexibility, knowledge from features of different depth): associated article My computational means forced me to train this model with 256 256 images. ! alt text First, as the dataset is highly unbalanced, I tried to figure a way to improve prediction from a fixed model (Baseline optimizer: Adam (0.001) loss BCE reduceLRonPlateau Rotation/Flip augmentation) followed by threshold optimization for all labels. With this dataset it is impossible to stratify in batches(16 or 32), then train and validation sets are created thanks to multilabel stratification . I compared 3 approches: without upsampling upsampling based on weights upsampling only rare classes As evaluation metric doesn't discriminate labels and network learning is influenced by datasets balance, latest worked best ( based on LB score). Secondly, I tuned the loss function to optimize validation & LB scores: binary cross entropy already investigated F1 loss: aiming to directly optimize competition score by changing paradigm according to this article BCE followed by F1 loss on last layers Focal Loss which derives from alpha balanced CE, introduces another parameter gamma (>0) reducing the relative loss for well classified examples. I used Gamma 2 like in the article and as alpha is often set as inverse class frequency, I set alpha as mean of this criterion for each label. Focal loss outperformed the others on validation & LB scores ! Quite a surprise as I hadn't time to grid search Gamma and frequency of each label has high variance. More precisely I found: Focal > BCE+ F1 > BCE > F1. After that, I tried to specify a threshold by maximizing F1 for each label instead of setting a global thresh (simple greedy method...). Even with 5 fold CV, results increased on train and validation datasets but not on LB score. Kagglers found out LB dataset wasn't accordingly balanced, so for competition first stage I considered wiser to stay with global threshold optimization. I trained 5 similar models on different stratified datasets to ensemble them. Here I tested only two ensembling strategies and this matter deserved clearly more investigation (cf Thresholding Classifiers to Maximize F1 Score averaging probabilities given by each model + global threshold / majority vote among predictions of each (model + global threshold). My greatest score was obtained with first approach. Then, I tried to crop 256 256 images from 512 512 instead of resizing, which allow greater resolution considering each piece was well labeled. (fair assumption with our samples ). With same settings of previous best model, I trained 5 models on these cropped images and ensembled them. 
It clearly improved LB score and show how much images size ( & resolution) can influence prediction power. Finally I wanted to make use of pre trained models. For VGG 16, RESNET50 & InceptionV3, I made a simple classification scheme: Input > BN > Pre trained model > Conv + Relu > Flatten > Dropout > Dense + Relu > Dropout > Dense+sigmoid. Which I trained using cropped images, upsampling on rare classes & focal loss, for 15 epochs(batch size & augmentation adapted to pre trained input size & 4GB GPU memory) freezing the pre trained model, and then unfreezing it for 2(VGG16) or 5 ( RESNET50 & InceptionV3) epochs based on parameters quantities. InceptionV3 was better based on LB score but under ensemble model with GAPNET. I couldn't train any longer these pre trained models because I had to focus on other projects and exams so... I don't blame them. A fascinating approach which I would have loved to have time to do, is to increase network depth by connecting an encoder decoder mecanism ( like a mask ) based on pre trained model, and to train it with dual losses (focal as classification & BCE/DICE as segmentation) with GAPNET as classifier. (coming soon...) It was a part of the 4th place solution which I tried to implement before their publication but anyway I hadn't time to train it. I join Bestfitting solution here 3rd place github : & associated discussion It was a real pleasure to participate to such competition (thanks Kaggle !), I made mistakes but definitely learned a lot thanks to it and others kagglers. This competition was a blessing to link both subjects which fascinate me the most",Object Detection,Object Detection 2401,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI Requirements ( requirements) Pre trained models ( pre trained models) Explanations in issues 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use on the command line) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows Using vcpkg ( how to compile on windows using vcpkg) Legacy way ( how to compile on windows legacy way) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train with multi GPU: ( how to train with multi gpu) 6. How to train (to detect your custom objects) ( how to train to detect your custom objects) 7. How to train tiny yolo (to detect your custom objects) ( how to train tiny yolo to detect your custom objects) 8. When should I stop training ( when should i stop training) 9. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 10. How to improve object detection ( how to improve object detection) 11. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 12. How to use Yolo as DLL and SO libraries ( how to use yolo as dll and so libraries) ! Darknet Logo ! map_time mAP@0.5 (AP50) YOLOv3 spp better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
Contributors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 3.x CUDA >= 7.5 also create SO library on Linux and DLL library on Windows Requirements CMake >= 3.8 for modern CUDA support: CUDA 10.0 : (on Linux do Post installation Actions ) cuDNN >= 7.0 for CUDA 10.0 (set system variable CUDNN C:\cudnn where you unpacked cuDNN. On Linux in the .bashrc file, on Windows see the image ) GPU with CC >= 3.0 : on Linux GCC or Clang , on Windows MSVS 2017 (v15) Pre trained models There are weights files for different cfg files (smaller size > faster speed & lower accuracy): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put them near the compiled darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x-4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5x times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand-written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized GPU initialization for detection we use batch 1 initially instead of re-init with batch 1 added correct calculation of mAP, F1, IoU, Precision-Recall using command darknet detector map ... (a short sketch of IoU and Precision/Recall is shown below) added drawing of chart of average Loss and accuracy mAP ( map flag) during training run ./darknet detector demo ... json_port 8070 mjpeg_port 8090 as JSON and MJPEG server to get results online over the network by using your software or Web browser added calculation of anchors for training added example of Detection and Tracking objects: fixed code for using a Web cam with OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code...
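Regarding the "correct calculation of mAP, F1, IoU, Precision-Recall" item in the list above (performed internally by the darknet detector map command), here is a small, independent Python sketch of the two basic ingredients, box IoU and Precision/Recall/F1 from TP/FP/FN counts. It is only an illustration, not code from this repository.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), as reported by the map command."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# A detection typically counts as a true positive when its IoU with a same-class
# ground-truth box exceeds the chosen threshold (e.g. 0.5 for the AP50 metric).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 0.1428...
```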
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use on the command line On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights On Linux find executable file ./darknet in the root directory, while on Windows find it in the directory \build\darknet\x64 Yolo v3 COCO image : darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights ext_output dog.jpg Yolo v3 COCO video : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights ext_output test.mp4 Yolo v3 COCO WebCam 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights c 0 Yolo v3 COCO for net videocam Smart WebCam: darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights Yolo v3 save result videofile res.avi : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 test.mp4 out_filename res.avi Yolo v3 Tiny COCO video: darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights test.mp4 JSON and MJPEG server that allows multiple connections from your soft or Web browser ip address:8070 and 8090: ./darknet detector demo ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights test50.mp4 json_port 8070 mjpeg_port 8090 ext_output Yolo v3 Tiny on GPU 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights i 0 test.mp4 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Train on Amazon EC2 , to see mAP & Loss chart using URL like: in the Chrome/Firefox: ./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights dont_show ext_output result.txt Pseudo lableing to process a list of images data/new_train.txt and save results of detection in Yolo training format for each image as label .txt (in this way you can increase the amount of training data) use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights thresh 0.25 dont_show save_labels cd $env:VCPKG_ROOT PS Code\vcpkg> .\vcpkg install pthreads opencv replace with opencv cuda in case you want to use cuda accelerated openCV 8. necessary only with CUDA Customize the CMakeLists.txt with the preferred compute capability 9. Build with the Powershell script build.ps1 or use the Open Folder functionality of Visual Studio 2017. In the first option, if you want to use Visual Studio, you will find a custom solution created for you by CMake after the build containing all the appropriate config flags for your system. How to compile on Windows (legacy way) 1. If you have MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. 
Also add Windows system variable CUDNN with path to CUDNN: NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN v7.4.1 for CUDA 10.0 : add Windows system variable CUDNN with path to CUDNN: copy file cudnn64_7.dll to the folder \build\darknet\x64 near with darknet.exe 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 10.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 10.0 and change it to your CUDA version. Then open \darknet.sln > (right click on project) > properties > CUDA C/C++ > Device and remove there ;compute_75,sm_75 . Then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. 
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(CUDNN)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project: all .c files all .cu files file from \src directory file darknet.h from \include directory (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(CUDNN)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 Only for small datasets sometimes better to decrease learning rate, for 4 GPUs set learning_rate 0.00025 (i.e. 
learning_rate 0.001 / GPUs). In this case also increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 To train on Linux use command: ./darknet detector train data/obj.data yolo obj.cfg darknet53.conv.74 (just use ./darknet instead of darknet.exe ) (file yolo obj_last.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (file yolo obj_xxxx.weights will be saved to the build\darknet\x64\backup\ for each 1000 iterations) (to disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazon EC2) (to see the mAP & Loss chart during training on remote server without GUI, use command darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map then open URL in Chrome/Firefox browser) 8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set valid valid.txt or train.txt in obj.data file) and run: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. The final avgerage loss can be from 0.05 (for a small model and easy dataset) to 3.0 (for a big model and a difficult dataset). 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest mAP (mean average precision) or IoU (intersect over union) For example, bigger mAP gives weights yolo obj_8000.weights then use this weights for detection . Or just train with map flag: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map So you will see mAP chart (red line) in the Loss chart Window. mAP will be calculated for each 4 Epochs using valid valid.txt file that is specified in obj.data file ( 1 Epoch images_in_train_txt / batch iterations) ! loss_chart_map_chart Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect over union) average instersect over union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. 
To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: for each object which you want to detect there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects (smaller than 16x16 after the image is resized to 416x416) set layers 1, 11 instead of and set stride 4 instead of for training for both small and large objects use modified models: Full model: 5 yolo layers: Tiny model: 3 yolo layers: Spatial full model: 3 yolo layers: If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) 
then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height I.e. for each object from Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size: object width in percent from Training dataset object width in percent from Test dataset That is, if only objects that occupied 80 90% of the image were present in the training set, then the trained network will not be able to detect objects that occupy 1 10% of the image. to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: then do this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 will be created file yolov3.conv.81 , then train by using weights file yolov3.conv.81 instead of darknet53.conv.74 each: model of object, side, illimination, scale, each 30 grad of the turn and inclination angles these are different objects from an internal perspective of the neural network. So the more different objects you want to detect, the more complex network model should be used. recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file. But you should change indexes of anchors masks for each yolo layer, so that 1st yolo layer has anchors larger than 60x60, 2nd larger than 30x30, 3rd remaining. Also you should change the filters (classes + 5) before each yolo layer. If many of the calculated anchors do not fit under the appropriate layers then just try using all the default anchors. 2. 
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights data/dog.jpg yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL and SO libraries on Linux set LIBSO 1 in the Makefile and do make on Windows compile build\darknet\yolo_cpp_dll.sln or build\darknet\yolo_cpp_dll_no_gpu.sln solution There are 2 APIs: C API: Python examples using the C API:: C++ API: C++ example that uses C++ API: 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 10.0 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link struct bbox_t { unsigned int x, y, w, h; // (x,y) top left corner, (w, h) width & height of bounded box float prob; // confidence probability that the object was found correctly unsigned int obj_id; // class of object from range 0, classes 1 unsigned int track_id; // tracking id for video (0 untracked, 1 inf tracked object) unsigned int frames_counter;// counter of frames on which the object was detected }; class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); std::shared_ptr mat_to_image_resize(cv::Mat mat) const; endif };",Object Detection,Object Detection 2402,Computer Vision,Computer Vision,Computer Vision,"Faster RCNN_TF This is an experimental Tensorflow implementation of Faster RCNN a convnet for object detection with a region proposal network. For details about R CNN please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks by Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Requirements: software 1. Requirements for Tensorflow (see: Tensorflow ) 2. Python packages you might not have: cython , python opencv , easydict Requirements: hardware 1. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. Build the Cython modules Shell cd $FRCN_ROOT/lib make Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. Download model training on PASCAL VOC 2007 Google Drive Dropbox To run the demo Shell cd $FRCN_ROOT python ./tools/demo.py model model_path The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Training Model 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 5. 
Download pre trained ImageNet models Download the pre trained ImageNet models Google Drive Dropbox Shell mv VGG_imagenet.npy $FRCN_ROOT/data/pretrain_model/VGG_imagenet.npy 6. Run script to train and test model Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh $DEVICE $DEVICE_ID VGG16 pascal_voc DEVICE is either cpu/gpu The result of testing on PASCAL VOC 2007 Classes AP aeroplane 0.698 bicycle 0.788 bird 0.657 boat 0.565 bottle 0.478 bus 0.762 car 0.797 cat 0.793 chair 0.479 cow 0.724 diningtable 0.648 dog 0.803 horse 0.797 motorbike 0.732 person 0.770 pottedplant 0.384 sheep 0.664 sofa 0.650 train 0.766 tvmonitor 0.666 mAP 0.681 References Faster R CNN caffe version A tensorflow implementation of SubCNN (working progress)",Object Detection,Object Detection 2414,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
Contributors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 3.x CUDA >= 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC >= 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store the result to the file specified on the command line with out_filename res.avi GPU with CC >= 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put them near the compiled darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x-4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5x times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand-written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized GPU initialization for detection we use batch 1 initially instead of re-init with batch 1 added correct calculation of mAP, F1, IoU, Precision-Recall using command darknet detector map ... added drawing of chart of average loss during training added calculation of anchors for training (the idea is sketched below) added example of Detection and Tracking objects: fixed code for using a Web cam with OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code...
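On the "added calculation of anchors for training" item above (exposed later in this README as darknet detector calc_anchors): the idea is to cluster the width/height pairs of all training boxes into num_of_clusters anchor sizes. The sketch below only illustrates that idea with plain k-means on (width, height) pairs scaled to the network input size; darknet's own implementation differs (it uses an IoU-based distance), so treat this as an approximation, not the repository's code.

```python
import numpy as np

def calc_anchors(box_wh, num_clusters=9, net_w=416, net_h=416, iters=100, seed=0):
    """Cluster relative (w, h) pairs from the YOLO label files into anchor sizes.
    box_wh: array of shape (n, 2) with relative widths/heights in (0, 1]."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(box_wh, dtype=float) * [net_w, net_h]   # scale to network input size
    centers = wh[rng.choice(len(wh), num_clusters, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest anchor (Euclidean here; darknet uses 1 - IoU)
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(num_clusters):
            if np.any(labels == k):
                centers[k] = wh[labels == k].mean(axis=0)
    # return anchors sorted by area, smallest first
    return centers[np.argsort(centers.prod(axis=1))]
```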
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. 
Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. 
The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate GPUs 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. For 4xGPUs increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . 4. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. 
After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 
2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase the network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for the width and height from the cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of the 3 yolo layers in your cfg file check that every object in your dataset is labeled: no object in your dataset should be left without a label. Most training issues come from wrong labels in the dataset (labels produced by some conversion script, marked with a third party tool, ...). Always check your dataset by using: it is desirable that your training dataset includes images with objects at different scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train for 2000 x (number of classes) iterations or more it is desirable that your training dataset includes images with non labeled objects that you do not want to detect, i.e. negative samples without bounding boxes (empty .txt files); use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or a higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0.0615234375 x (width x height), where width and height are parameters from the net section of the cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include a set of relative sizes of objects similar to those you want to detect: train_network_width x train_obj_width / train_image_width ≈ detection_network_width x detection_obj_width / detection_image_width and train_network_height x train_obj_height / train_image_height ≈ detection_network_height x detection_obj_height / detection_image_height to speed up training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2.
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2416,Computer Vision,Computer Vision,Computer Vision,"py RFCN priv py RFCN priv is based on py R FCN multiGPU , thanks for bharatsingh430's job. Disclaimer The official R FCN code (written in MATLAB) is available here . py R FCN is modified from the offcial R FCN implementation and py faster rcnn code , and the usage is quite similar to py faster rcnn . 
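Referring back to the IoU (intersection over union) metric used in the YOLO validation notes above, here is a small self-contained Python sketch of the computation; it is an illustration only (not code from any of the repositories described here) and the box coordinates in the example are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# example: in PascalVOC terms a detection usually counts as correct when IoU >= 0.5
print(iou((10, 10, 60, 60), (30, 30, 80, 80)))  # about 0.22
```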
py R FCN multiGPU is a modified version of py R FCN , the original code is available here . py RFCN priv also supports soft nms . caffe priv supports convolution_depthwise , roi warping , roi mask pooling , bilinear interpolation , selu . New features py RFCN priv supports: Label shuffling (only single GPU training). PIXEL_STD. Anchors outside image (described in FPN ). ceil_mode in pooling layer . Performing bilinear interpolation operator accoording to input blobs size. 2017/07/31: support LargeMarginSoftmax and cpu forward psroipooling. 2017/08/04: add Deeplab and PSPNet support. 2017/08/10: add Deform psroipooling by lzx1413 . 2017/08/18: add ROIAlign support. 2017/08/27: add Axpy layer for Senet support. 2017/09/04: add Focal loss Installation 1. Clone the py RFCN priv repository Shell git clone We'll call the directory that you cloned py RFCN priv into PRIV_ROOT 2. Build the Cython modules Shell cd $PRIV_ROOT/lib make 3. Build Caffe and pycaffe Shell cd $RFCN_ROOT/caffe priv Now follow the Caffe installation instructions here: cp Makefile.config.example Makefile.config If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make all j && make pycaffe j Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 NCCL is necessary for multi GPU training with python layer USE_NCCL : 1 How to install nccl git clone cd nccl sudo make install j sudo ldconfig License py RFCN priv and caffe priv are released under the MIT License (refer to the LICENSE file for details). Citing If you find R FCN or soft nms useful in your research, please consider citing: @article{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Journal {arXiv preprint arXiv:1605.06409}, Year {2016} } @article{1704.04503, Author {Navaneeth Bodla and Bharat Singh and Rama Chellappa and Larry S. Davis}, Title {Improving Object Detection With One Line of Code}, Journal {arXiv preprint arXiv:1704.04503}, Year {2017} }",Object Detection,Object Detection 2422,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code support VGG16 and Resnet V1 models. We tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture so far. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. 
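Since the only data augmentation mentioned above is left right flipping, here is a rough, generic sketch of what that does to the boxes (a simplification, not this repository's actual pipeline): flipping an image of width W maps a box's x range x1, x2 to roughly W - x2, W - x1 while the y range is unchanged.

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Left-right flip an HxWxC image and boxes given as (x1, y1, x2, y2) rows.

    Generic illustration of the flipping augmentation; the array layout and the
    use of width - 1 for pixel-indexed coordinates are assumptions.
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    old_x1 = boxes[:, 0].copy()
    old_x2 = boxes[:, 2].copy()
    boxes[:, 0] = width - 1 - old_x2  # new x1
    boxes[:, 2] = width - 1 - old_x1  # new x2
    return flipped, boxes

# toy usage with a fake 100x200 image and a single box
img = np.zeros((100, 200, 3), dtype=np.uint8)
print(flip_horizontal(img, [[10, 20, 50, 80]])[1])  # [[149.  20. 189.  80.]]
```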
All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 71.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.3 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 29.5 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.3 . Train on COCO 2014 trainval35k and test on minival ( old , 900k/1290k), 34.0 . Train on COCO 2014 trainval35k and test on minival with approximate FPN baseline setup ( old , 900k/1290k), 35.8 . Note : Due to the randomness in GPU training with Tensorflow especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. All the numbers are obtained with a different testing scheme without selecting region proposals using non maximal suppression (TEST.MODE top); the default and original testing scheme (TEST.MODE nms) will likely result in slightly worse performance (see report , for COCO it drops 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations (VGG16 350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.5), and potentially better performance can be achieved with even more iterations. For Resnet101, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool. The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Weight decay is set to the Resnet101 default 1e 4. Learning rate for biases is not doubled. For the approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet101 ones. Additional Features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempt is made to overfit on the testing set. Support for resuming training . I tried to store as much information as possible when snapshotting, with the purpose of resuming training from the latest snapshot properly. The meta information includes the current image index, permutation of images, and random state of numpy. However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization .
The current implementation will summarize ground truth detections, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows the r1.0 format. If you are using an older version (r0.1 r0.12), please check out the v0.12 release. While it is not required, for experimenting with the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version, for me it is 1.6. Docker users: A Docker image containing all of the required dependencies can be found in Docker hub at the docker folder. The Docker file used to create this image can be found in the docker directory of this repository. Installation 1. Clone the repository Shell git clone 2. Update your arch in the setup script to match your GPU Shell cd tf faster rcnn/lib vim setup.py Check the GPU architecture; if you are using Pascal arch, please switch to sm_61 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access the COCO dataset. Shell cd data git clone cd .. Setup data Please follow the instructions of py faster rcnn here to set up the VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating softlinks in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that set up proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a softlink to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you are using GPUs with a smaller memory capacity, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 Note : If you cannot get the reported numbers, then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code supports VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for the VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../..
For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : double check you have deleted softlink to the pre trained models before training! 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is 1. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Detailed numbers from COCO server All the models are trained on COCO 2014 trainval35k . 
VGG16 COCO 2015 test dev (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.297 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.504 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.128 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.325 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.421 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.272 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.399 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.187 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.451 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.591 VGG16 COCO 2015 test std (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.295 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.501 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.119 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.327 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.418 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.273 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.400 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.179 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.455 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.586",Object Detection,Object Detection 2424,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). 
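The Average Precision / Average Recall lines listed above are the standard summary printed by the COCO evaluation API. For reference, a summary in that format can be produced with pycocotools roughly as follows; the annotation and result file names here are placeholders, not paths from this repository.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# placeholder paths: COCO ground-truth annotations and detections in COCO json format
coco_gt = COCO("annotations/instances_minival2014.json")
coco_dt = coco_gt.loadRes("detections_minival2014_results.json")

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the Average Precision / Average Recall lines shown above
```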
Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. 
It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / / Faster rcnn",Object Detection,Object Detection 2425,Computer Vision,Computer Vision,Computer Vision,"convnet benchmarks Easy benchmarking of all public open source implementations of convnets. A summary is provided in the section below. Machine: 6 core Intel Core i7 5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64 Imagenet Winners Benchmarking I pick some popular imagenet models, and I clock the time for a full forward + backward pass. I average my times over 10 runs. I ignored dropout and softmax layers. Notation Input is described as {batch_size}x{num_filters}x{filter_width}x{filter_height} . Where batch_size is the number of images used in a minibatch, num_filters is the number of channels in an image, filter_width is the width of the image, and filter_height is the height of the image. One small note: The CuDNN benchmarks are done using Torch bindings. One can also do the same via Caffe bindings or bindings of any other library. This note is here to clarify that Caffe (native) and Torch (native) are the convolution kernels which are present as a default fallback. 
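To make the benchmarking methodology above concrete (time a full forward + backward pass and average over repeated runs), here is a rough sketch in PyTorch; PyTorch is not one of the frameworks in these tables, and the layer is only an AlexNet-like first convolution over the 128x3x224x224 input mentioned above.

```python
import time
import torch
import torch.nn as nn

def time_forward_backward(layer, x, runs=10):
    """Average time in ms of a forward + backward pass, averaged over `runs`."""
    timings = []
    for _ in range(runs):
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.time()
        out = layer(x)
        out.sum().backward()
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        timings.append((time.time() - start) * 1000.0)
    return sum(timings) / len(timings)

device = "cuda" if torch.cuda.is_available() else "cpu"
# input in {batch_size}x{num_filters}x{height}x{width} form: 128x3x224x224
x = torch.randn(128, 3, 224, 224, device=device, requires_grad=True)
conv = nn.Conv2d(3, 96, kernel_size=11, stride=4).to(device)  # AlexNet-like first layer
print("forward+backward: %.1f ms" % time_forward_backward(conv, x))
```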
Some of the frameworks like TensorFlow and Chainer are benchmarked with CuDNN, but it is not explicitly mentioned, and hence one might think that these frameworks as a whole are faster, than for example Caffe, which might not be the case . AlexNet (One Weird Trick paper) Input 128x3x224x224 Library Class Time (ms) forward (ms) backward (ms) : : : : : : : CuDNN R4 fp16 (Torch) cudnn.SpatialConvolution 71 25 46 Nervana neon fp16 ConvLayer 78 25 52 CuDNN R4 fp32 (Torch) cudnn.SpatialConvolution 81 27 53 TensorFlow conv2d 81 26 55 Nervana neon fp32 ConvLayer 87 28 58 fbfft (Torch) fbnn.SpatialConvolution 104 31 72 Chainer Convolution2D 177 40 136 cudaconvnet2 ConvLayer 177 42 135 CuDNN R2 cudnn.SpatialConvolution 231 70 161 Caffe (native) ConvolutionLayer 324 121 203 Torch 7 (native) SpatialConvolutionMM 342 132 210 CL nn (Torch) SpatialConvolutionMM 963 388 574 Caffe CLGreenTea ConvolutionLayer 1442 210 1232 Overfeat fast Input 128x3x231x231 Library Class Time (ms) forward (ms) backward (ms) : : : : : : : Nervana neon fp16 ConvLayer 176 58 118 Nervana neon fp32 ConvLayer 211 69 141 CuDNN R4 fp16 (Torch) cudnn.SpatialConvolution 242 86 156 CuDNN R4 fp32 (Torch) cudnn.SpatialConvolution 268 94 174 TensorFlow conv2d 279 90 189 fbfft (Torch) SpatialConvolutionCuFFT 342 114 227 Chainer Convolution2D 620 135 484 cudaconvnet2 ConvLayer 723 176 547 CuDNN R2 cudnn.SpatialConvolution 810 234 576 Caffe ConvolutionLayer 823 355 468 Torch 7 (native) SpatialConvolutionMM 878 379 499 CL nn (Torch) SpatialConvolutionMM 963 388 574 Caffe CLGreenTea ConvolutionLayer 2857 616 2240 OxfordNet Model A Input 64x3x224x224 Library Class Time (ms) forward (ms) backward (ms) : : : : : : : Nervana neon fp16 ConvLayer 254 82 171 Nervana neon fp32 ConvLayer 320 103 217 CuDNN R4 fp16 (Torch) cudnn.SpatialConvolution 471 140 331 CuDNN R4 fp32 (Torch) cudnn.SpatialConvolution 529 162 366 TensorFlow conv2d 540 158 382 Chainer Convolution2D 885 251 632 fbfft (Torch) SpatialConvolutionCuFFT 1092 355 737 cudaconvnet2 ConvLayer 1229 408 821 CuDNN R2 cudnn.SpatialConvolution 1099 342 757 Caffe ConvolutionLayer 1068 323 745 Torch 7 (native) SpatialConvolutionMM 1105 350 755 CL nn (Torch) SpatialConvolutionMM 3437 875 2562 Caffe CLGreenTea ConvolutionLayer 5620 988 4632 GoogleNet V1 Input 128x3x224x224 Library Class Time (ms) forward (ms) backward (ms) : : : : : : : Nervana neon fp16 ConvLayer 230 72 157 Nervana neon fp32 ConvLayer 270 84 186 TensorFlow conv2d 445 135 310 CuDNN R4 fp16 (Torch) cudnn.SpatialConvolution 462 112 349 CuDNN R4 fp32 (Torch) cudnn.SpatialConvolution 470 130 340 Chainer Convolution2D 687 189 497 Caffe ConvolutionLayer 1935 786 1148 CL nn (Torch) SpatialConvolutionMM 7016 3027 3988 Caffe CLGreenTea ConvolutionLayer 9462 746 8716 Layer wise Benchmarking (Last Updated April 2015) Spatial Convolution layer (3D input 3D output, densely connected) forward + backprop (wrt input and weights) Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms) : : : : : : : fbfft SpatialConvolutionCuFFT 256 101 155 cuda convnet2 ConvLayer 977 201 776 cuda convnet pylearn2.cuda_convnet 1077 312 765 CuDNN R2 cudnn.SpatialConvolution 1019 269 750 Theano CorrMM 1225 407 818 Caffe ConvolutionLayer 1231 396 835 Torch 7 SpatialConvolutionMM 1265 418 877 DeepCL ConvolutionLayer 6280 2648 3632 _cherry picking_ _best per layer_ _235_ _79_ _155_ This table is ___NOT UPDATED For TITAN X___. These numbers below were on Titan Black and are here only for informational and legacy purposes. 
Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms) : : : : : : : Theano (experimental) conv2d_fft 1178 304 874 Torch 7 nn.SpatialConvolutionBHWD 1892 581 1311 ccv ccv_convnet_layer 809+bw 809 Theano (legacy) conv2d 70774 3833 66941 \ indicates that the library was tested with Torch bindings of the specific kernels. indicates that the library was tested with Pylearn2 bindings. This is an experimental module which used FFT to calculate convolutions. It uses a lot of memory according to @benanne The last row shows results obtainable when choosing the best performing library for each layer. L1 Input: 128x128 Batch size 128 , Feature maps: 3 >96 , Kernel Size: 11x11 , Stride: 1x1 L2 Input: 64x64 Batch size 128 , Feature maps: 64 >128 , Kernel Size: 9x9 , Stride: 1x1 L3 Input: 32x32 Batch size 128 , Feature maps: 128 >128 , Kernel Size: 9x9 , Stride: 1x1 L4 Input: 16x16 Batch size 128 , Feature maps: 128 >128 , Kernel Size: 7x7 , Stride: 1x1 L5 Input: 13x13 Batch size 128 , Feature maps: 384 >384 , Kernel Size: 3x3 , Stride: 1x1 The table is ranked according to the total time forward+backward calls for layers (L1 + L2 + L3 + L4 + L5) Breakdown forward Columns L1, L2, L3, L4, L5, Total are times in milliseconds Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total : : : : : : : : : : fbfft SpatialConvolutionCuFFT 57 27 6 2 9 101 cuda convnet2 ConvLayer 36 113 40 4 8 201 cuda convnet pylearn2.cuda_convnet 38 183 68 7 16 312 CuDNN R2 cudnn.SpatialConvolution 56 143 53 6 11 269 Theano CorrMM 91 143 121 24 28 407 Caffe ConvolutionLayer\ 93 136 116 24 27 396 Torch 7 nn.SpatialConvolutionMM 94 149 123 24 28 418 DeepCL ConvolutionLayer 738 1241 518 47 104 2648 _cherry picking_ _best per layer_ _36_ _27_ _6_ _2_ _8_ 79 backward (gradInput + gradWeight) Columns L1, L2, L3, L4, L5, Total are times in milliseconds Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total : : : : : : : : : : fbfft SpatialConvolutionCuFFT 76 45 12 4 18 155 cuda convnet2 ConvLayer 103 467 162 15 29 776 cuda convnet pylearn2.cuda_convnet 136 433 147 15 34 765 CuDNN R2 cudnn.SpatialConvolution 139 401 159 19 32 750 Theano CorrMM 179 405 174 29 31 818 Caffe ConvolutionLayer\ 200 405 172 28 30 835 Torch 7 nn.SpatialConvolutionMM 206 432 178 29 32 877 DeepCL ConvolutionLayer 484 2144 747 59 198 3632 _cherry picking_ _best per layer_ _76_ _45_ _12_ _4_ _18_ _155_",Object Detection,Object Detection 2426,Computer Vision,Computer Vision,Computer Vision,"Training an Object Classifier in Torch 7 on multiple GPUs over ImageNet In this concise example (1200 lines including a general purpose and highly scalable data loader for images), we showcase: train AlexNet or Overfeat , VGG and Googlenet on ImageNet showcase multiple backends: CuDNN, CuNN use nn.DataParallelTable to speedup training over multiple GPUs multithreaded data loading from disk (showcases sending tensors from one thread to another without serialization) Requirements Install torch on a machine with CUDA GPU If on Mac OSX, run brew install coreutils findutils to get GNU versions of wc , find , and cut Download Imagenet 12 dataset from . It has 1000 classes and 1.2 million images. Data processing The images dont need to be preprocessed or packaged in any database. It is preferred to keep the dataset on an SSD but we have used the data loader comfortably over NFS without loss in speed. We just use a simple convention: SubFolderName ClassName. 
So, for example: if you have classes {cat,dog}, cat images go into the folder dataset/cat and dog images go into dataset/dog The training images for imagenet are already in appropriate subfolders (like n07579787, n07880968). You need to get the validation groundtruth and move the validation images into appropriate subfolders. To do this, download ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar and use the following commands: bash extract train data mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train tar xvf ILSVRC2012_img_train.tar && rm f ILSVRC2012_img_train.tar find . name .tar while read NAME ; do mkdir p ${NAME%.tar} ; tar xvf ${NAME} C ${NAME%.tar} ; rm f ${NAME} ; done extract validation data cd ../ && mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar xvf ILSVRC2012_img_val.tar wget qO bash Now you are all set! If your imagenet dataset is on HDD or a slow SSD, run this command to resize all the images such that the smaller dimension is 256 and the aspect ratio is intact. This helps with loading the data from disk faster. bash find . name .JPEG xargs I {} convert {} resize 256^> {} Running The training scripts come with several options which can be listed by running the script with the flag help bash th main.lua help To run the training, simply run main.lua By default, the script runs 1 GPU AlexNet with the CuDNN backend and 2 data loader threads. bash th main.lua data imagenet folder with train and val folders For 2 GPU model parallel AlexNet + CuDNN, you can run it this way: bash th main.lua data imagenet folder with train and val folders nGPU 2 backend cudnn netType alexnet Similarly, you can switch the backends to 'cunn' to use a different set of CUDA kernels. You can also alternatively train OverFeat using this following command: bash th main.lua data imagenet folder with train and val folders netType overfeat multi GPU overfeat (let's say 2 GPU) th main.lua data imagenet folder with train and val folders netType overfeat nGPU 2 The training script prints the current Top 1 and Top 5 error as well as the objective loss at every mini batch. We hard coded a learning rate schedule so that AlexNet converges to an error of 42.5% at the end of 53 epochs. At the end of every epoch, the model is saved to disk (as model_ xx .t7 where xx is the epoch number). You can reload this model into torch at any time using torch.load lua model torch.load('model_10.t7') loading back a saved model Similarly, if you would like to test your model on a new image, you can use testHook from line 103 in donkey.lua to load your image, and send it through the model for predictions. For example: lua dofile('donkey.lua') img testHook({loadSize}, 'test.jpg') model torch.load('model_10.t7') if img:dim() 3 then img img:view(1, img:size(1), img:size(2), img:size(3)) end predictions model:forward(img:cuda()) If you ever want to reuse this example, and debug your scripts, it is suggested to debug and develop in the single threaded mode, so that stack traces are printed fully. lua th main.lua nDonkeys 0 ...options... Code Description main.lua (30 lines) loads all other files, starts training. opts.lua (50 lines) all the command line options and description data.lua (60 lines) contains the logic to create K threads for parallel data loading. donkey.lua (200 lines) contains the data loading logic and details. It is run by each data loader thread. random image cropping, generating 10 crops etc. are in here. 
model.lua (80 lines) creates AlexNet model and criterion train.lua (190 lines) logic for training the network. we hard code a learning rate + weight decay schedule that produces good results. test.lua (120 lines) logic for testing the network on validation set (including calculating top 1 and top 5 errors) dataset.lua (430 lines) a general purpose data loader, mostly derived from here: imagenetloader.torch . That repo has docs and more examples of using this loader.",Object Detection,Object Detection 2428,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn is deprecated: For a good and more up to date implementation for faster/mask RCNN with multi gpu support, please see the example in TensorPack here . tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 70.8 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.7 . Train on COCO 2014 trainval35k and test on minival ( Iterations : 900k/1190k), 30.2 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.7 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.8 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 35.4 . More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.8 . Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.1 . Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training with Tensorflow especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. 
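As an aside on the folder-per-class convention used by the Torch ImageNet example above (SubFolderName = ClassName), the same scan can be written in a few lines of Python; this is only an illustration, not the repository's own Lua data loader, and the dataset root and file extensions are assumptions.

```python
import os

def list_dataset(root):
    """Build (image_path, class_index) pairs from a SubFolderName = ClassName layout,
    e.g. dataset/cat/*.jpg and dataset/dog/*.jpg as in the example above."""
    classes = sorted(d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith((".jpg", ".jpeg", ".png")):
                samples.append((os.path.join(folder, fname), class_to_idx[name]))
    return samples, class_to_idx

# usage: samples, class_to_idx = list_dataset("dataset")  # expects dataset/cat, dataset/dog, ...
```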
The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. We do not set a threshold (instead of 0.05) for a detection to be included in the final result, which increases recall. Weight decay is set to 1e 4. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations, and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Learning rate for biases is not doubled. For Mobilenets, we fix the first five layers when fine tuning the network. All batch normalization parameters are fixed. Weight decay for Mobilenet layers is set to 4e 5. For approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. ! (data/imgs/gt.png) ! (data/imgs/pred.png) : : : : Displayed Ground Truth on Tensorboard Displayed Predictions on Tensorboard Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempts is made to overfit on testing set. Support for resuming training . I tried to store as much information as possible when snapshoting, with the purpose to resume training from the latest snapshot properly. The meta information includes current image index, permutation of images, and random state of numpy. However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that, the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization . The current implementation will summarize ground truth boxes, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim Resnet block issue. If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. 
While it is not required, for experimenting the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6. Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using dockerfile located at docker folder (cuda 8 version, as it is required by Tensorflow r1.0.) And make sure following Tensorflow installation to install and use nvidia docker Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 GTX 960M sm_50 GTX 1080 (Ti) sm_61 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. Also even if you are only using CPU tensorflow, GPU based code (for NMS) will be used by default, so please set USE_GPU_NMS False to get the correct output. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating soft links in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. 
For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. 
Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } Or for a formal paper, Spatial Memory Network : @article{chen2017spatial, title {Spatial Memory for Context Reasoning in Object Detection}, author {Chen, Xinlei and Gupta, Abhinav}, journal {arXiv preprint arXiv:1704.04224}, year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} }",Object Detection,Object Detection 2431,Computer Vision,Computer Vision,Computer Vision,VectorCards Best used with Bicycle® Playing Cards Standard deck Yolo,Object Detection,Object Detection 2432,Computer Vision,Computer Vision,Computer Vision,"Faster R CNN and Mask R CNN in PyTorch 1.0 This project aims at providing the necessary building blocks for easily creating detection and segmentation models using PyTorch 1.0. ! alt text (demo/demo_e2e_mask_rcnn_X_101_32x8d_FPN_1x.png from Highlights PyTorch 1.0: RPN, Faster R CNN and Mask R CNN implementations that matches or exceeds Detectron accuracies Very fast : up to 2x faster than Detectron and 30% faster than mmdetection during training. See MODEL_ZOO.md (MODEL_ZOO.md) for more details. Memory efficient: uses roughly 500MB less GPU memory than mmdetection during training Multi GPU training and inference Batched inference: can perform inference using multiple images per batch per GPU CPU support for inference: runs on CPU in inference time. See our webcam demo (demo) for an example Provides pre trained models for almost all reference Mask R CNN and Faster R CNN configurations with 1x schedule. Webcam and Jupyter notebook demo We provide a simple webcam demo that illustrates how you can use maskrcnn_benchmark for inference: bash cd demo by default, it runs on the GPU for best results, use min image size 800 python webcam.py min image size 800 can also run it on the CPU python webcam.py min image size 300 MODEL.DEVICE cpu or change the model that you want to use python webcam.py config file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu in order to see the probability heatmaps, pass show mask heatmaps python webcam.py min image size 300 show mask heatmaps MODEL.DEVICE cpu for the keypoint demo python webcam.py config file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu A notebook with the demo can be found in demo/Mask_R CNN_demo.ipynb (demo/Mask_R CNN_demo.ipynb). Installation Check INSTALL.md (INSTALL.md) for installation instructions. Model Zoo and Baselines Pre trained models, baselines and comparison with Detectron and mmdetection can be found in MODEL_ZOO.md (MODEL_ZOO.md) Inference in a few lines We provide a helper class to simplify writing inference pipelines using pre trained models. Here is how we would do it. 
Run this from the demo folder: python from maskrcnn_benchmark.config import cfg from predictor import COCODemo config_file ../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml update the config options with the config file cfg.merge_from_file(config_file) manual override some options cfg.merge_from_list( MODEL.DEVICE , cpu ) coco_demo COCODemo( cfg, min_image_size 800, confidence_threshold 0.7, ) load image and then run prediction image ... predictions coco_demo.run_on_opencv_image(image) Perform training on COCO dataset For the following examples to work, you need to first install maskrcnn_benchmark . You will also need to download the COCO dataset. We recommend to symlink the path to the coco dataset to datasets/ as follows We use minival and valminusminival sets from Detectron bash symlink the coco dataset cd /github/maskrcnn benchmark mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 or use COCO 2017 version ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2017 datasets/coco/train2017 ln s /path_to_coco_dataset/test2017 datasets/coco/test2017 ln s /path_to_coco_dataset/val2017 datasets/coco/val2017 for pascal voc dataset: ln s /path_to_VOCdevkit_dir datasets/voc P.S. COCO_2017_train COCO_2014_train + valminusminival , COCO_2017_val minival You can also configure your own paths to the datasets. For that, all you need to do is to modify maskrcnn_benchmark/config/paths_catalog.py to point to the location where your dataset is stored. You can also create a new paths_catalog.py file which implements the same two classes, and pass it as a config argument PATHS_CATALOG during training. Single GPU training Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities: 1. Run the following without modifications bash python /path_to_maskrcnn_benchmark/tools/train_net.py config file /path/to/config/file.yaml This should work out of the box and is very similar to what we should do for multi GPU training. But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 8x larger, which might lead to out of memory errors. If you have a lot of memory available, this is the easiest solution. 2. Modify the cfg parameters If you experience out of memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule. Here is an example for Mask R CNN R 50 FPN with the 1x schedule: bash python tools/train_net.py config file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x. We also changed the batch size during testing, but that is generally not necessary because testing requires much less memory than training. 
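The single-GPU adjustment above is just the linear scaling rule: when the global batch size is divided by k, divide the base learning rate by k and multiply the iteration budget and schedule steps by k. A tiny sketch of that arithmetic (the helper function is hypothetical, not part of maskrcnn_benchmark; the base values are the ones implied by the 1x example above):

```python
def scale_schedule(base_lr=0.02, ims_per_batch=16, max_iter=90000,
                   steps=(60000, 80000), new_ims_per_batch=2):
    """Linear scaling rule: fewer images per batch -> smaller LR, longer schedule."""
    k = ims_per_batch / new_ims_per_batch                # e.g. 16 / 2 = 8
    return {
        "SOLVER.IMS_PER_BATCH": new_ims_per_batch,
        "SOLVER.BASE_LR": base_lr / k,                   # 0.02 / 8 = 0.0025
        "SOLVER.MAX_ITER": int(max_iter * k),            # 90000 * 8 = 720000
        "SOLVER.STEPS": tuple(int(s * k) for s in steps) # (480000, 640000)
    }

print(scale_schedule())
```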
Multi GPU training We use internally torch.distributed.launch in order to launch multi gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py config file path/to/config/file.yaml Abstractions For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md (ABSTRACTIONS.md). Adding your own dataset This implementation adds support for COCO style datasets. But adding support for training on a new dataset can be done as follows: python from maskrcnn_benchmark.structures.bounding_box import BoxList class MyDataset(object): def __init__(self, ...): as you would do normally def __getitem__(self, idx): load the image as a PIL Image image ... load the bounding boxes as a list of list of boxes in this case, for illustrative purposes, we use x1, y1, x2, y2 order. boxes 0, 0, 10, 10 , 10, 20, 50, 50 and labels labels torch.tensor( 10, 20 ) create a BoxList from the boxes boxlist BoxList(boxes, image.size, mode xyxy ) add the labels to the boxlist boxlist.add_field( labels , labels) if self.transforms: image, boxlist self.transforms(image, boxlist) return the image, the boxlist and the idx in your dataset return image, boxlist, idx def get_img_info(self, idx): get img_height and img_width. This is used if we want to split the batches according to the aspect ratio of the image, as it can be more efficient than loading the image from disk return { height : img_height, width : img_width} That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask ), or even your own instance type. For a full example of how the COCODataset is implemented, check maskrcnn_benchmark/data/datasets/coco.py (maskrcnn_benchmark/data/datasets/coco.py). Once you have created your dataset, it needs to be added in a couple of places: maskrcnn_benchmark/data/datasets/__init__.py (maskrcnn_benchmark/data/datasets/__init__.py): add it to __all__ maskrcnn_benchmark/config/paths_catalog.py (maskrcnn_benchmark/config/paths_catalog.py): DatasetCatalog.DATASETS and corresponding if clause in DatasetCatalog.get() Testing While the aforementioned example should work for training, we leverage the cocoApi for computing the accuracies during testing. Thus, test datasets should currently follow the cocoApi for now. To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py (maskrcnn_benchmark/data/datasets/evaluation/__init__.py): python if isinstance(dataset, datasets.MyDataset): return coco_evaluation( args) Finetuning from Detectron weights on custom datasets Create a script tools/trim_detectron_model.py like here . You can decide which keys to be removed and which keys to be kept by modifying the script. Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT . For further information, please refer to 15 . Troubleshooting If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md (TROUBLESHOOTING.md). If your issue is not present there, please feel free to open a new issue. Citations Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. 
The BibTeX entry requires the url LaTeX package. @misc{massa2018mrcnn, author {Massa, Francisco and Girshick, Ross}, title {{maskrcnn benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch}}, year {2018}, howpublished {\url{ note {Accessed: Insert date here } } Projects using maskrcnn benchmark RetinaMask: Learning to predict masks improves state of the art single shot detection for free . Cheng Yang Fu, Mykhailo Shvets, and Alexander C. Berg. Tech report, arXiv,1901.03353. License maskrcnn benchmark is released under the MIT license. See LICENSE (LICENSE) for additional details.",Object Detection,Object Detection 2436,Computer Vision,Computer Vision,Computer Vision,"A PyTorch implementation of a YOLO v1 Object Detector Implementation of YOLO v1 object detector in PyTorch. Full tutorial can be found here in Korean. Tested under Python 3.6, PyTorch 0.4.1 on Ubuntu 16.04, Windows10. Requirements See requirements (./requirements.txt) for details. NOTICE: different versions of the PyTorch package have different memory usages. How to use Training on PASCAL VOC (20 classes) main.py mode train data_path where/your/dataset/is class_path ./names/VOC.names num_class 20 use_augmentation True use_visdom True Test on PASCAL VOC (20 classes) main.py mode test data_path where/your/dataset/is class_path ./names/VOC.names num_class 20 checkpoint_path your_checkpoint.pth.tar pre built weights file python python3 utilities/download_checkpoint.py pre built weights download Supported Datasets Only Pascal VOC datasets are supported for now. Configuration Options argument type description default : : : : mode str train or test train dataset str only support voc now voc data_path str data path class_path str filenames text file path input_height int input height 448 input_width int input width 448 batch_size int batch size 16 num_epochs int of epochs 16000 learning_rate float initial learning rate 1e 3 dropout float dropout probability 0.5 num_gpus int of GPUs for training 1 checkpoint_path str checkpoint path ./ use_augmentation bool image Augmentation True use_visdom bool visdom False use_wandb bool wandb False use_summary bool describe Model summary True use_gtcheck bool gt check flag False use_githash bool use githash False num_class int number of classes 5 Train Log ! train_log Results ! image ! image ! image ! image Authorship This project is equally contributed by Chanhee Jeong , Donghyeon Hwang , and Jaewon Lee . Copyright See LICENSE (./LICENSE) for details. REFERENCES 1 Redmon, Joseph, et al. You only look once: Unified, real time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.",Object Detection,Object Detection 2440,Computer Vision,Computer Vision,Computer Vision,"AirSimDetectron AirSim integrated with Detectron. The goal of this project was to connect two worlds together: state of the art simulation with object detection. For simulation, AirSim is utilized and, for object detection, Detectron. Example Mask R CNN output. Installation Please find installation instructions for Caffe2 and Detectron in INSTALL.md (Needed for Detectron). Useful Link Quick Start: Using AirSimDetectron After installation, please see GETTING_STARTED.md (GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron. References Original Repos: More information about AirSim and Detectron can be found in the following repositories. Airsim Detectron License AirSimDetectron is released under the MIT License.
Please review License file (LICENSE) for more details. See the NOTICE (NOTICE) file for additional details.",Object Detection,Object Detection 2441,Computer Vision,Computer Vision,Computer Vision,"SNIPER: Efficient Multi Scale Training SNIPER is an efficient multi scale training approach for instance level recognition tasks like object detection and instance level segmentation. Instead of processing all pixels in an image pyramid, SNIPER selectively processes context regions around the ground truth objects (a.k.a chips ). This significantly speeds up multi scale training as it operates on low resolution chips. Due to its memory efficient design, SNIPER can benefit from Batch Normalization during training and it makes larger batch sizes possible for instance level recognition tasks on a single GPU. Hence, we do not need to synchronize batch normalization statistics across GPUs and we can train object detectors similar to the way we do image classification! SNIPER is described in the following paper: SNIPER: Efficient Multi Scale Training Bharat Singh , Mahyar Najibi , and Larry S. Davis ( denotes equal contribution) arXiv preprint arXiv:1805.09300, 2018. Features 1. Train with a batch size of 160 images with a ResNet 101 backbone on 8 V100 GPUs 2. NO PYTHON LAYERS (Every layer is optimized for large batch sizes in CUDA/C++) 3. HALF PRECISION TRAINING with no loss in accuracy 4. 5 Images/second during inference on a single V100 GPU, 47.8/68.2 on COCO using ResNet 101 and without training on segmentation masks 5. Use the lightweight MobileNetV2 model trained with SNIPER to get 34.3/54.5 on COCO without training on segmentation masks 6. The R FCN 3K branch is also powered by SNIPER. Now 21% better than YOLO 9000 on ImageNetDet. This branch also supports on the fly training (in seconds) with very few samples (no bounding boxes needed!) 7. Train on OpenImagesV4 (14x bigger than COCO) with ResNet 101 in 3 days on a p3.x16.large AWS instance! Results Here are the COCO results for SNIPER trained using this repository. The models are trained on the trainval set (using only the bounding box annotations) and evaluated on the test dev set. network architecture pre trained dataset mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L SNIPER ResNet 101 ImageNet 46.5 67.5 52.2 30.0 49.4 58.4 SNIPER ResNet 101 OpenImagesV4 47.8 68.2 53.6 31.5 50.4 59.8 SNIPER MobileNetV2 ImageNet 34.3 54.4 37.9 18.5 36.9 46.4 You can download the OpenImages pre trained model by running bash scripts/download_pretrained_models.sh . The SNIPER detectors based on both ResNet 101 and MobileNetV2 can be downloaded by running bash scripts/download_sniper_detector.sh . License SNIPER is released under Apache license. See LICENSE for details. Citing @article{sniper2018, title {{SNIPER}: Efficient Multi Scale Training}, author {Singh, Bharat and Najibi, Mahyar and Davis, Larry S}, journal {arXiv preprint arXiv:1805.09300}, year {2018} } @article{analysissnip2017, title {An analysis of scale invariance in object detection snip}, author {Singh, Bharat and Davis, Larry S}, journal {CVPR}, year {2018} } Contents 1. Installation ( install) 2. Running the demo ( demo) 3. Training a model with SNIPER ( training) 4. Evaluting a trained model ( evaluating) 5. Other methods and branches in this repo (SSH Face Detector, R FCN 3K, open images) ( others) Installation 1. Clone the repository: git clone recursive 2. Compile the provided MXNet fork in the repository. You need to install CUDA , CuDNN , OpenCV , and OpenBLAS . 
These libraries are set to be used by default in the provided config.mk file in the SNIPER mxnet repository. You can use the make command to build the MXNet library: cd SNIPER mxnet make j NUM_OF_PROCESS USE_CUDA_PATH PATH_TO_THE_CUDA_FOLDER If you plan to train models on multiple GPUs, it is optional but recommended to install NCCL and build MXNet with the NCCL support as instructed below: make j NUM_OF_PROCESS USE_CUDA_PATH PATH_TO_THE_CUDA_FOLDER USE_NCCL 1 In this case, you may also need to set the USE_NCCL_PATH variable in the above command to point to your NCCL installation path. If you need more information on how to compile MXNet please see here . 3. Compile the C++ files in the lib directory. The following script compiles them all: bash scripts/compile.sh 4. Install the required python packages: pip install r requirements.txt Running the demo For running the demo, you need to download the provided SNIPER model. The following script downloads the SNIPER model and extracts it into the default location: bash download_sniper_detector.sh After downloading the model, the following command would run the SNIPER detector with the default configs on the provided sample image: python demo.py If everything goes well, the sample detections would be saved as data/demo/demo_detections.jpg . You can also run the detector on an arbitrary image by providing its path to the script: python demo.py im_path PATH to the image However, if you plan to run the detector on multiple images, please consider using the provided multi process and multi batch main_test module. You can also test the provided SNIPER model based on the MobileNetV2 architecture by passing the provided config file as follows: python demo.py cfg configs/faster/sniper_mobilenetv2_e2e.yml Training a model For training SNIPER on COCO, you first need to download the pre trained models and configure the dataset as described below. Downloading pre trained models Running the following script downloads and extracts the pre trained models into the default path ( data/pretrained_model ): bash download_pretrained_models.sh Configuring the COCO dataset Please follow the official COCO dataset website to download the dataset. After downloading the dataset you should have the following directory structure: data datasets coco annotations images Training the SNIPER detector You can train the SNIPER detector with or without negative chip mining as described below. Training with Negative Chip Mining: Negative chip mining results in a relative improvement in AP (please refer to the paper for the details). To determine the candidate hard negative regions, SNIPER uses pre computed proposals. It is possible to use any set of proposals for this purpose. However, for COCO, we also provide the pre computed proposals extracted from a network trained with SNIPER and a short training schedule ( i.e. trained for two epochs as described in the paper). The following script downloads the pre computed proposals and extracts them into the default path ( data/proposals ): bash download_sniper_neg_props.sh After downloading the proposals, you can train the model with SNIPER and default parameters by calling the following script: python main_train.py Training without Negative Chip Mining: You can disable the negative chip mining by setting the TRAIN.USE_NEG_CHIPS to False . This is especially useful if you plan to try SNIPER on a new dataset or want to shorten the training cycle. 
In this case, there is no need for using any pre computed proposals and the training can be started by calling the following command: python main_train.py set TRAIN.USE_NEG_CHIPS False In any case, the default training settings can be overwritten by passing a configuration file (see the configs folder for example configuration files). The path to the configuration file can be passed as an argument to the above script using the cfg flag. It is also possible to set individual configuration key values by passing set as the last argument to the module followed by the desired key values ( i.e. set key1 value1 key2 value2 ... ). Please note that the default config file has the same settings used to train the released models. If you are using a GPU with less amount of memory, please consider reducing the training batch size (by setting TRAIN.BATCH_IMAGES in the config file or passing set TRAIN.BATCH_IMAGES DISIRED_VALUE as the last argument to the module). Also, multi processing is used to process the data. For smaller amounts of memory, you may need to reduce the number of processes and number of threads according to your system (by setting TRAIN.NUM_PROCESS and TRAIN.NUM_THREAD respectively). Evaluating a trained model Evaluating the provided SNIPER models The repository provides a set of pre trained SNIPER models which can be downloaded by running the following script: bash download_sniper_detector.sh This script downloads the model weights and extracts them into the expected directory. To evaluate these models on coco test dev with the default configuration, you can run the following script: python main_test.py The default settings can be overwritten by passing the path to a configuration file with the cfg flag (See the configs folder for examples). It is also possible to set individual configuration key values by passing set as the last argument to the module followed by the desired key values ( i.e. set key1 value1 key2 value2 ... ). Please note that the evaluation is performed in a multi image per batch and parallel model forward setting. In case of lower GPU memory, please consider reducing the batch size for different scales (by setting TEST.BATCH_IMAGES ) or reducing the number of parallel jobs (by setting TEST.CONCURRENT_JOBS in the config file). Evaluating a model trained with this repository For evaluating a model trained with this repository, you can run the following script by passing the same configuration file used during the training. The test settings can be set by updating the TEST section of the configuration file (See the configs folder for examples). python main_test.py cfg PATH TO THE CONFIG FILE USED FOR TRAINING By default, this would produce a json file containing the detections on the test dev which can be zipped and uploaded to the COCO evaluation server. Branches in this repo (SSH Face Detector, R FCN 3K, Soft Sampling) R FCN 3K This repo also contains the R FCN 3k detector. Please switch to the R FCN 3k branch for specific instructions. OpenImagesV4 with Soft Sampling This repo also contains modules to train on the open images dataset . Please switch to the openimages2 branch for specific instructions. The detector on OpenImagesV4 was trained with Soft Sampling . SSH Face Detector The SSH face detector would be added to this repository soon. 
In the meanwhile, you can use the code available at the original SSH repository .",Object Detection,Object Detection 2450,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks The major contributors of this repository include Yuwen Xiong , Haozhi Qi , Guodong Zhang , Yi Li , Jifeng Dai , Bin Xiao , Han Hu and Yichen Wei . We released training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an MIT license. Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 
val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and checkout to MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if it's the first time to checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python packge into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the argumented VOC 2012 dataset. The argumented annotations are provided by SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download manually from OneDrive or BaiduYun , and put it under folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. 
To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and gives several prediction results, to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and gives several prediction results, to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offset of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. Please download Cityscapes and VOC 2012 datasets and make sure it looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. Please download argumented VOC 2012 annotations/image lists, and put the argumented annotations and the argumented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , Respectively. 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU , dataset, etc.) are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder would be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This is because either you forget to copy the operators to your MXNet folder or you copy to the wrong path or you forget to re compile or you install the wrong MXNet Please print mxnet.__path__ to make sure you use correct MXNet Q: I encounter segment fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 first before import mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. 
A: It has been identified that MXNet on Windows has this problem. So we recommend running this program on Linux. You could also stop it and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: Due to several reasons (the code is based on an old, internal Caffe, porting to the public Caffe needs extra work, time limits, etc.), we do not plan to release our Caffe code. Since the current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself; the core CUDA code could be kept unchanged. Anyone who wishes to do it is welcome to make a pull request.",Object Detection,Object Detection 2451,Computer Vision,Computer Vision,Computer Vision,"Vehicle Detection Udacity Self Driving Car NanoDegree This repository contains code for a project I did as a part of Udacity's Self Driving Car Nano Degree Program . The goal is to write a software pipeline to detect vehicles in a video. The code is available in Vehicle_Detection.ipynb (Vehicle_Detection.ipynb). Algorithm Used: You Only Look Once (YOLO) v1 Brief Intro Traditional, computer vision technique based approaches to object detection repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image. Other approaches like R CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. These complex pipelines are slow and hard to optimize because each individual component must be trained separately. YOLO reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. In this project we will implement tiny YOLO v1. Full details of the network, training and implementation are available in the paper. YOLO Output YOLO divides the input image into an SxS grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. Confidence is defined as (Probability that the grid cell contains an object) multiplied by (Intersection over union of predicted bounding box over the ground truth). Or Confidence = Pr(Object) x IOU_truth_pred. (1) If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth. Each bounding box consists of 5 predictions: 1. x 2. y 3. w 4. h 5. confidence The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image. Finally the confidence prediction represents the IOU between the predicted box and any ground truth box. Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object).
These probabilities are conditioned on the grid cell containing an object. We only predict one set of class probabilities per grid cell, regardless of the number of boxes B. At test time we multiply the conditional class probabilities and the individual box confidence predictions, Pr(Class_i | Object) x Pr(Object) x IOU_truth_pred = Pr(Class_i) x IOU_truth_pred (2) which gives us class specific confidence scores for each box. These scores encode both the probability of that class appearing in the box and how well the predicted box fits the object. So at test time, the final output vector for each image is an S x S x (B x 5 + C) length vector. The Model Architecture The model architecture consists of 9 convolutional layers, followed by 3 fully connected layers. Each convolutional layer is followed by a Leaky RELU activation function, with an alpha of 0.1. The first 6 convolutional layers are also followed by a 2x2 max pooling layer. ! Architecture (tiny yolo.png) Implementation Pre processing Area of interest, cropping and resizing Input to the model is a batch of 448x448 images. So we first determine the area of interest for each image. We only consider this portion of the image for prediction, since cars won't be present all over the image, just on the roads in the lower portion of the image. Then this cropped image is resized to a 448x448 image. Normalization Each image pixel is normalized to have values between -1 and 1. We use simple min max normalization to achieve this. Training I have used pre trained weights for this project. Training is done in 2 parts. Part 1: Training for classification This model was trained on the ImageNet 1000 class classification dataset. For this we take the first 6 convolutional layers followed by a fully connected layer. Part 2: Training for detection The model is then converted for detection. This is done by adding 3 convolutional layers and 3 fully connected layers. The modified model is then trained on the PASCAL VOC detection dataset. The pre trained weights for this model (180 MB) are available here . ! png (yolo.png) Post Processing The model was trained on the PASCAL VOC dataset. We use S = 7, B = 2. PASCAL VOC has 20 labelled classes so C = 20. So our final prediction, for each input image, is: output tensor length = S x S x (B x 5 + C) = 7 x 7 x (2x5 + 20) = 1470. The structure of the 1470 length tensor is as follows: 1. The first 980 values correspond to probabilities for each of the 20 classes for each grid cell. These probabilities are conditioned on objects being present in each grid cell. 2. The next 98 values are confidence scores for the 2 bounding boxes predicted by each grid cell. 3. The next 392 values are coordinates (x, y, w, h) for the 2 bounding boxes per grid cell. As you can see in the above image, each input image is divided into an S x S grid and for each grid cell, our model predicts B bounding boxes and C confidence scores. There is a fair amount of post processing involved to arrive at the final bounding boxes based on the model's predictions. Class score threshold We reject output from grid cells below a certain threshold (0.2) of class scores (equation 2), computed at test time. Reject overlapping (duplicate) bounding boxes If multiple bounding boxes for the same class overlap and have an IOU of more than 0.4 (intersecting area is 40% of the union area of the boxes), then we keep the box with the highest class score and reject the other box(es).
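As a rough illustration of these two post processing steps (a simplified sketch with hypothetical function names, applied to the boxes of a single class; it is not the exact code from the notebook):

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection area over union area.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def filter_class_boxes(boxes, scores, score_threshold=0.2, iou_threshold=0.4):
    # Step 1: reject boxes whose class-specific confidence (equation 2) is below 0.2.
    keep = scores >= score_threshold
    boxes, scores = boxes[keep], scores[keep]
    # Step 2: among overlapping boxes (IOU > 0.4), keep only the highest-scoring one.
    order = np.argsort(scores)[::-1]
    selected = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(int(i))
    return boxes[selected], scores[selected]
```

This greedy keep the best score, suppress the overlaps loop is essentially the standard non maximum suppression idea that the two steps above describe.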
Drawing the bounding boxes The predictions (x, y) for each bounding box are relative to the bounds of the grid cell and (w, h) are relative to the whole image. To compute the final bounding box coodinates we have to multiply w & h with the width & height of the portion of the image used as input for the network. Testing The pipeline is applied to individual images. Here is the result. ! png (test_output.png) The Video The pipeline is applied to a video. Click on the image to watch the video or click here . You will be redirected to YouTube. Project Video",Object Detection,Object Detection 2455,Computer Vision,Computer Vision,Computer Vision,"Flip kart grid challenge Main Approach: With the independent feature being the image itself and the dependent feature being two set of coordinates, our approach revolves around building a standard Convolutional Neural Network. The architecture of the CNN used is ResNet50,Resnet18 . A custom head is added to the CNN to get the desired 4 numbers as output. Some key techniques used to improve the model: 1.) Data Augmentation.(Changing the brightness, contrast, rotation, zooming, warping of the given images to artificialy generate more data.) 2.) Differential Learning rates. 3.) One cycle fitting policy 4.) Cyclical Learning rates 5.) Stochastic Gradient Descent with Restarts.(The basic idea is to reset our learning rate after a certain number of iterations so that we can pop out of the local minima if we appear to be stuck.) Main libraries used: Fastai built on top of PyTorch. Numpy Pandas OpenCV",Object Detection,Object Detection 2464,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. 
Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. 
Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2466,Computer Vision,Computer Vision,Computer Vision,"IBM Developer Model Asset Exchange: Object Detector This repository contains code to instantiate and deploy an object detection model. This model recognizes the objects present in an image from the 80 different high level classes of objects in the COCO Dataset . The model consists of a deep convolutional net base model for image feature extraction, together with additional convolutional layers specialized for the task of object detection, that was trained on the COCO data set. The input to the model is an image, and the output is a list of estimated class probabilities for the objects detected in the image. The model is based on the SSD Mobilenet V1 object detection model for TensorFlow . The model files are hosted on IBM Cloud Object Storage . The code in this repository deploys the model as a web service in a Docker container. This repository was developed as part of the IBM Code Model Asset Exchange . Model Metadata Domain Application Industry Framework Training Data Input Data Format Vision Object Detection General TensorFlow COCO Dataset Image (RGB/HWC) References _J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, K. 
Murphy_, Speed/accuracy trade offs for modern convolutional object detectors , CVPR 2017 _Tsung Yi Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. Lawrence Zitnick, P. Dollár_, Microsoft COCO: Common Objects in Context , arXiv 2015 _W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, A. C. Berg_, SSD: Single Shot MultiBox Detector , CoRR (abs/1512.02325), 2016 _A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam_, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , arXiv 2017 TensorFlow Object Detection GitHub Repo Licenses Component License Link This repository Apache 2.0 LICENSE (LICENSE) Model Weights Apache 2.0 TensorFlow Models Repo Model Code (3rd party) Apache 2.0 TensorFlow Models Repo Test assets Various Asset README (assets/README.md) Pre requisites: docker : The Docker command line interface. Follow the installation instructions for your system. The minimum recommended resources for this model is 2GB Memory and 2 CPUs. Steps 1. Deploy from Docker Hub ( deploy from docker hub) 2. Deploy on Kubernetes ( deploy on kubernetes) 3. Run Locally ( run locally) Deploy from Docker Hub To run the docker image, which automatically starts the model serving API, run: $ docker run it p 5000:5000 codait/max object detector This will pull a pre built image from Docker Hub (or use an existing image if already cached locally) and run it. If you'd rather checkout and build the model locally you can follow the run locally ( run locally) steps below. Deploy on Kubernetes You can also deploy the model on Kubernetes using the latest docker image on Docker Hub. On your Kubernetes cluster, run the following commands: $ kubectl apply f The model will be available internally at port 5000 , but can also be accessed externally through the NodePort . Run Locally 1. Build the Model ( 1 build the model) 2. Deploy the Model ( 2 deploy the model) 3. Use the Model ( 3 use the model) 4. Development ( 4 development) 5. Cleanup ( 5 cleanup) 1. Build the Model Clone this repository locally. In a terminal, run the following command: $ git clone Change directory into the repository base folder: $ cd MAX Object Detector To build the docker image locally, run: $ docker build t max object detector . All required model assets will be downloaded during the build process. _Note_ that currently this docker image is CPU only (we will add support for GPU images later). 2. Deploy the Model To run the docker image, which automatically starts the model serving API, run: $ docker run it p 5000:5000 max object detector 3. Use the Model The API server automatically generates an interactive Swagger documentation page. Go to to load it. From there you can explore the API and also create test requests. Use the model/predict endpoint to load a test image (you can use one of the test images from the assets folder) and get predicted labels for the image from the API. The coordinates of the bounding box are returned in the detection_box field, and contain the array of normalized coordinates (ranging from 0 to 1) in the form ymin, xmin, ymax, xmax . ! 
Swagger Doc Screenshot (docs/swagger screenshot.png) You can also test it on the command line, for example: $ curl F image @assets/dog human.jpg XPOST You should see a JSON response like the one below: json { status : ok , predictions : { label_id : 1 , label : person , probability : 0.944034993648529, detection_box : 0.1242099404335022, 0.12507188320159912, 0.8423267006874084, 0.5974075794219971 }, { label_id : 18 , label : dog , probability : 0.8645511865615845, detection_box : 0.10447660088539124, 0.17799153923988342, 0.8422801494598389, 0.732001781463623 } } You can also control the probability threshold for what objects are returned using the threshold argument like below: $ curl F image @assets/dog human.jpg XPOST The optional threshold parameter is the minimum probability value for predicted labels returned by the model. The default value for threshold is 0.7 . 4. Development To run the Flask API app in debug mode, edit config.py to set DEBUG True under the application settings. You will then need to rebuild the docker image (see step 1 ( 1 build the model)). 5. Cleanup To stop the Docker container, type CTRL + C in your terminal. Links Object Detector Web App : A reference application created by the IBM CODAIT team that uses the Object Detector Object Detector Web App The latest release of the MAX Object Detector Web App is included in the Object Detector docker image. When the model API server is running, the web app can be accessed at and provides interactive visualization of the bounding boxes and their related labels returned by the model. ! Mini Web App Screenshot (docs/mini web app.png) If you wish to disable the web app, start the model serving API by running: $ docker run it p 5000:5000 e DISABLE_WEB_APP true codait/max object detector",Object Detection,Object Detection 2473,Computer Vision,Computer Vision,Computer Vision,"PyTorch implementation of VQ VAE by van den Oord et al., 2017 applied to the CIFAR10 dataset by Alex Krizhevsky, 2009 using classes, inspired by the code of zalandoresearch/pytorch vq vae and deepmind/sonnet . Results The trained models used in the following experiments are saved in the results/shuffled and results/unshuffled/ directories. The experiments were shorter than necessary, as they were only for educational purposes. In order to obtain better image reconstructions, it is necessary to increase the number of residual hidden neurons (i.e., 256 instead of the default 32) and to increase the number of training updates (i.e., 250K instead of 25K). The following results ( results/unshuffled ) are slightly worse than ( results/shuffled ). Using original version Reconstruction loss plot using the original version by van den Oord et al., 2017 : ! alt text (results/unshuffled//loss.png) The original images: ! alt text (results/unshuffled//original_images.png) The reconstructed images: ! alt text (results/unshuffled//validation_images.png) Using EMA updates In my experiments, using the EMA updates proposed in Roy et al., 2018 , the final reconstruction loss was 2.66 times smaller (0.235 instead of 0.627) for the shuffled dataset, and similar for the unshuffled dataset: ! alt text (results/unshuffled//loss_ema.png) The original images: ! alt text (results/unshuffled//original_images_ema.png) As we can see, the reconstructed images are less blurred than the previous ones: ! alt text (results/unshuffled//validation_images_ema.png) Using EMA updates + kaiming normal One can also use the weight normalization proposed by He, K et al., 2015 , as the model converges a little faster. !
alt text (results/unshuffled//loss_ema_norm_he et al.png) The original images : ! alt text (results/unshuffled//original_images_ema_norm_he et al.png) The reconstructed images : ! alt text (results/unshuffled//validation_images_ema_norm_he et al.png) I also used nn.utils.weight_norm() before each call of kaiming_normal() , as they do in ksw0306/ClariNet because the model converged better. In my experiments, EMA + kaiming without this additional normalisation reduces the performances, as we can see in the additional results (results/shuffled/loss_ema_he et al.png). Installation It requires python3, python3 pip and the packages listed in requirements.txt (requirements.txt). To install the required packages: bash pip3 install r requirements.txt Examples of usage First, move to the source directory: bash cd src bash python3 main.py help Output: usage: main.py h batch_size BATCH_SIZE num_training_updates NUM_TRAINING_UPDATES num_hiddens NUM_HIDDENS num_residual_hiddens NUM_RESIDUAL_HIDDENS num_residual_layers NUM_RESIDUAL_LAYERS embedding_dim EMBEDDING_DIM num_embeddings NUM_EMBEDDINGS commitment_cost COMMITMENT_COST decay DECAY learning_rate LEARNING_RATE use_kaiming_normal USE_KAIMING_NORMAL shuffle_dataset SHUFFLE_DATASET data_path DATA_PATH results_path RESULTS_PATH loss_plot_name LOSS_PLOT_NAME model_name MODEL_NAME original_images_name ORIGINAL_IMAGES_NAME validation_images_name VALIDATION_IMAGES_NAME use_cuda_if_available USE_CUDA_IF_AVAILABLE optional arguments: h, help show this help message and exit batch_size BATCH_SIZE The size of the batch during training (default: 32) num_training_updates NUM_TRAINING_UPDATES The number of updates during training (default: 25000) num_hiddens NUM_HIDDENS The number of hidden neurons in each layer (default: 128) num_residual_hiddens NUM_RESIDUAL_HIDDENS The number of hidden neurons in each layer within a residual block (default: 32) num_residual_layers NUM_RESIDUAL_LAYERS The number of residual layers in a residual stack (default: 2) embedding_dim EMBEDDING_DIM Representing the dimensionality of the tensors in the quantized space (default: 64) num_embeddings NUM_EMBEDDINGS The number of vectors in the quantized space (default: 512) commitment_cost COMMITMENT_COST Controls the weighting of the loss terms (default: 0.25) decay DECAY Decay for the moving averages (set to 0.0 to not use EMA) (default: 0.99) learning_rate LEARNING_RATE The learning rate of the optimizer during training updates (default: 0.0003) use_kaiming_normal USE_KAIMING_NORMAL Use the weight normalization proposed in He, K et al., 2015 (default: True) unshuffle_dataset Do not shuffle the dataset before training (default: False) data_path DATA_PATH The path of the data directory (default: data) results_path RESULTS_PATH The path of the results directory (default: results) loss_plot_name LOSS_PLOT_NAME The file name of the training loss plot (default: loss.png) model_name MODEL_NAME The file name of trained model (default: model.pth) original_images_name ORIGINAL_IMAGES_NAME The file name of the original images used in evaluation (default: original_images.png) validation_images_name VALIDATION_IMAGES_NAME The file name of the reconstructed images used in evaluation (default: validation_images.png) use_cuda_if_available USE_CUDA_IF_AVAILABLE Specify if GPU will be used if available (default: True) Use default vector quantized algorithm, do not shuffle the dataset and do not use He, K et al., 2015 weight normalization: bash python main.py results_path results/unshuffled/ 
use_kaiming_normal False decay 0.0 unshuffle_dataset Use EMA vector quantized algorithm, do not shuffle the dataset and do not use He, K et al., 2015 weight normalization: bash python main.py results_path results/unshuffled/ use_kaiming_normal False decay 0.99 loss_plot_name loss_ema.png model_name model_ema.pth original_images_name original_images_ema.png validation_images_name validation_images_ema.png unshuffle_dataset Use EMA vector quantized algorithm, do not shuffle the dataset and do use He, K et al., 2015 weight normalization: bash python main.py results_path results/unshuffled/ use_kaiming_normal True decay 0.99 loss_plot_name loss_ema_norm_he et al.png model_name model_ema_norm_he et al.pth original_images_name original_images_ema_norm_he et al.png validation_images_name validation_images_ema_norm_he et al.png unshuffle_dataset Code usage Example of usage (see here (src/main.py) for the complete example): py configuration Configuration.build_from_args(args) Get the dataset and model hyperparameters dataset Cifar10Dataset(configuration.batch_size, dataset_path) Create an instance of CIFAR10 dataset auto_encoder AutoEncoder(device, configuration).to(device) Create an AutoEncoder model using our GPU device optimizer optim.Adam(auto_encoder.parameters(), lr configuration.learning_rate, amsgrad True) Create an Adam optimizer instance trainer Trainer(device, auto_encoder, optimizer, dataset) Create a trainer instance trainer.train(configuration.num_training_updates) Train our model on the CIFAR10 dataset trainer.save_loss_plot(results_path + os.sep + 'loss.png') Save the loss plot auto_encoder.save(results_path + os.sep + 'model.pth') Save our trained model evaluator Evaluator(device, auto_encoder, dataset) Create en Evaluator instance to evaluate our trained model evaluator.reconstruct() Reconstruct our images from the embedded space evaluator.save_original_images_plot(results_path + os.sep + 'original_images.png') Save the original images for comparaison purpose evaluator.save_validation_reconstructions_plot(results_path + os.sep + 'validation_images.png') Reconstruct the decoded images and save them References van den Oord et al., 2017 van den Oord A., and Oriol Vinyals. Neural discrete representation learning. Advances in Neural Information Processing Systems(NIPS). 2017 . Alex Krizhevsky, 2009 Learning Multiple Layers of Features from Tiny Images . zalandoresearch/pytorch vq vae deepmind/sonnet Roy et al., 2018 A. Roy, A. Vaswani, A. Neelakantan, and N. Parmar. Theory and experiments on vector quantized autoencoders.arXiv preprint arXiv:1805.11063, 2018 . He, K et al., 2015 He, K., Zhang, X., Ren, S and Sun, J. Deep Residual Learning for Image Recognition. arXiv e prints arXiv:1502.01852 . 
ksw0306/ClariNet",Object Detection,Object Detection 2481,Computer Vision,Computer Vision,Computer Vision,resnet_cancer_detection My notebook comparing the performance of different ResNet architectures & DenseNet architectures on identifying metastatic tissue in histopathologic scans of lymph node sections.,Object Detection,Object Detection 2483,Computer Vision,Computer Vision,Computer Vision,resnet_cancer_detection My notebook comparing the performance of different ResNet architectures & DenseNet architectures on identifying metastatic tissue in histopathologic scans of lymph node sections.,Object Detection,Object Detection 2486,Computer Vision,Computer Vision,Computer Vision,"imagine nn Universite Paris Est Marne la Vallee IMAGINE/LIGM torch neural network routines Following modules are here for now: lua inn.SpatialStochasticPooling(kW,kH,dW,dH) inn.SpatialSameResponseNormalization( size 3 , alpha 0.00005 , beta 0.75 ) inn.MeanSubtraction(mean) inn.SpatialPyramidPooling({{w1,h1},{w2,h2},...,{wn,hn}}) inn.ROIPooling(W,H):setSpatialScale(scale) Look at for inn.SpatialStochasticPooling reference, this is fully working implementation. inn.ROIPooling is Spatial Adaptive Max Pooling layer for region proposals used in FastRCNN with bugfixes and 50 times faster in backprop. Set v2 false to use it's old version. inn.ROIPooling expects a table on input, first argument is features in NxDxHxW where N is number of images, second argument is bounding boxes in Bx5 where B is the number of regions to pool and 5 is image id + bbox. Image id is in 1,N range, boxes are in x1,y1,x2,y2 . inn.SpatialSameResponseNormalization is a local response normalization in the same map in BDHW format. For details refer to inn.MeanSubtraction(mean) is done to subtract the Imagenet mean directly on GPU. Mean tensor is expanded to BDHW batches without using additional memory. inn.SpatialPyramidPooling({{w1,h1},{w2,h2},...,{wn,hn}}) is a pyramid of regions obtained by using Spatial Adaptive Max Pooling with parameters (w1,h1),...,(wn,hn) in the input. The result is a fixed sized vector of size w1 h1 ...wn hn for any input dimension. For details see OBSOLETE modules The difference with inn.SpatialMax(Average)Pooling and nn.SpatialMax(Average)Pooling is that output size computed with ceil instead of floor (as in Caffe and cuda convnet2). Also SpatialAveragePooling does true average pooling, meaning that it divides outputs by kW kH. inn.SpatialMax(Average)Pooling(kW,kH,dW,dH) is equal to cudnn.SpatialMax(Average)Pooling(kW,kH,dW,dH):ceil(). inn.SpatialCrossResponseNormalization is local response normalization across maps in BDHW format (thanks to Caffe!). For details refer to inn.SpatialMaxPooling(kW,kH,dW,dH) OBSOLETE! USE nn.SpatialMaxPooling(kW,kH,dW,dH,padW,padH):ceil() inn.SpatialAveragePooling(kW,kH,dW,dH) OBSOLETE! USE nn.SpatialAveragePooling(kW,kH,dW,dH,padW,padH):ceil() inn.SpatialCrossResponseNormalization(size, alpha 0.0001 , beta 0.75 , k 1 ) OBSOLETE! USE nn.SpatialCrossMapLRN with the same arguments",Object Detection,Object Detection 2489,Computer Vision,Computer Vision,Computer Vision,"DSOD: Learning Deeply Supervised Object Detectors from Scratch Update (02/26/2019) We observe that if we simply increase the batch size (bs) on each GPU from 4 (Titan X) to 12 (P40) for training BN layers, our DSOD300 can achieve much better performance without any other modifications (see comparisons below). 
We think if we have a better solution to tune BN layers' params, e.g., Sync BN 1 when training detectors from scratch, the accuracy may be higher. This is also consistent with 2 . We have also provided some preliminary results on exploring the factors of training two stage detectors from scratch in our extended paper (v2) 3 . New results on PASCAL VOC test set: Method VOC 2007 test mAP parameters Models : : : : : : : DSOD300 (07+12) bs 4 on each GPU 77.7 14.8M Download (59.2M) DSOD300 (07+12) bs 12 on each GPU 78.9 14.8M Download (59.2M) 1 Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, and Jian Sun. Megdet: A large mini batch object detector. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6181 6189. 2018. 2 Kaiming He, Ross Girshick, and Piotr Dollár. Rethinking ImageNet pre training. arXiv preprint arXiv:1811.08883 (2018). 3 Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu Gang Jiang, Yurong Chen, and Xiangyang Xue. Object Detection from Scratch with Deep Supervision. arXiv preprint arXiv:1809.09294 (2018). This repository contains the code for the following paper DSOD: Learning Deeply Supervised Object Detectors from Scratch (ICCV 2017). Zhiqiang Shen \ , Zhuang Liu \ , Jianguo Li , Yu Gang Jiang , Yurong chen , Xiangyang Xue . (\ Equal Contribution) The code is based on the SSD framework. Other Implementations: Pytorch by Yun Chen, Pytorch by uoip, Pytorch by qqadssp, Pytorch by Ellinier , Mxnet by Leo Cheng, Mxnet by eureka7mt, Tensorflow by Windaway. If you find this helps your research, please cite: @inproceedings{Shen2017DSOD, title {DSOD: Learning Deeply Supervised Object Detectors from Scratch}, author {Shen, Zhiqiang and Liu, Zhuang and Li, Jianguo and Jiang, Yu Gang and Chen, Yurong and Xue, Xiangyang}, booktitle {ICCV}, year {2017} } @article{shen2018object, title {Object Detection from Scratch with Deep Supervision}, author {Shen, Zhiqiang and Liu, Zhuang and Li, Jianguo and Jiang, Yu Gang and Chen, Yurong and Xue, Xiangyang}, journal {arXiv preprint arXiv:1809.09294}, year {2018} } Introduction DSOD focuses on the problem of training object detector from scratch (without pretrained models on ImageNet). To the best of our knowledge, this is the first work that trains neural object detectors from scratch with state of the art performance. In this work, we contribute a set of design principles for this purpose. One of the key findings is the deeply supervised structure enabled by dense layer wise connections , plays a critical role in learning a good detection model. Please see our paper for more details. Figure 1: DSOD prediction layers with plain and dense structures (for 300×300 input). Visualization 0. Visualizations of network structures (tools from ethereon , ignore the warning messages): DSOD300 Results & Models The tables below show the results on PASCAL VOC 2007, 2012 and MS COCO. 
PASCAL VOC test results: Method VOC 2007 test mAP fps (Titan X) parameters Models : : : : : : : : : DSOD300_smallest (07+12) 73.6 5.9M Download (23.5M) DSOD300_lite (07+12) 76.7 25.8 10.4M Download (41.8M) DSOD300 (07+12) 77.7 17.4 14.8M Download (59.2M) DSOD300 (07+12+COCO) 81.7 17.4 14.8M Download (59.2M) Method VOC 2012 test mAP fps parameters Models : : : : : : : : : DSOD300 (07++12) 76.3 17.4 14.8M Download (59.2M) DSOD300 (07++12+COCO) 79.3 17.4 14.8M Download (59.2M) COCO test dev 2015 result (COCO has more object categories than VOC dataset, so the model size is slightly bigger.): Method COCO test dev 2015 mAP (IoU 0.5:0.95) Models : : : : : DSOD300 (COCO trainval) 29.3 Download (87.2M) Preparation 0. Install SSD following the instructions there, including: (1) Install SSD caffe; (2) Download PASCAL VOC 2007 and 2012 datasets; and (3) Create LMDB file. Make sure you can run it without any errors. Our PASCAL VOC LMDB files: Method LMDBs : : : Train on VOC07+12 and test on VOC07 Download Train on VOC07++12 and test on VOC12 (Comp4) Download Train on VOC12 and test on VOC12 (Comp3) Download 1. Create a subfolder dsod under example/ , add files DSOD300_pascal.py , DSOD300_pascal++.py , DSOD300_coco.py , score_DSOD300_pascal.py and DSOD300_detection_demo.py to the folder example/dsod/ . 2. Create a subfolder grp_dsod under example/ , add files GRP_DSOD320_pascal.py and score_GRP_DSOD320_pascal.py to the folder example/grp_dsod/ . 3. Replace the file model_libs.py in the folder python/caffe/ with ours. Training & Testing Train a DSOD model on VOC 07+12: shell python examples/dsod/DSOD300_pascal.py Train a DSOD model on VOC 07++12: shell python examples/dsod/DSOD300_pascal++.py Train a DSOD model on COCO trainval: shell python examples/dsod/DSOD300_coco.py Evaluate the model (DSOD): shell python examples/dsod/score_DSOD300_pascal.py Run a demo (DSOD): shell python examples/dsod/DSOD300_detection_demo.py Train a GRP_DSOD model on VOC 07+12: shell python examples/grp_dsod/GRP_DSOD320_pascal.py Evaluate the model (GRP_DSOD): shell python examples/dsod/score_GRP_DSOD320_pascal.py Note : You can modify the file model_lib.py to design your own network structure as you like. Examples Contact Zhiqiang Shen (zhiqiangshen0214 at gmail.com) Zhuang Liu (liuzhuangthu at gmail.com) Any comments or suggestions are welcome!",Object Detection,Object Detection 2490,Computer Vision,Computer Vision,Computer Vision,"ResNet TensorFlow This is a TensorFlow implementation of ResNet, a deep residual network developed by Kaiming He , Xiangyu Zhang , Shaoqing Ren , Jian Sun . Read the original paper: Deep Residual Learning for Image Recognition . Disclaimer: I implemented this for only learning purposes. Check out the original repo for other unofficial implementations. TODO: put CIFAR 10 data in a TensorFlow Dataset object Getting Started Cloning the repo shell $ git clone $ cd resnet tf Setting up the virtualenv, installing TensorFlow (OS X) shell $ virtualenv venv $ source venv/bin/activate (venv)$ pip install upgrade If you don't have virtualenv installed, run pip install virtualenv . Also, the cifar 10 data for python can be found at: Place the data in the main directory. Start Training: shell (venv)$ python main.py This starts the training for ResNet 20, saving the progress after training every 512 images. 
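If you want to sanity check the downloaded CIFAR 10 batches before training, a minimal sketch is shown below (it assumes the standard cifar-10-batches-py/ directory from the python archive; adjust the path to wherever you placed the data): python
import pickle
import numpy as np

# read one training batch from the standard CIFAR-10 python archive (path is an assumption)
with open("cifar-10-batches-py/data_batch_1", "rb") as f:
    batch = pickle.load(f, encoding="bytes")

# each batch stores 10000 flattened 32x32x3 images as uint8 rows plus their labels
images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
labels = np.array(batch[b"labels"])
print(images.shape, labels.shape)  # (10000, 32, 32, 3) (10000,)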
To train a net of a different depth, comment out the line in main.py net models.resnet(X, 20) and uncomment the line that initializes the appropriate model.",Object Detection,Object Detection 2492,Computer Vision,Computer Vision,Computer Vision,Trial M2Det (uses Keras 2.x) paper memo ! StructureSummary (doc/M2Det.png),Object Detection,Object Detection 2505,Computer Vision,Computer Vision,Computer Vision,"GANonymizer GANonymizer Image Anonymization Method Using Object Detection and Generative Adversarial Networks Improved version, GANonymizerV2 is available here . Note The code of this repository is based on the following repositories, and we use the pre trained models published in their repositories. 1. 2. 3. Requirements Listed in docker/gpu_requirements.txt . Preparation 1. git clone 2. download the cfgs and weights for (SSD512 or YOLOV3) and GLCIC cd ganonymizer SSD cfgs wget P ganonymizer/src/detection/ssd/cfgs SSD weights wget P ganonymizer/src/detection/ssd/weights YOLO V3 (this link is the author's page for YOLOV3.) wget P ganonymizer/src/detection/yolov3/weights GLCIC weights wget P ganonymizer/src/inpaint/glcic/weights For SSD, we use Wei Liu's SSD model . You can download SSD's cfgs and weights from this . The GLCIC model we use is Iizuka's model, which we converted from Torch7 to PyTorch. Usage (Docker) 1. Install Docker and Docker Compose 1. Customize the config in ganonymizer/main.py 1. Build the container: docker compose f ./docker/docker compose {cpu/gpu}.yml build {/ gpu} 1. Run GANonymizer on the input image specified in main.py: docker compose f ./docker/docker compose {cpu/gpu}.yml run experiment python3 main.py Reference The following are the main reference papers. 1. 2. 3. Details of this paper Title: GANonymizer: Image Anonymization Method Integrating Object Detection and Generative Adversarial Networks Authors: Tomoki Tanimura, Makoto Kawano, Takuro Yonezawa, Jin Nakazawa",Object Detection,Object Detection 2527,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50): YOLOv3 spp (not shown in the chart) is better than YOLOv3, with mAP 60.6% at FPS 20: Yolo v3 source chart; the RetinaNet results on MS COCO are taken from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A cross platform Windows and Linux version of Yolo (for object detection). 
Contributors: This repository is forked from the Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 3.x it can also create an SO library on Linux and a DLL library on Windows Requires: Linux GCC > 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows showing image or video detections in a window and storing the result to the file specified on the command line: out_filename res.avi GPU with CC > 3.0 : Pre trained models for the different cfg files can be downloaded below (smaller > faster & lower quality): yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put the weights file next to the compiled darknet.exe You can find the cfg files under darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x for Detection on CPU and GPU if you trained your own weights using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x, Training 2x on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF is defined in the Makefile or darknet.sln improved performance 1.2x on FullHD and 2x on 4K for detection on video (file/stream) using darknet detector demo ... improved data augmentation performance 3.5x for training (using OpenCV SSE/AVX functions instead of hand written functions), removing a bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of the reorg layer optimized memory allocation during network resizing when random 1 optimized GPU initialization for detection batch 1 is used initially instead of re initializing with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using the command darknet detector map ... added drawing of a chart of average loss during training added calculation of anchors for training added an example of Detection and Tracking objects: fixed code for using a web cam with OpenCV 3.x added run time tips and warnings if you use an incorrect cfg file or dataset many other fixes of code... 
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. 
Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. 
The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate GPUs 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. For 4xGPUs increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . 4. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. 
After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 
2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. 
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2547,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. 
They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Object Detection,Object Detection 2550,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Object Detection,Object Detection 2552,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). 
License Apache License 2.0 (LICENSE)",Object Detection,Object Detection 2563,Computer Vision,Computer Vision,Computer Vision,"This project is form Ultralytics. Thank for her excellent work. Introduction This directory contains python software and an iOS App developed by Ultralytics LLC, and is freely available for redistribution under the GPL 3.0 license . For more information on Ultralytics projects please visit: Description The repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. Training is done on the COCO dataset by default: Credit to Joseph Redmon for YOLO and to Erik Lindernoren for the PyTorch implementation this work is based on . Requirements Python 3.7 or later with the following pip3 install U r requirements.txt packages: numpy torch > 1.0.0 opencv python Training Start Training: Run train.py to begin training after downloading COCO data with data/get_coco_dataset.sh . Training runs about 1 hour per COCO epoch on a 1080 Ti. Resume Training: Run train.py resume to resume training from the most recently saved checkpoint weights/latest.pt . Each epoch trains on 120,000 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set. Default training settings produce loss plots below, with training speed of 0.6 s/batch on a 1080 Ti (15 epochs/day) or 0.45 s/batch on a 2080 Ti. ! Alt Image Augmentation datasets.py applies random OpenCV powered augmentation to the input images in accordance with the following specifications. Augmentation is applied only during training, not during inference. Bounding boxes are automatically tracked and updated with the images. 416 x 416 examples pictured below. Augmentation Description Translation +/ 10% (vertical and horizontal) Rotation +/ 5 degrees Shear +/ 2 degrees (vertical and horizontal) Scale +/ 10% Reflection 50% probability (horizontal only) H S V Saturation +/ 50% HS V Intensity +/ 50% Inference Run detect.py to apply trained weights to an image, such as zidane.jpg from the data/samples folder: YOLOv3: detect.py cfg cfg/yolov3.cfg weights weights/yolov3.pt YOLOv3 tiny: detect.py cfg cfg/yolov3 tiny.cfg weights weights/yolov3 tiny.pt Webcam Run detect.py with webcam True to show a live webcam feed. Pretrained Weights Download official YOLOv3 weights: Darknet format: PyTorch format: Validation mAP Run test.py to validate the official YOLOv3 weights weights/yolov3.weights against the 5000 validation images. You should obtain a .584 mAP at img size 416 , or .586 at img size 608 using this repo, compared to .579 at 608 x 608 reported in darknet . Run test.py weights weights/latest.pt to validate against the latest training results. Default training settings produce a 0.522 mAP at epoch 62. Hyperparameter settings and loss equation changes affect these results significantly, and additional trade studies may be needed to further improve this. Contact For questions or comments please contact Glenn Jocher at glenn.jocher@ultralytics.com or visit us at",Object Detection,Object Detection 2564,Computer Vision,Computer Vision,Computer Vision,"Introduction This directory contains python software and an iOS App developed by Ultralytics LLC, and is freely available for redistribution under the GPL 3.0 license . For more information on Ultralytics projects please visit: Description The repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. 
Training is done on the COCO dataset by default: Credit to Joseph Redmon for YOLO and to Erik Lindernoren for the PyTorch implementation this work is based on . Requirements Python 3.7 or later with the following pip3 install U r requirements.txt packages: numpy torch > 1.0.0 opencv python Training Start Training: Run train.py to begin training after downloading COCO data with data/get_coco_dataset.sh . Training runs about 1 hour per COCO epoch on a 1080 Ti. Resume Training: Run train.py resume to resume training from the most recently saved checkpoint weights/latest.pt . Each epoch trains on 120,000 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set. Default training settings produce loss plots below, with training speed of 0.6 s/batch on a 1080 Ti (15 epochs/day) or 0.45 s/batch on a 2080 Ti. ! Alt Image Augmentation datasets.py applies random OpenCV powered augmentation to the input images in accordance with the following specifications. Augmentation is applied only during training, not during inference. Bounding boxes are automatically tracked and updated with the images. 416 x 416 examples pictured below. Augmentation Description Translation +/ 10% (vertical and horizontal) Rotation +/ 5 degrees Shear +/ 2 degrees (vertical and horizontal) Scale +/ 10% Reflection 50% probability (horizontal only) H S V Saturation +/ 50% HS V Intensity +/ 50% Inference Run detect.py to apply trained weights to an image, such as zidane.jpg from the data/samples folder: YOLOv3: detect.py cfg cfg/yolov3.cfg weights weights/yolov3.pt YOLOv3 tiny: detect.py cfg cfg/yolov3 tiny.cfg weights weights/yolov3 tiny.pt Webcam Run detect.py with webcam True to show a live webcam feed. Pretrained Weights Download official YOLOv3 weights: Darknet format: PyTorch format: Validation mAP Run test.py to validate the official YOLOv3 weights weights/yolov3.weights against the 5000 validation images. You should obtain a .584 mAP at img size 416 , or .586 at img size 608 using this repo, compared to .579 at 608 x 608 reported in darknet . Run test.py weights weights/latest.pt to validate against the latest training results. Default training settings produce a 0.522 mAP at epoch 62. Hyperparameter settings and loss equation changes affect these results significantly, and additional trade studies may be needed to further improve this. Contact For questions or comments please contact Glenn Jocher at glenn.jocher@ultralytics.com or visit us at",Object Detection,Object Detection 2575,Computer Vision,Computer Vision,Computer Vision,"Intro Build Status codecov Real time object detection and classification. Paper: version 1 , version 2 . Read more about YOLO (in darknet) and download weight files here . In case the weight file cannot be found, I uploaded some of mine here , which include yolo full and yolo tiny of v1.0, tiny yolo v1.1 of v1.1 and yolo , tiny yolo voc of v2. See demo below or see on this imgur Dependencies Python3, tensorflow 1.0, numpy, opencv 3. Getting started You can choose _one_ of the following three ways to get started with darkflow. 1. Just build the Cython extensions in place. NOTE: If installing this way you will have to use ./flow in the cloned darkflow directory instead of flow as darkflow is not installed globally. python3 setup.py build_ext inplace 2. Let pip install darkflow globally in dev mode (still globally accessible, but changes to the code immediately take effect) pip install e . 3. Install with pip globally pip install . 
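Whichever install option you choose, it can help to first confirm that the prerequisites listed above (Python 3, TensorFlow 1.0, numpy, OpenCV 3) import cleanly; a minimal sketch: python
# quick sanity check of darkflow's prerequisites; prints the installed versions
import sys
import numpy as np
import cv2
import tensorflow as tf

print("python", sys.version.split()[0])
print("numpy", np.__version__)
print("opencv", cv2.__version__)
print("tensorflow", tf.__version__)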
Update Android demo on Tensorflow's here I am looking for help: help wanted labels in issue track Parsing the annotations Skip this if you are not training or fine tuning anything (you simply want to forward flow a trained net) For example, if you want to work with only 3 classes tvmonitor , person , pottedplant ; edit labels.txt as follows tvmonitor person pottedplant And that's it. darkflow will take care of the rest. You can also set darkflow to load from a custom labels file with the labels flag (i.e. labels myOtherLabelsFile.txt ). This can be helpful when working with multiple models with different sets of output labels. When this flag is not set, darkflow will load from labels.txt by default (unless you are using one of the recognized .cfg files designed for the COCO or VOC dataset then the labels file will be ignored and the COCO or VOC labels will be loaded). Design the net Skip this if you are working with one of the original configurations since they are already there. Otherwise, see the following example: python ... convolutional batch_normalize 1 size 3 stride 1 pad 1 activation leaky maxpool connected output 4096 activation linear ... Flowing the graph using flow bash Have a look at its options flow h First, let's take a closer look at one of a very useful option load bash 1. Load tiny yolo.weights flow model cfg/tiny yolo.cfg load bin/tiny yolo.weights 2. To completely initialize a model, leave the load option flow model cfg/yolo new.cfg 3. It is useful to reuse the first identical layers of tiny for yolo new flow model cfg/yolo new.cfg load bin/tiny yolo.weights this will print out which layers are reused, which are initialized All input images from default folder sample_img/ are flowed through the net and predictions are put in sample_img/out/ . We can always specify more parameters for such forward passes, such as detection threshold, batch size, images folder, etc. bash Forward all images in sample_img/ using tiny yolo and 100% GPU usage flow imgdir sample_img/ model cfg/tiny yolo.cfg load bin/tiny yolo.weights gpu 1.0 json output can be generated with descriptions of the pixel location of each bounding box and the pixel location. Each prediction is stored in the sample_img/out folder by default. An example json array is shown below. bash Forward all images in sample_img/ using tiny yolo and JSON output. flow imgdir sample_img/ model cfg/tiny yolo.cfg load bin/tiny yolo.weights json JSON output: json { label : person , confidence : 0.56, topleft : { x : 184, y : 101}, bottomright : { x : 274, y : 382}}, { label : dog , confidence : 0.32, topleft : { x : 71, y : 263}, bottomright : { x : 193, y : 353}}, { label : horse , confidence : 0.76, topleft : { x : 412, y : 109}, bottomright : { x : 592, y : 337}} label: self explanatory confidence: somewhere between 0 and 1 (how confident yolo is about that detection) topleft: pixel coordinate of top left corner of box. bottomright: pixel coordinate of bottom right corner of box. Training new model Training is simple as you only have to add option train . Training set and annotation will be parsed if this is the first time a new configuration is trained. To point to training set and annotations, use option dataset and annotation . 
A few examples: bash Initialize yolo new from yolo tiny, then train the net on 100% GPU: flow model cfg/yolo new.cfg load bin/tiny yolo.weights train gpu 1.0 Completely initialize yolo new and train it with ADAM optimizer flow model cfg/yolo new.cfg train trainer adam During training, the script will occasionally save intermediate results into Tensorflow checkpoints, stored in ckpt/ . To resume to any checkpoint before performing training/testing, use load checkpoint_num option, if checkpoint_num < 0 , darkflow will load the most recent save by parsing ckpt/checkpoint . bash Resume the most recent checkpoint for training flow train model cfg/yolo new.cfg load 1 Test with checkpoint at step 1500 flow model cfg/yolo new.cfg load 1500 Fine tuning yolo tiny from the original one flow train model cfg/tiny yolo.cfg load bin/tiny yolo.weights Example of training on Pascal VOC 2007: bash Download the Pascal VOC dataset: curl O tar xf VOCtest_06 Nov 2007.tar An example of the Pascal VOC annotation format: vim VOCdevkit/VOC2007/Annotations/000001.xml Train the net on the Pascal dataset: flow model cfg/yolo new.cfg train dataset /VOCdevkit/VOC2007/JPEGImages annotation /VOCdevkit/VOC2007/Annotations Training on your own dataset The steps below assume we want to use tiny YOLO and our dataset has 3 classes 1. Create a copy of the configuration file tiny yolo voc.cfg and rename it according to your preference tiny yolo voc 3c.cfg (It is crucial that you leave the original tiny yolo voc.cfg file unchanged, see below for explanation). 2. In tiny yolo voc 3c.cfg , change classes in the region layer (the last layer) to the number of classes you are going to train for. In our case, classes are set to 3. python ... region anchors 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 bias_match 1 classes 3 coords 4 num 5 softmax 1 ... 3. In tiny yolo voc 3c.cfg , change filters in the convolutional layer (the second to last layer) to num (classes + 5). In our case, num is 5 and classes are 3 so 5 (3 + 5) 40 therefore filters are set to 40. python ... convolutional size 1 stride 1 pad 1 filters 40 activation linear region anchors 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 ... 4. Change labels.txt to include the label(s) you want to train on (number of labels should be the same as the number of classes you set in tiny yolo voc 3c.cfg file). In our case, labels.txt will contain 3 labels. label1 label2 label3 5. Reference the tiny yolo voc 3c.cfg model when you train. flow model cfg/tiny yolo voc 3c.cfg load bin/tiny yolo voc.weights train annotation train/Annotations dataset train/Images Why should I leave the original tiny yolo voc.cfg file unchanged? When darkflow sees you are loading tiny yolo voc.weights it will look for tiny yolo voc.cfg in your cfg/ folder and compare that configuration file to the new one you have set with model cfg/tiny yolo voc 3c.cfg . In this case, every layer will have the same exact number of weights except for the last two, so it will load the weights into all layers up to the last two because they now contain different number of weights. Camera/video file demo For a demo that entirely runs on the CPU: bash flow model cfg/yolo new.cfg load bin/yolo new.weights demo videofile.avi For a demo that runs 100% on the GPU: bash flow model cfg/yolo new.cfg load bin/yolo new.weights demo videofile.avi gpu 1.0 To use your webcam/camera, simply replace videofile.avi with keyword camera . To save a video with predicted bounding box, add saveVideo option. 
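For reference, a rough Python sketch of what such a video demo amounts to, built on the return_predict API described in the next section (the cfg/weights paths and file names here are placeholders): python
# read a video, run detection on every frame, draw the boxes and save the annotated result
import cv2
from darkflow.net.build import TFNet

tfnet = TFNet({"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.5})
cap = cv2.VideoCapture("videofile.avi")
writer = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"XVID")
        writer = cv2.VideoWriter("result.avi", fourcc, 20.0, (frame.shape[1], frame.shape[0]))
    for det in tfnet.return_predict(frame):
        tl = (det["topleft"]["x"], det["topleft"]["y"])
        br = (det["bottomright"]["x"], det["bottomright"]["y"])
        cv2.rectangle(frame, tl, br, (0, 255, 0), 2)
        cv2.putText(frame, det["label"], (tl[0], tl[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    writer.write(frame)
cap.release()
writer.release()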
Using darkflow from another python application Please note that return_predict(img) must take an numpy.ndarray . Your image must be loaded beforehand and passed to return_predict(img) . Passing the file path won't work. Result from return_predict(img) will be a list of dictionaries representing each detected object's values in the same format as the JSON output listed above. python from darkflow.net.build import TFNet import cv2 options { model : cfg/yolo.cfg , load : bin/yolo.weights , threshold : 0.1} tfnet TFNet(options) imgcv cv2.imread( ./sample_img/sample_dog.jpg ) result tfnet.return_predict(imgcv) print(result) Save the built graph to a protobuf file ( .pb ) bash Saving the lastest checkpoint to protobuf file flow model cfg/yolo new.cfg load 1 savepb Saving graph and weights to protobuf file flow model cfg/yolo.cfg load bin/yolo.weights savepb When saving the .pb file, a .meta file will also be generated alongside it. This .meta file is a JSON dump of everything in the meta dictionary that contains information nessecary for post processing such as anchors and labels . This way, everything you need to make predictions from the graph and do post processing is contained in those two files no need to have the .cfg or any labels file tagging along. The created .pb file can be used to migrate the graph to mobile devices (JAVA / C++ / Objective C++). The name of input tensor and output tensor are respectively 'input' and 'output' . For further usage of this protobuf file, please refer to the official documentation of Tensorflow on C++ API _here_ . To run it on, say, iOS application, simply add the file to Bundle Resources and update the path to this file inside source code. Also, darkflow supports loading from a .pb and .meta file for generating predictions (instead of loading from a .cfg and checkpoint or .weights ). bash Forward images in sample_img for predictions based on protobuf file flow pbLoad built_graph/yolo.pb metaLoad built_graph/yolo.meta imgdir sample_img/ If you'd like to load a .pb and .meta file when using return_predict() you can set the pbLoad and metaLoad options in place of the model and load options you would normally set. That's all.",Object Detection,Object Detection 2576,Computer Vision,Computer Vision,Computer Vision,"Padam This repository contains our pytorch implementation of Partially Adaptive Momentum Estimation method (Padam) in the paper Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks . Prerequisites: Pytorch CUDA Usage: Use python to run run_cnn_test_cifar10.py for experiments on Cifar10 and run_cnn_test_cifar100.py for experiments on Cifar100 Command Line Arguments: lr: (start) learning rate method: optimization method, e.g., sgdm , adam , amsgrad , padam net: network architecture, e.g. vggnet , resnet , wideresnet partial: partially adaptive parameter for Padam method wd: weight decay Nepoch: number of training epochs resume: whether resume from previous training process Usage Examples: Run experiments on Cifar10: bash python run_cnn_test_cifar10.py lr 0.1 method padam net vggnet partial 0.125 wd 5e 4 Run experiments on Cifar100: bash python run_cnn_test_cifar100.py lr 0.1 method padam net resnet partial 0.125 wd 5e 4",Object Detection,Object Detection 2580,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. 
How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... 
improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... added drawing of chart of average loss during training added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net 
videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. 
If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. 
Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate GPUs 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. For 4xGPUs increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . 4. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. 
Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. 
Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) 
then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: then do this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 will be created file yolov3.conv.81 , then train by using weights file yolov3.conv.81 instead of darknet53.conv.74 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2582,Computer Vision,Computer Vision,Computer Vision,"FCOS: Fully Convolutional One Stage Object Detection This project hosts the code for implementing the FCOS algorithm for object detection, as presented in our paper: FCOS: Fully Convolutional One Stage Object Detection; Tian Zhi, Chunhua Shen, Hao Chen, and Tong He; arXiv preprint arXiv:1904.01355 (2019). The full paper is available at: Highlights Totally anchor free: FCOS completely avoids the complicated computation related to anchor boxes and all hyper parameters of anchor boxes. Memory efficient: FCOS uses 2x less training memory footprint than its anchor based counterpart RetinaNet. Better performance: The very simple detector achieves better performance (37.1 vs. 36.8) than Faster R CNN. Faster training and inference: With the same hardwares, FCOS also requires less training hours (6.5h vs. 8.8h) and faster inference speed (71ms vs. 126 ms per im) than Faster R CNN. State of the art performance: Without bells and whistles, FCOS achieves state of the art performances. It achieves 41.5% (ResNet 101 FPN) and 43.2% (ResNeXt 64x4d 101) in AP on coco test dev. Required hardware We use 8 Nvidia V100 GPUs. \ But 4 1080Ti GPUs can also train a fully fledged ResNet 50 FPN based FCOS since FCOS is memory efficient. Installation This FCOS implementation is based on maskrcnn benchmark . Therefore the installation is the same as original maskrcnn benchmark. Please check INSTALL.md (INSTALL.md) for installation instructions. You may also want to see the original README.md (MASKRCNN_README.md) of maskrcnn benchmark. A quick demo Once the installation is done, you can follow the below steps to run a quick demo. assume that you are under the root directory of this project, and you have activated your virtual environment if needed. wget O FCOS_R_50_FPN_1x.pth python demo/fcos_demo.py Inference The inference command line on coco minival split: python tools/test_net.py \ config file configs/fcos/fcos_R_50_FPN_1x.yaml \ MODEL.WEIGHT models/FCOS_R_50_FPN_1x.pth \ TEST.IMS_PER_BATCH 4 Please note that: 1) If your model's name is different, please replace models/FCOS_R_50_FPN_1x.pth with your own. 
2) If you enounter out of memory error, please try to reduce TEST.IMS_PER_BATCH to 1. 3) If you want to evaluate a different model, please change config file to its config file (in configs/fcos (configs/fcos)) and MODEL.WEIGHT to its weights file. For your convenience, we provide the following trained models (more models are coming soon). Model Total training mem (GB) Multi scale training Testing time / im AP (minival) AP (test dev) Link : : : : : : : : : : : : FCOS_R_50_FPN_1x 29.3 No 71ms 37.1 37.4 download FCOS_R_101_FPN_2x 44.1 Yes 74ms 41.4 41.5 download FCOS_X_101_32x8d_FPN_2x 72.9 Yes 122ms 42.5 42.7 download FCOS_X_101_64x4d_FPN_2x 77.7 Yes 140ms 43.0 43.2 download 1 1x and 2x mean the model is trained for 90K and 180K iterations, respectively. \ 2 We report total training memory footprint on all GPUs instead of the memory footprint per GPU as in maskrcnn benchmark . \ 3 All results are obtained with a single model and without any test time data augmentation such as multi scale, flipping and etc.. \ 4 Our results have been improved since our initial release. If you want to check out our original results, please checkout commit f4fd589 . Training The following command line will train FCOS_R_50_FPN_1x on 8 GPUs with Synchronous Stochastic Gradient Descent (SGD): python m torch.distributed.launch \ nproc_per_node 8 \ master_port $((RANDOM + 10000)) \ tools/train_net.py \ skip test \ config file configs/fcos/fcos_R_50_FPN_1x.yaml \ DATALOADER.NUM_WORKERS 2 \ OUTPUT_DIR training_dir/fcos_R_50_FPN_1x Note that: 1) If you want to use fewer GPUs, please change nproc_per_node to the number of GPUs. No other settings need to be changed. The total batch size does not depends on nproc_per_node . If you want to change the total batch size, please change SOLVER.IMS_PER_BATCH in configs/fcos/fcos_R_50_FPN_1x.yaml (configs/fcos/fcos_R_50_FPN_1x.yaml). 2) The models will be saved into OUTPUT_DIR . 3) If you want to train FCOS with other backbones, please change config file . 4) We haved noted that training FCOS with 4 GPUs (4 images per GPU) can achieve slightly better performance than with 8 GPUs (2 images per GPU). We are working to find the reasons. But if you pursuit the best performance, we suggest you train your models with 4 GPUs as long as an out of memory error does not happen. 5) Sometimes you may encounter a deadlock with 100% GPUs' usage, which might be a problem of NCCL. Please try export NCCL_P2P_DISABLE 1 before running the training command line. Contributing to the project Any pull requests or issues are welcome. Citations Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows. @article{tian2019fcos, title {{FCOS}: Fully Convolutional One Stage Object Detection}, author {Tian, Zhi and Shen, Chunhua and Chen, Hao and He, Tong}, journal {arXiv preprint arXiv:1904.01355}, year {2019} } License For academic use, this project is licensed under the 2 clause BSD License see the LICENSE file for details. For commercial use, please contact the authors.",Object Detection,Object Detection 2599,Computer Vision,Computer Vision,Computer Vision,"Faster RCNN / Mask RCNN Airbus Ship Detection Challenge 10th Code. Code mostly improted from tensorpack (see below) Modified from Dependencies + Python 3; TensorFlow > 1.4.0 + pip install git+git://github.com/waspinator/pycococreator.git@0.2.0 + pip install 'git+ + Tensorpack@0.8.5 (pip install U git+ + OpenCV + Pre trained ResNet model from tensorpack model zoo. What Works 1. 
I implemented Online Hard Example Mining, selecting the top 128 RoIs for training. 2. Multi Scale Training: since small objects are hard to detect, I resize the images randomly between 1200 and 2000. 3. I used soft-NMS for post processing, using mask overlap instead of box overlap to rescore each instance and avoid dealing with the rotated-box problem. 4. Mask R-CNN is prone to overfitting (a lesson learned from DSB2018), so I use data augmentation with MotionBlur, GaussNoise, ShiftScale, Rotate, CLAHE, RandomBrightness, RandomContrast ... 5. Enlarged the mask crop from 28 to 56, using Dice loss + BCE loss. Doesn't Work 1. Adding a stride-2 feature map in FPN plus a size-16 anchor improved local CV to 0.61 but gave a worse public and private LB. 2. Cascade RCNN: no improvement. I have implemented TTA and checkpoint ensembling (see eval.py) but both resulted in a worse public LB. It turns out they are better on the private LB (best 0.83). The final score is based on a single model without TTA.",Object Detection,Object Detection 2608,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection).
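The mAP (AP50) charts referenced above score a detection as correct when its IoU with the ground-truth box is at least 0.5. For reference, a minimal IoU helper is sketched below; it is generic code with made-up box values, not something shipped in this repository.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x_min, y_min, x_max, y_max)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# Two 100x100 boxes shifted by 50 px overlap by one third.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # 0.333...
```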
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg 
yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. 
If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. 
Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. 
Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. 
Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set the flag random 1 in your .cfg file it will increase precision by training Yolo on different resolutions: link increase the network resolution in your .cfg file ( height 608 , width 608 or any value that is a multiple of 32) it will increase precision recalculate anchors for your dataset for the width and height from the cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of the 3 yolo layers in your cfg file check that every object in your dataset is labeled; no object in your dataset should be left without a label. In most training issues there are wrong labels in the dataset (labels produced by some conversion script, marked with a third party tool, ...). Always check your dataset by using: it is desirable that your training dataset includes images with objects at different scales, rotations and lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train for 2000 x classes iterations or more it is desirable that your training dataset includes images with non labeled objects that you do not want to detect negative samples without bounding boxes (empty .txt files) use as many negative sample images as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or a higher value in the last region layer of your cfg file for training for small objects set layers 1, 11 and set stride 4 (instead of the default values) General rule your training dataset should cover the same range of relative object sizes that you want to detect: train_network_width * train_obj_width / train_image_width ~ detection_network_width * detection_obj_width / detection_image_width and train_network_height * train_obj_height / train_image_height ~ detection_network_height * detection_obj_height / detection_image_height to speed up training (at the cost of some detection accuracy) do Fine Tuning instead of Transfer Learning, set the param stopbackward 1 here: 2.
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2610,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. 
When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data 
yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. 
Find the files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put them next to darknet.exe 1.2 Check that the bin and include folders are present in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if they aren't, copy them to this folder from the path where CUDA is installed 1.3. To install cuDNN (to speed up the neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add a Windows system variable cudnn with the path to cuDNN: 1.4. If you want to build without cuDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have another version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj with Notepad, find the 2 places with CUDA 9.1 and change them to your CUDA version, then do step 1 3. If you don't have a GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change the paths after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have a GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after MSVS2015 has been installed.
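Once darknet.exe is built and the OpenCV DLLs sit next to it, the image list detection command from the usage section above ( dont_show ext_output with data/train.txt and result.txt ) can be driven from a script. A minimal Python sketch; the paths are the ones used in the examples above, and the flag spelling (single leading dashes) follows the upstream darknet command line: python
import subprocess
from pathlib import Path

DARKNET = "darknet.exe"          # use ./darknet on Linux
DATA, CFG, WEIGHTS = "data/voc.data", "yolo-voc.cfg", "yolo-voc.weights"

def detect_image_list(list_file="data/train.txt", out_file="result.txt", thresh=0.25):
    """Run the detector over every image listed in list_file and capture the
    -ext_output text (class, confidence, box coordinates) into out_file."""
    for p in (DARKNET, CFG, WEIGHTS, list_file):
        if not Path(p).exists():
            raise FileNotFoundError(p)
    cmd = [DARKNET, "detector", "test", DATA, CFG, WEIGHTS,
           "-thresh", str(thresh), "-dont_show", "-ext_output"]
    # darknet reads image paths from stdin when no image argument is given
    with open(list_file) as images, open(out_file, "w") as out:
        subprocess.run(cmd, stdin=images, stdout=out, check=True)

detect_image_list()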
How to compile (custom): Also, you can create your own darknet.sln & darknet.vcxproj ; this example is for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependencies > Build Customizations > set the check on CUDA 9.1 or whatever version you have, for example as here: add to the project all .c & .cu files and the file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependencies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put these .dlls next to the .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put them in the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to the directory build\darknet\x64\data\voc (the dir build\darknet\x64\data\voc\VOCdevkit\ will be created): 2.1 Download the file voc_label.py to the dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run the command: python build\darknet\x64\data\voc\voc_label.py (to generate the files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run the command: type 2007_train.txt 2007_val.txt 2012_*.txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable the Loss Window use the flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required, change the paths in the file build\darknet\x64\data\voc.data More information about training is available at the link: Note: If during training you see nan values in the avg (loss) field then training is going wrong, but if nan appears in some other lines then training is going well. How to train with multi GPU: 1. First train it on 1 GPU for about 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop, and using the partially trained model /backup/yolov3 voc_1000.weights run training with multiple GPUs (up to 4): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ...
click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. 
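The annotation format described above (one .txt per .jpg, each line holding object class id x_center y_center width height with the four floats relative to the image size) and the train.txt list from step 6 can be generated and sanity checked with a short script. A minimal sketch, assuming the data/obj layout from step 4 and 2 classes; adjust the paths and class count to your setup: python
from pathlib import Path

OBJ_DIR = Path("build/darknet/x64/data/obj")     # images and .txt labels live here
TRAIN_TXT = Path("build/darknet/x64/data/train.txt")
NUM_CLASSES = 2                                  # same value as classes in obj.data

def write_train_txt():
    """List every .jpg with a path relative to darknet.exe, one per line."""
    with open(TRAIN_TXT, "w") as f:
        for img in sorted(OBJ_DIR.glob("*.jpg")):
            f.write(f"data/obj/{img.name}\n")

def check_labels():
    """Each label line must be: <class id> <x_center> <y_center> <width> <height>,
    with the class id in [0, NUM_CLASSES - 1] and the four floats in (0.0, 1.0]."""
    for img in OBJ_DIR.glob("*.jpg"):
        label = img.with_suffix(".txt")
        if not label.exists():
            print(f"missing label (an empty .txt is only valid for negative samples): {label}")
            continue
        for n, line in enumerate(label.read_text().splitlines(), 1):
            parts = line.split()
            if len(parts) != 5:
                print(f"bad annotation {label}:{n}: {line!r}")
                continue
            cls, coords = int(parts[0]), [float(v) for v in parts[1:]]
            if not (0 <= cls < NUM_CLASSES) or any(not (0.0 < v <= 1.0) for v in coords):
                print(f"bad annotation {label}:{n}: {line!r}")

write_train_txt()
check_labels()
print("filters before each yolo layer:", (NUM_CLASSES + 5) * 3)   # the (classes + 5)x3 rule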
Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) 
darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. 
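Stepping back to the metric definitions given a few paragraphs above: IoU and the PascalVOC style 11 point average precision can be written out in a few lines. An illustrative NumPy sketch (not code from this repository), with boxes given as (x1, y1, x2, y2): python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def voc_11point_ap(recall, precision):
    """PascalVOC AP: average of the best precision at recall >= t for t = 0, 0.1, ..., 1.0."""
    recall, precision = np.asarray(recall), np.asarray(precision)
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        p = precision[recall >= t].max() if np.any(recall >= t) else 0.0
        ap += p / 11.0
    return ap

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))             # 25 / 175, about 0.143
print(voc_11point_ap([0.2, 0.5, 1.0], [1.0, 0.8, 0.5]))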
In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last layer region in your cfg file for training for small objects set layers 1, 11 instead of and set stride 4 instead of General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2611,Computer Vision,Computer Vision,Computer Vision,m2det tf(WIP) Tensorflow 2.0 implementation of m2det(WIP) https://arxiv.org/pdf/1811.04533.pdf,Object Detection,Object Detection 2613,Computer Vision,Computer Vision,Computer Vision,"CoupleNet CoupleNet: Coupling Global Structure with Local Parts for Object Detection The Code is modified from py R FCN , please follow the procedure in it to prepare the training data and testing data. Using the default hyperparameters and iterations, you can achieve a mAP around 81.7%. Main results training data test data mAP@0.5 time/img(ms) CoupleNet, ResNet 101 VOC 07+12 VOC 07 test 81.7% 102 CoupleNet, ResNet 101 VOC 07+12 VOC 07 test 82.1% 122 CoupleNet, ResNet 101 VOC 07++12 VOC 12 test 80.4% 122 : without adding context. training data test data mAP@ 0.5:0.95 time/img(ms) CoupleNet, ResNet 101 COCO 2014 trainval COCO test dev 34.4% 122 VOC 0712 model (trained on VOC 07+12, mAP 81.7%) Citing CoupleNet If you find CoupleNet useful in your research, please consider citing: @article{zhu2017couplenet, title {CoupleNet: Coupling Global Structure with Local Parts for Object Detection}, author {Zhu, Yousong and Zhao, Chaoyang and Wang, Jinqiao and Zhao, Xu and Wu, Yi and Lu, Hanqing}, journal {arXiv preprint arXiv:1708.02863}, year {2017} }",Object Detection,Object Detection 2617,Computer Vision,Computer Vision,Computer Vision,"SimpleDet A Simple and Versatile Framework for Object Detection and Instance Recognition Major Features ! (./doc/image/diagram.png) FP16 training for memory saving and up to 2.5X acceleration Highly scalable distributed training available out of box Full coverage of state of the art models including FasterRCNN, MaskRCNN, CascadeRCNN, RetinaNet and TridentNet Extensive feature set including large batch BN , deformable convolution, soft NMS, multi scale train/test Modular design for coding free exploration of new experiment settings Setup Install SimpleDet contains a lot of C++ operators not in MXNet offical repo, so one has to build MXNet from scratch. 
Please refer to INSTALL.md (./doc/INSTALL.md) more details Preparing Data SimpleDet requires groundtruth annotation organized as following format { gt_class : (nBox, ), gt_bbox : (nBox, 4), flipped : bool, h : int, w : int, image_url : str, im_id : int, this fields are generated on the fly during test rec_id : int, resize_h : int, resize_w : int, ... }, ... Especially, for experimenting on coco datatet, one can organize coco data in data/ coco/ annotations/ instances_train2014.json instances_valminusminival2014.json instances_minival2014.json image_info_test dev2017.json images/ train2014 val2014 test2017 and run the helper script to generate roidb bash python3 utils/generate_roidb.py dataset coco dataset split train2014 python3 utils/generate_roidb.py dataset coco dataset split valminusminival2014 python3 utils/generate_roidb.py dataset coco dataset split minival2014 python3 utils/generate_roidb.py dataset coco dataset split test dev2017 Deploy dependency and compile extension 1. setup mxnext, a wrapper of mxnet symbolic API bash pip3 install 'git+ 2. run make in simpledet directory to install cython extensions Quick Start bash train python3 detection_train.py config config/detection_config.py test python3 detection_test.py config config/detection_config.py Project Design Model Zoo Please refer to MODEL_ZOO.md (./MODEL_ZOO.md) for available models Code Structure detection_train.py detection_test.py config/ detection_config.py core/ detection_input.py detection_metric.py detection_module.py models/ FPN/ tridentnet/ maskrcnn/ cascade_rcnn/ retinanet/ mxnext/ symbol/ builder.py Config Everything is configurable from the config file, all the changes should be out of source . Experiments One experiment is a directory in experiments folder with the same name as the config file. > E.g. r50_fixbn_1x.py is the name of a config file config/ r50_fixbn_1x.py experiments/ r50_fixbn_1x/ checkpoint.params log.txt coco_minival2014_result.json Models The models directory contains SOTA models implemented in SimpletDet. How is Faster RCNN built Simpledet supports many popular detection methods and here we take Faster RCNN as a typical example to show how a detector is built. Preprocessing . The preprocessing methods of the detector is implemented through DetectionAugmentation . Image/bbox related preprocessing, such as Norm2DImage and Resize2DImageBbox . Anchor generator AnchorTarget2D , which generates anchors and corresponding anchor targets for training RPN. Network Structure . The training and testing symbols of Faster RCNN detector is defined in FasterRcnn . The key components are listed as follow: Backbone . Backbone provides interfaces to build backbone networks, e.g. ResNet and ResNext. Neck . Neck provides interfaces to build complementary feature extraction layers for backbone networks, e.g. FPNNeck builds Top down pathway for Feature Pyramid Network . RPN head . RpnHead aims to build classification and regression layers to generate proposal outputs for RPN. Meanwhile, it also provides interplace to generate sampled proposals for the subsequent R CNN. Roi Extractor . RoiExtractor extracts features for each roi (proposal) based on the R CNN features generated by Backbone and Neck . Bounding Box Head . BboxHead builds the R CNN layers for proposal refinement. How to build a custom detector The flexibility of simpledet framework makes it easy to build different detectors. We take TridentNet as an example to demonstrate how to build a custom detector simply based on the Faster RCNN framework. 
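Before the component walkthrough that follows, note that the roidb records described under Preparing Data above are plain Python dicts, so a custom dataset can be converted without the helper script. A minimal sketch; the example annotations, the output file name and the use of pickle for serialization are assumptions (utils/generate_roidb.py is the authoritative reference): python
import pickle
import numpy as np

def make_roidb_entry(im_id, image_url, height, width, boxes, classes, flipped=False):
    """One groundtruth record in the roidb format quoted above.
    boxes: list of [x1, y1, x2, y2]; classes: integer class ids of the same length."""
    return {
        "gt_class": np.array(classes, dtype=np.int32),   # (nBox,)
        "gt_bbox": np.array(boxes, dtype=np.float32),    # (nBox, 4)
        "flipped": flipped,
        "h": int(height),
        "w": int(width),
        "image_url": image_url,
        "im_id": int(im_id),
    }

# Example annotations as (image_path, height, width, boxes, class_ids); replace with your own loader.
annotations = [
    ("data/my_dataset/images/img_0001.jpg", 480, 640, [[48, 240, 195, 371]], [1]),
]
roidb = [make_roidb_entry(i, *ann) for i, ann in enumerate(annotations)]

with open("my_dataset_train.roidb", "wb") as f:
    pickle.dump(roidb, f)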
Preprocessing . The additional processing methods could be provided accordingly by inheriting from DetectionAugmentation . In TridentNet, a new TridentAnchorTarget2D is implemented to generate anchors for multiple branches and filter anchors for scale aware training scheme. Network Structure . The new network structure could be constructed easily for a custom detector by modifying some required components as needed and For TridentNet, we build trident blocks in the Backbone according to the descriptions in the paper. We also provide a TridentRpnHead to generate filtered proposals in RPN to implement the scale aware scheme. Other components are shared the same with original Faster RCNN. Distributed Training Please refer to DISTRIBUTED.md (./doc/DISTRIBUTED.md) Contributors Yuntao Chen, Chenxia Han, Yanghao Li, Zehao Huang, Yi Jiang, Naiyan Wang License and Citation This project is release under the Apache 2.0 license for non commercial usage. For commercial usage, please contact us for another license. If you find our project helpful, please consider cite our tech report. @article{chen2019simpledet, title {SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition}, author {Chen, Yuntao and and Han, Chenxia and Li, Yanghao and Huang, Zehao and Jiang, Yi and Wang, Naiyan and Zhang, Zhaoxiang}, journal {arXiv preprint arXiv:1903.05831}, year {2019} }",Object Detection,Object Detection 2625,Computer Vision,Computer Vision,Computer Vision,"Padam This repository contains our pytorch implementation of Partially Adaptive Momentum Estimation method (Padam) in the paper Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks . Prerequisites: Pytorch CUDA Usage: Use python to run run_cnn_test_cifar10.py for experiments on Cifar10 and run_cnn_test_cifar100.py for experiments on Cifar100 Command Line Arguments: lr: (start) learning rate method: optimization method, e.g., sgdm , adam , amsgrad , padam net: network architecture, e.g. vggnet , resnet , wideresnet partial: partially adaptive parameter for Padam method wd: weight decay Nepoch: number of training epochs resume: whether resume from previous training process Usage Examples: Run experiments on Cifar10: bash python run_cnn_test_cifar10.py lr 0.1 method padam net vggnet partial 0.125 wd 5e 4 Run experiments on Cifar100: bash python run_cnn_test_cifar100.py lr 0.1 method padam net resnet partial 0.125 wd 5e 4",Object Detection,Object Detection 2631,Computer Vision,Computer Vision,Computer Vision,"Introduction This directory contains PyTorch YOLOv3 software and an iOS App developed by Ultralytics LLC, and is freely available for redistribution under the GPL 3.0 license . For more information please visit Description The repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. Training is done on the COCO dataset by default: Credit to Joseph Redmon for YOLO: Requirements Python 3.7 or later with the following pip3 install U r requirements.txt packages: numpy torch > 1.0.0 opencv python tqdm Tutorials GCP Quickstart Transfer Learning Train Single Image Train Single Class Train Custom Data Training Start Training: python3 train.py to begin training after downloading COCO data with data/get_coco_dataset.sh . Resume Training: python3 train.py resume to resume training from weights/latest.pt . Each epoch trains on 117,263 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set. 
Default training settings produce loss plots below, with training speed of 0.25 s/batch on a V100 GPU (almost 50 COCO epochs/day) . Here we see training results from coco_1img.data , coco_10img.data and coco_100img.data , 3 example files available in the data/ folder, which train and test on the first 1, 10 and 100 images of the coco2014 trainval dataset. from utils import utils; utils.plot_results() ! results Image Augmentation datasets.py applies random OpenCV powered augmentation to the input images in accordance with the following specifications. Augmentation is applied only during training, not during inference. Bounding boxes are automatically tracked and updated with the images. 416 x 416 examples pictured below. Augmentation Description Translation +/ 10% (vertical and horizontal) Rotation +/ 5 degrees Shear +/ 2 degrees (vertical and horizontal) Scale +/ 10% Reflection 50% probability (horizontal only) H S V Saturation +/ 50% HS V Intensity +/ 50% Speed Machine type: n1 standard 8 (8 vCPUs, 30 GB memory) CPU platform: Intel Skylake GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr) HDD: 100 GB SSD Dataset: COCO train 2014 GPUs batch_size batch time epoch time epoch cost (images) (s/batch) 1 K80 16 1.43s 175min $0.58 1 P4 8 0.51s 125min $0.58 1 T4 16 0.78s 94min $0.55 1 P100 16 0.39s 48min $0.39 2 P100 32 0.48s 29min $0.47 4 P100 64 0.65s 20min $0.65 1 V100 16 0.25s 31min $0.41 2 V100 32 0.29s 18min $0.48 4 V100 64 0.41s 13min $0.70 8 V100 128 0.49s 7min $0.80 Inference Run detect.py to apply trained weights to an image, such as zidane.jpg from the data/samples folder: YOLOv3: python3 detect.py cfg cfg/yolov3.cfg weights weights/yolov3.weights YOLOv3 tiny: python3 detect.py cfg cfg/yolov3 tiny.cfg weights weights/yolov3 tiny.weights YOLOv3 SPP: python3 detect.py cfg cfg/yolov3 spp.cfg weights weights/yolov3 spp.weights Webcam Run detect.py with webcam True to show a live webcam feed. Pretrained Weights Darknet .weights format: PyTorch .pt format: Darknet Conversion bash git clone && cd yolov3 convert darknet cfg/weights to pytorch model python3 c from models import ; convert('cfg/yolov3 spp.cfg', 'weights/yolov3 spp.weights') Success: converted 'weights/yolov3 spp.weights' to 'converted.pt' convert cfg/pytorch model to darknet weights python3 c from models import ; convert('cfg/yolov3 spp.cfg', 'weights/yolov3 spp.pt') Success: converted 'weights/yolov3 spp.pt' to 'converted.weights' mAP Use test.py weights weights/yolov3.weights to test the official YOLOv3 weights. Use test.py weights weights/latest.pt to test the latest training results. Compare to darknet published results ultralytics/yolov3 OR NMS 5:52@416 ( pycocotools ) darknet YOLOv3 320 51.9 (51.4) 51.5 YOLOv3 416 55.0 (54.9) 55.3 YOLOv3 608 57.5 (57.8) 57.9 ultralytics/yolov3 MERGE NMS 7:15@416 ( pycocotools ) darknet YOLOv3 320 52.3 (51.7) 51.5 YOLOv3 416 55.4 (55.3) 55.3 YOLOv3 608 57.9 (58.1) 57.9 ultralytics/yolov3 MERGE+earlier_pred4 8:34@416 ( pycocotools ) darknet YOLOv3 320 52.3 (51.8) 51.5 YOLOv3 416 55.5 (55.4) 55.3 YOLOv3 608 57.9 (58.2) 57.9 > ultralytics/yolov3 darknet YOLOv3 320 51.8 51.5 YOLOv3 416 55.4 55.3 YOLOv3 608 58.2 57.9 YOLOv3 spp 320 52.4 YOLOv3 spp 416 56.5 YOLOv3 spp 608 60.7 60.6 bash git clone bash yolov3/data/get_coco_dataset.sh git clone && cd cocoapi/PythonAPI && make && cd ../.. 
&& cp r cocoapi/PythonAPI/pycocotools yolov3 cd yolov3 python3 test.py save json img size 416 Namespace(batch_size 32, cfg 'cfg/yolov3 spp.cfg', conf_thres 0.001, data_cfg 'data/coco.data', img_size 416, iou_thres 0.5, nms_thres 0.5, save_json True, weights 'weights/yolov3 spp.weights') Using CUDA device0 _CudaDeviceProperties(name 'Tesla V100 SXM2 16GB', total_memory 16130MB) Class Images Targets P R mAP F1 Calculating mAP: 100% █████████████████████████████████████████ 157/157 05:59<00:00, 1.71s/it all 5e+03 3.58e+04 0.109 0.773 0.57 0.186 Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.335 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.565 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.349 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.151 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.360 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.493 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.280 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.432 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.458 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.255 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.494 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.620 python3 test.py save json img size 608 batch size 16 Namespace(batch_size 16, cfg 'cfg/yolov3 spp.cfg', conf_thres 0.001, data_cfg 'data/coco.data', img_size 608, iou_thres 0.5, nms_thres 0.5, save_json True, weights 'weights/yolov3 spp.weights') Using CUDA device0 _CudaDeviceProperties(name 'Tesla V100 SXM2 16GB', total_memory 16130MB) Class Images Targets P R mAP F1 Computing mAP: 100% █████████████████████████████████████████ 313/313 06:11<00:00, 1.01it/s all 5e+03 3.58e+04 0.12 0.81 0.611 0.203 Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.366 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.607 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.386 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.207 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.391 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.485 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.296 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.464 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.494 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.331 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.517 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.618 Citation DOI Contact Issues should be raised directly in the repository. 
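The Average Precision / Average Recall block above is the standard pycocotools summary. If test.py has written a COCO format detections JSON (its save json option), the same table can be reproduced directly with the pycocotools API; a short sketch, with the annotation and results file paths as assumptions: python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Both paths are assumptions: point them at your groundtruth annotations
# and at the detections JSON written by test.py with its save-json option.
ann_file = "../coco/annotations/instances_val2014.json"
res_file = "results.json"

coco_gt = COCO(ann_file)                 # load groundtruth
coco_dt = coco_gt.loadRes(res_file)      # load detections

coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()                    # prints the AP/AR lines shown above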
For additional questions or comments please email Glenn Jocher at glenn.jocher@ultralytics.com or visit us at",Object Detection,Object Detection 2637,Computer Vision,Computer Vision,Computer Vision,"Feature Pyramid Network on caffe This is an unofficial version of Feature Pyramid Network for Feature Pyramid Networks for Object Detection results FPN(resnet50) end2end result is implemented without OHEM and trained with pascal voc 2007 + 2012, tested on 2007 merged rcnn
| mAP@0.5 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0.788 | 0.8079 | 0.8036 | 0.8010 | 0.7293 | 0.6743 | 0.8680 | 0.8766 | 0.8967 | 0.6122 | 0.8646 |

| diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tv |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0.7330 | 0.8855 | 0.8760 | 0.8063 | 0.7999 | 0.5138 | 0.7905 | 0.7755 | 0.8637 | 0.7736 |

shared rcnn
| mAP@0.5 | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0.7833 | 0.8585 | 0.8001 | 0.7970 | 0.7174 | 0.6522 | 0.8668 | 0.8768 | 0.8929 | 0.5842 | 0.8658 |

| diningtable | dog | horse | motorbike | person | pottedplant | sheep | sofa | train | tv |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0.7022 | 0.8891 | 0.8680 | 0.7991 | 0.7944 | 0.5065 | 0.7896 | 0.7707 | 0.8697 | 0.7653 |

framework merged rcnn framework Network overview: link ! (merge_rcnn_framework.png) shared rcnn Network overview: link ! (framework.png) the red and yellow are shared params about the anchor size setting In the paper the anchor setting is Ratios: 0.5,1,2 , scales: 8. With this setting and P2 P6, all anchor sizes are 32,64,128,512,1024 , but this setting suits the COCO dataset, which has many small targets. The voc dataset targets are in the range 128,256,512 . So, we design the anchor setting: Ratios: 0.5,1,2 , scales: 8,16 ; this is very important for the voc dataset. usage download voc07,12 dataset ResNet50.caffemodel and rename it to ResNet50.v2.caffemodel bash cp ResNet50.v2.caffemodel data/pretrained_model/ OneDrive download: link In my experiments, the code requires 10G GPU memory in training and 6G in testing. You can choose an image size, minibatch size and rcnn batch size that suit your GPUs. compile caffe & lib bash cd caffe fpn mkdir build cd build cmake .. make j16 all cd lib make train & test shared rcnn bash ./experiments/scripts/FP_Net_end2end.sh 1 FPN pascal_voc ./test.sh 1 FPN pascal_voc merged rcnn bash ./experiments/scripts/FP_Net_end2end_merge_rcnn.sh 0 FPN pascal_voc ./test_mergercnn.sh 0 FPN pascal_voc 0 1 is the GPU id. TODO List x all tests passed x evaluate object detection performance on voc x evaluate merged rcnn version performance on voc feature pyramid networks for object detection Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2016). Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144.",Object Detection,Object Detection 2638,Computer Vision,Computer Vision,Computer Vision,Retina Net Focal loss for Dense Object Detection The code is an unofficial version of RetinaNet from Focal Loss for Dense Object Detection. You can use the focal loss in usage install mxnet v0.9.5 1. download the dataset in data/ 2. download the params in ./init.sh train & test python train.py cfg kitti.yaml python test.py cfg kitti.yaml,Object Detection,Object Detection 2639,Computer Vision,Computer Vision,Computer Vision,"focal loss The code is an unofficial version of focal loss for Dense Object Detection . this is implemented using an mxnet python layer.
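Since the derivation images from the original README do not survive here, a small NumPy sketch of the focal loss with softmax term that the custom layer implements may help: FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the ground truth class. Illustrative only; it uses the alpha 0.25, gamma 2 defaults that also appear in the metric snippet below: python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def focal_loss_softmax(logits, labels, alpha=0.25, gamma=2.0, eps=1e-14):
    """Mean focal loss: -alpha * (1 - p_t)^gamma * log(p_t), with p_t the
    softmax probability of the ground-truth class for each sample."""
    prob = softmax(logits)
    p_t = prob[np.arange(len(labels)), labels] + eps
    return np.mean(alpha * -np.power(1.0 - p_t, gamma) * np.log(p_t))

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
print(focal_loss_softmax(logits, labels))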
The retina net is in usage Assume that you have put focal_loss.py in your operator path; then you can use: python
from your_operators.focal_loss import *   # makes the custom FocalLoss op visible to mx.sym.Custom
cls_prob = mx.sym.Custom(op_type='FocalLoss', name='cls_prob', data=cls_score, labels=label, alpha=0.25, gamma=2)
focal loss with softmax on kitti (10 cls) these are my experiments on the kitti 10 class set; the performance on hard classes is great!
| method@0.7 | car | van | Truck | cyclist | pedestrian | person_sitting | tram | misc | dontcare |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| base line (faster rcnn + ohem(1:2)) | 0.7892 | 0.7462 | 0.8465 | 0.623 | 0.4254 | 0.1374 | 0.5035 | 0.5007 | 0.1329 |
| faster rcnn + focal loss with softmax | 0.797 | 0.874 | 0.8959 | 0.7914 | 0.5700 | 0.2806 | 0.7884 | 0.7052 | 0.1433 |
! image about parameters in this experiment note: very important! In my experiment, I had to use the initialization strategy from section 3.3 of the paper: ! image Under such an initialization, in the presence of class imbalance, the loss due to the frequent class can dominate the total loss and cause instability in early training. Alternatively, you can try my strategy instead: train the model with the classical softmax for a few passes (for example 3 on the kitti dataset), then switch with a small learning rate, and the training loss will behave well: ! image about alpha now focal loss with softmax works well the focal loss value is not computed in focal_loss.py, because this layer only needs to forward cls_prob; the main task of focal_loss.py is to back propagate the focal loss gradient. The focal loss value should be calculated in metric.py, with normalization applied there. Note that this layer does not support use_ignore. For example: python
class RCNNLogLossMetric(mx.metric.EvalMetric):
    def __init__(self, cfg):
        super(RCNNLogLossMetric, self).__init__('RCNNLogLoss')
        self.e2e = cfg.TRAIN.END2END
        self.ohem = cfg.TRAIN.ENABLE_OHEM
        self.pred, self.label = get_rcnn_names(cfg)

    def update(self, labels, preds):
        pred = preds[self.pred.index('rcnn_cls_prob')]
        if self.ohem or self.e2e:
            label = preds[self.pred.index('rcnn_label')]
        else:
            label = labels[self.label.index('rcnn_label')]

        last_dim = pred.shape[-1]
        pred = pred.asnumpy().reshape(-1, last_dim)
        label = label.asnumpy().reshape(-1,).astype('int32')

        # filter with keep_inds
        keep_inds = np.where(label != -1)[0]
        label = label[keep_inds]
        cls = pred[keep_inds, label]
        cls += 1e-14

        gamma = 2
        alpha = 0.25
        cls_loss = alpha * (-1.0 * np.power(1 - cls, gamma) * np.log(cls))
        cls_loss = np.sum(cls_loss) / len(label)
        print(cls_loss)
        self.sum_metric += cls_loss
        self.num_inst += label.shape[0]
this value should match the forward value ! image backward gradient value ! image you can check the gradient value in your debugger (if needed). By the way, this is my derivation of the backward pass; if it has a mistake, please let me know. softmax activation: ! image cross entropy with softmax ! image Focal loss with softmax ! image",Object Detection,Object Detection 2646,Computer Vision,Computer Vision,Computer Vision,"PreciseRoIPooling This repo implements the Precise RoI Pooling (PrRoI Pooling), proposed in the paper Acquisition of Localization Confidence for Accurate Object Detection published at ECCV 2018 (Oral Presentation). Acquisition of Localization Confidence for Accurate Object Detection _Borui Jiang , Ruixuan Luo , Jiayuan Mao , Tete Xiao, Yuning Jiang_ ( indicates equal contribution.) Brief In short, Precise RoI Pooling is an integration based (bilinear interpolation) average pooling method for RoI Pooling. It avoids any quantization and has a continuous gradient on bounding box coordinates. It is: different from the original RoI Pooling proposed in Fast R CNN .
PrRoI Pooling uses average pooling instead of max pooling for each bin and has a continuous gradient on bounding box coordinates. That is, one can take the derivatives of some loss function w.r.t the coordinates of each RoI and optimize the RoI coordinates. different from the RoI Align proposed in Mask R CNN . PrRoI Pooling uses a full integration based average pooling instead of sampling a constant number of points. This makes the gradient w.r.t. the coordinates continuous. For a better illustration, we illustrate RoI Pooling, RoI Align and PrRoI Pooing in the following figure. More details including the gradient computation can be found in our paper. Implementation PrRoI Pooling was originally implemented by Tete Xiao based on MegBrain, an (internal) deep learning framework built by Megvii Inc. It was later adapted into open source deep learning frameworks. Currently, we only support PyTorch. Unfortunately, we don't have any specific plan for the adaptation into other frameworks such as TensorFlow, but any contributions (pull requests) will be more than welcome. Usage (PyTorch 1.0) In the directory pytorch/ , we provide a PyTorch based implementation of PrRoI Pooling. It requires PyTorch 1.0+ and only supports CUDA (CPU mode is not implemented). Since we use PyTorch JIT for cxx/cuda code compilation, to use the module in your code, simply do: from prroi_pool import PrRoIPool2D avg_pool PrRoIPool2D(window_height, window_width, spatial_scale) roi_features avg_pool(features, rois) for those who want to use the functional from prroi_pool.functional import prroi_pool2d roi_features prroi_pool2d(features, rois, window_height, window_width, spatial_scale) Usage (PyTorch 0.4) !!! Please first checkout to the branch pytorch0.4. In the directory pytorch/ , we provide a PyTorch based implementation of PrRoI Pooling. It requires PyTorch 0.4 and only supports CUDA (CPU mode is not implemented). To use the PrRoI Pooling module, first goto pytorch/prroi_pool and execute ./travis.sh to compile the essential components (you may need nvcc for this step). To use the module in your code, simply do: from prroi_pool import PrRoIPool2D avg_pool PrRoIPool2D(window_height, window_width, spatial_scale) roi_features avg_pool(features, rois) for those who want to use the functional from prroi_pool.functional import prroi_pool2d roi_features prroi_pool2d(features, rois, window_height, window_width, spatial_scale) Here, RoI is an m 5 float tensor of format (batch_index, x0, y0, x1, y1) , following the convention in the original Caffe implementation of RoI Pooling, although in some frameworks the batch indices are provided by an integer tensor. spatial_scale is multiplied to the RoIs. For example, if your feature maps are down sampled by a factor of 16 (w.r.t. the input image), you should use a spatial scale of 1/16 . The coordinates for RoI follows the L, R) convension. That is, (0, 0, 4, 4) denotes a box of size 4x4 .",Object Detection,Object Detection 2654,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. 
How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC 
model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. 
Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. 
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... 
click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. 
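To make the annotation format from step 5 concrete, here is a small Python sketch that converts an absolute pixel box (x_min, y_min, x_max, y_max) into the relative object-class / x_center / y_center / width / height line expected in each .txt file; the function name is illustrative and not part of the repository:

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    # Relative center coordinates and relative box size, as required above.
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / float(img_w)
    height = (y_max - y_min) / float(img_h)
    return '%d %f %f %f %f' % (class_id, x_center, y_center, width, height)

# e.g. a box spanning pixels (100, 80) to (300, 240) in a 640x480 image, class 1:
print(to_yolo_line(1, 100, 80, 300, 240, 640, 480))   # 1 0.312500 0.333333 0.312500 0.333333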
Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) 
darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And compare the last output lines for each weights file (7000, 8000, 9000): Choose the weights file with the highest IoU (intersection over union) and mAP (mean average precision) For example, if the biggest IoU is given by the weights yolo obj_8000.weights then use these weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersection over union) average intersection over union of objects and detections for a certain threshold = 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is the average value of 11 points on the PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision = TP/(TP+FP) and Recall = TP/(TP+FN) ), page 11: mAP is the default metric of precision in the PascalVOC competition; this is the same as the AP50 metric in the MS COCO competition. In terms of Wiki, the indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on the PascalVOC 2007 test set: Download the PascalVOC dataset, install Python 3.x and get the file 2007_test.txt as described here: Then download the file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove the comment symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for the yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for the yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get lower values, perhaps due to the fact that the model was trained on slightly different source code than the code on which the detection was done) if you want to get mAP for the tiny yolo voc.cfg model, then un comment the line for tiny yolo voc.cfg and comment the line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use the Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object is labeled in your dataset: no object in your dataset should be left without a label.
In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 images for each class or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last layer region in your cfg file for training for small objects set layers 1, 11 instead of and set stride 4 instead of General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2657,Computer Vision,Computer Vision,Computer Vision,"This consists of Pytorch implementation of ReSENet along with baseline Resnet and SE Resnet. This has been trained and tested on pytorch version 0.4.1. So please use the same version. Do not use version pytorch version 1.0 There is script file by name script.sh provided for training. The script automatically downloads necessary datasets, please make sure to have internet connection while running this. The script provides automatically creates directories for storing the trained model, so you do not have to create any. You can modify the script to change the architecture, dataset and depth before training. There are 3 architectures available: 1. Baseline Resnet 2. Res SE Net (Our proposed model) 3. SE Resnet Training can be done on 2 datasets: 1. CIFAR 10 2. CIFAR 100 Run this to give executable permission to the script chmod 777 script.sh You can then run the script by typing ./script.sh It is advisable to retain the hyperparameters. Please retain random seed in cifar.py for reproducibility. Citation If you use Res SE Net model or the code please cite my work as: @article{res se net, title {Res SE Net: Boosting Performance of Resnets by Enhancing Bridge connections}, author {Varshaneya V, Balasubramanian S, Darshan Gera}, journal {arXiv preprint arXiv:1902.06066}, year {2019} }",Object Detection,Object Detection 2676,Computer Vision,Computer Vision,Computer Vision,"Implementation of the paper YOLOv2, which can be found here:",Object Detection,Object Detection 2685,Computer Vision,Computer Vision,Computer Vision,"OverFeat OverFeat is a Convolutional Network based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participated in the ImageNet 2013 competition. This package allows researchers to use OverFeat to recognize images and extract features. A library with C++ source code is provided for running the OverFeat convolutional network, together with wrappers in various scripting languages (Python, Lua, Matlab coming soon). OverFeat was trained with the Torch7 package ( ). The OverFeat package provides tools to run the network in a standalone fashion. The training code is not distributed at this time. CREDITS, LICENSE, CITATION OverFeat is Copyright NYU 2013. 
Authors of the present package are Michael Mathieu, Pierre Sermanet, and Yann LeCun. The OverFeat system is by Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. Please refer to the LICENSE file in the same directory as the present file for licensing information. If you use OverFeat in your research, please cite the following paper: OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, Yann LeCun INSTALLATION: Download the archive from Extract the files: tar xvf overfeat vXX.tgz cd overfeat Overfeat uses external weight files. Since these files are large and do not change often, they are not included in the archive. We provide a script to automatically download the weights : ./download_weights.py The weight files should be in the folder data/default in the overfeat directory. Overfeat can run without BLAS, however it would be very slow. We strongly advice you to install openblas on linux (on MacOS, Accelerate should be available without any installation). On Ubuntu/Debian you should compile it (it might take a while, but it is worth it) : sudo apt get install build essential gcc g++ gfortran git libgfortran3 cd /tmp git clone cd OpenBLAS make NO_AFFINITY 1 USE_OPENMP 1 sudo make install For some reason, on 32 bits Ubuntu, libgfortran doesn't create the correct symlink. If you have issues linking with libgfortran, locate where libgfortran is installed (for instance /usr/lib/i386 linux gnu) and create the correct symlink : cd sudo ln sf libgfortran.so.3 libgfortran.so The precompiled binaries use BLAS. If you don't want to (or can't, for some reason) use BLAS, you must recompile overfeat. RUNNING THE PRE COMPILED BINARIES Pre compiled binaries are provided for Ubuntu Linux (32 bits and 64 bits) and Mac OS. The pre requisites are python and imagemagick, which are installed by default on most popular Linux distros. Important note: OverFeat compiled from source on your computer will run faster than the pre compiled binaries. Example of image classification, printing the 6 highest scoring categories: bin/YOUR_OS/overfeat n 6 samples/bee.jpg where YOUR_OS can be either linux_64, linux_32, or macos. Running the webcam demo: bin/YOUR_OS/webcam GPU PRE COMPILED BINARIES (EXPERIMENTAL) We are providing precompiled binaries to run overfeat on GPU. Because the code is not released yet, we do not provide the source for now. The GPU release is experimental and for now only runs on linux 64bits. It requires a Nvidia GPU with CUDA architecture > 2.0 (that covers all recent GPUs from Nvidia). You will need openblas to run the GPU binaries. The binaries are located in bin/linux_64/cuda And work the same way as the CPU versions. You can include the static library the same way as the CPU version. COMPILING FROM SOURCE Install dependencies : python, imagemagick, git, gcc, cmake (pkg config and opencv required for the webcam demo). On Ubuntu/Debian : apt get install g++ git python imagemagick cmake For the webcam demo : apt get install pkg config libopencv dev libopencv highgui dev Here are the instructions to build the OverFeat library and tools: Go to the src folder : cd src Build the tensor library (TH), OverFeat and the command line tools: make all Build the webcam demo (OpenCV required) : make cam On Mac OS, the default gcc doesn't support OpenMP. We strongly recommend to install a gcc version with OpenMP support. 
With MacPorts : sudo port install gcc48 . This will provide g++-mp-48 . If you don't install this version, you will have to change the two corresponding lines in the Makefile. UPDATING A git repository is provided with the archive. You can update by typing git pull from the overfeat directory. HIGH LEVEL INTERFACE: The feature extractor requires a weight file, containing the weights of the network. We provide a weight file located in data/default/net_weight . The software we provide should be able to locate it automatically. In case it doesn't, the -d option can be used to manually provide a path. Overfeat can use two sizes of network. By default, it uses the smaller one. For more accuracy, the -l option can be used to use a larger, but slower, network. CLASSIFICATION: In order to get the top (by default, 5) classes from a number of images : bin/linux_64/overfeat -n -d -l path_to_image1 path_to_image2 path_to_image3 ... To use overfeat online (feeding an image stream), feed its stdin stream with a sequence of ppm images (ended by an end of file ('\0') character). In this case, please use the -p option. For instance : convert image1.jpg image2.jpg -resize 231x231 ppm:- | ./overfeat -n -d -l -p Please note that to get the classes from an image, the image size should be 231x231. The image will be cropped if one dimension is larger than 231, and the network won't be able to work if both dimensions are larger. For feature extraction without classification, it can be any size greater than or equal to 231x231 for the small network, and 221x221 for the large network . FEATURE EXTRACTION: In order to extract the features instead of classifying, use the -f option. For instance : bin/linux_64/overfeat -d -l -f image1.png image2.jpg It is compatible with the -p option. The -L option (overrides -f) can be used to return the output of any layer. For instance bin/linux_64/overfeat -d -l -L 12 image1.png returns the output of layer 12. The -f option corresponds to layer 19 for the small network and 22 for the large one. It writes the features on stdout as a sequence. Each feature starts with three integers separated by spaces: the first is the number of features (n), the second is the number of rows (h) and the last is the number of columns (w). It is followed by an end of line ('\n') character. Then follow n*h*w floating point numbers (written in ASCII) separated by spaces. The feature is the first dimension (so that to obtain the next feature, you must add w*h to your index), followed by the row (to obtain the next row, add w to your index). That means that if you want the features corresponding to the top left window, you need to read pixels i*h*w for i = 0..4095 . The output is going to be a 3D tensor. The first dimension corresponds to the features, while dimensions 2 and 3 are spatial (y and x respectively). The spatial dimension is reduced at each layer, and with the default network, using the -f option, the output has size nFeatures x h x w where for the small network, nFeatures = 4096, h = ((H-11)/4 + 1)/8 - 6, w = ((W-11)/4 + 1)/8 - 6, and for the large network, nFeatures = 4096, h = ((H-7)/2 + 1)/18 - 5, w = ((W-7)/2 + 1)/18 - 5, if the input has size 3 x H x W . Each pixel in the feature map corresponds to a localized window in the input. With the small network, the windows are 231x231 pixels, overlapping so that the i th window begins at pixel 32*i, while for the large network, the windows are 221x221, and the i th window begins at pixel 36*i.
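As an illustration of the feature dump format just described, here is a minimal Python sketch (not part of the OverFeat package; the function name is hypothetical) that reads one n h w header followed by n*h*w ASCII floats into a 3D array:

import sys
import numpy as np

def read_overfeat_features(stream=sys.stdin):
    # One dump: a header 'n h w', then n*h*w floats, feature-major then row-major.
    tokens = stream.read().split()
    n, h, w = int(tokens[0]), int(tokens[1]), int(tokens[2])
    values = np.asarray(tokens[3:3 + n * h * w], dtype=np.float32)
    return values.reshape(n, h, w)   # index as features[i, y, x]

The output of bin/linux_64/overfeat -f image1.png could, for instance, be piped into a script built around this function.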
WEBCAM: We provide a live classifier based on the webcam. It reads images from the webcam, and displays the most likely classes along with the probabilities. It can be run with bin/linux_64/webcam -d -l -w BATCH: We also provide an easy way to process a whole folder : ./bin/linux_64/overfeat_batch -d -l -i -o It processes each image in the input folder and produces a corresponding file in the output directory, containing the features, in the same format as before. EXAMPLES: Classify the image samples/bee.jpg, getting the 3 most likely classes : bin/linux_64/overfeat -n 3 samples/bee.jpg Extract features from samples/pliers.jpg with the large network : bin/linux_64/overfeat -f -l samples/pliers.jpg Extract the features from all files in samples : ./bin/linux_64/overfeat_batch -i samples -o samples_features Run the webcam demo with the large network : bin/linux_64/webcam -l ADVANCED: The true program is actually overfeatcmd, where overfeat is only a python script calling overfeatcmd. overfeatcmd is not designed to be used by itself, but can be if necessary. It takes three arguments : bin/linux_64/overfeatcmd . If the first numeric argument is positive, it is, as before, the number of top classes to display. If it is nonpositive, the features are going to be the output. The layer option specifies from which layer the features are obtained (by default, 16, corresponding to the last layer before the classifier). The size argument corresponds to the size of the network : 0 for small, 1 for large. APIs: C++: The library is written in C++. It consists of one static library named liboverfeat.a . The corresponding header is overfeat.hpp . It uses the low level torch tensor library (TH). Sample code can be found in overfeatcmd.cpp and webcam.cpp. The library provides several functions in the namespace overfeat : void init(const std::string & weight_file_path, int net_idx) : This function must be called once before using the feature extractor. It reads the weights and must be passed a path to the weight files. It must also be passed the size of the network (net_idx), which should be 0 or 1, respectively for small or large networks. Note that the weight file must correspond to the size of the network. void free() : This function releases the resources and should be called when the feature extractor is no longer used. THTensor* fprop(THTensor* input) : This is the main function. It takes an image stored in a THTensor and runs the network on it. It returns a pointer to a THTensor containing the output of the classifier. If the input is 3 x H x W, the output is going to be nClasses x h x w, where for the small network : nClasses = 1000, h = ((H-11)/4 + 1)/8 - 6, w = ((W-11)/4 + 1)/8 - 6, and for the large network : nClasses = 1000, h = ((H-7)/2 + 1)/18 - 5, w = ((W-7)/2 + 1)/18 - 5. Each pixel of the output corresponds to a 231x231 window on the input for the small network, and 221x221 for the large network. The windows overlap in the same way as described earlier for the feature extraction. Each class gets a score, but they are not probabilities (they are not normalized). THTensor* get_output(int i) : Once fprop has been computed, this function returns the output of any layer. For instance, in the default network, layer 16 corresponds to the final features before the classifier. int get_n_layers() : Returns the total number of layers of the network. void soft_max(THTensor* input, THTensor* output) : This function converts the output to probabilities. It only works if h = w = 1 (only one output pixel). std::string get_class_name(int i) : This function returns the string corresponding to the i th class.
std::vector > get_top_classes(THTensor probas, int n) : Given a vector with nClasses elements containing scores or probabilities, this function returns the names of the top n classes, along with their score/probabilities. When compiling code using liboverfeat.a, the code must also be linked against libTH.a, the tensor library. The file libTH.a will have been produced when compiling torch. Torch7: We have bindings for torch, in the directory API/torch. The file API/torch/README contains more details. Python: The bindings for python are in API/python. See API/python/README .",Object Detection,Object Detection 2691,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg 
yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. 
If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. 
Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! 
Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 images for each class or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last layer region in your cfg file for training for small objects set layers 1, 11 instead of and set stride 4 instead of General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. 
To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use the simple OpenCV GUI you should uncomment the line //#define OPENCV in the yolo_console_dll.cpp file: link you can see source code of a simple example for detection on a video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0); ~Detector(); std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2, bool use_mean = false); std::vector<bbox_t> detect(image_t img, float thresh = 0.2, bool use_mean = false); static image_t load_image(std::string image_filename); static void free_image(image_t m); #ifdef OPENCV std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2, bool use_mean = false); #endif };",Object Detection,Object Detection 2697,Computer Vision,Computer Vision,Computer Vision,"Backbone This branch implements some backbones that the original maskrcnn_benchmark project doesn't contain; config files for those backbones can be found in configs/tuo . resnet 18/34 You can download the pretrained models from resnet18 and resnet34 , and move them to /.torch/models (a short download sketch follows the highlights list below) senet resnext Faster R CNN and Mask R CNN in PyTorch 1.0 This project aims at providing the necessary building blocks for easily creating detection and segmentation models using PyTorch 1.0. ! alt text (demo/demo_e2e_mask_rcnn_X_101_32x8d_FPN_1x.png from Highlights PyTorch 1.0: RPN, Faster R CNN and Mask R CNN implementations that match or exceed Detectron accuracies Very fast : up to 2x faster than Detectron and 30% faster than mmdetection during training. See MODEL_ZOO.md (MODEL_ZOO.md) for more details. Memory efficient: uses roughly 500MB less GPU memory than mmdetection during training Multi GPU training and inference Batched inference: can perform inference using multiple images per batch per GPU CPU support for inference: runs on CPU at inference time. See our webcam demo (demo) for an example Provides pre trained models for almost all reference Mask R CNN and Faster R CNN configurations with 1x schedule.
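Regarding the resnet 18/34 backbones mentioned above: if you would rather fetch the ImageNet pretrained weights programmatically than download them by hand, a minimal sketch using torchvision is below. The target directory, the file names, and the use of torchvision's own pretrained weights instead of the linked files are assumptions for illustration, not part of this branch.

```python
# Sketch: grab ImageNet-pretrained ResNet-18/34 weights via torchvision and
# save them under the cache directory the Backbone branch reads from.
# File names and directory are assumptions; adjust to what the branch expects.
import os
import torch
import torchvision.models as models

save_dir = os.path.expanduser("~/.torch/models")
os.makedirs(save_dir, exist_ok=True)

for name, builder in [("resnet18", models.resnet18), ("resnet34", models.resnet34)]:
    net = builder(pretrained=True)          # downloads the weights on first use
    path = os.path.join(save_dir, name + ".pth")
    torch.save(net.state_dict(), path)      # plain state_dict on disk
    print("saved", path)
```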
Webcam and Jupyter notebook demo We provide a simple webcam demo that illustrates how you can use maskrcnn_benchmark for inference: bash cd demo by default, it runs on the GPU for best results, use min image size 800 python webcam.py min image size 800 can also run it on the CPU python webcam.py min image size 300 MODEL.DEVICE cpu or change the model that you want to use python webcam.py config file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu in order to see the probability heatmaps, pass show mask heatmaps python webcam.py min image size 300 show mask heatmaps MODEL.DEVICE cpu for the keypoint demo python webcam.py config file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu A notebook with the demo can be found in demo/Mask_R CNN_demo.ipynb (demo/Mask_R CNN_demo.ipynb). Installation Check INSTALL.md (INSTALL.md) for installation instructions. Model Zoo and Baselines Pre trained models, baselines and comparison with Detectron and mmdetection can be found in MODEL_ZOO.md (MODEL_ZOO.md) Inference in a few lines We provide a helper class to simplify writing inference pipelines using pre trained models. Here is how we would do it. Run this from the demo folder: python from maskrcnn_benchmark.config import cfg from predictor import COCODemo config_file ../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml update the config options with the config file cfg.merge_from_file(config_file) manual override some options cfg.merge_from_list( MODEL.DEVICE , cpu ) coco_demo COCODemo( cfg, min_image_size 800, confidence_threshold 0.7, ) load image and then run prediction image ... predictions coco_demo.run_on_opencv_image(image) Perform training on COCO dataset For the following examples to work, you need to first install maskrcnn_benchmark . You will also need to download the COCO dataset. We recommend to symlink the path to the coco dataset to datasets/ as follows We use minival and valminusminival sets from Detectron bash symlink the coco dataset cd /github/maskrcnn benchmark mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 or use COCO 2017 version ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2017 datasets/coco/train2017 ln s /path_to_coco_dataset/test2017 datasets/coco/test2017 ln s /path_to_coco_dataset/val2017 datasets/coco/val2017 for pascal voc dataset: ln s /path_to_VOCdevkit_dir datasets/voc P.S. COCO_2017_train COCO_2014_train + valminusminival , COCO_2017_val minival You can also configure your own paths to the datasets. For that, all you need to do is to modify maskrcnn_benchmark/config/paths_catalog.py to point to the location where your dataset is stored. You can also create a new paths_catalog.py file which implements the same two classes, and pass it as a config argument PATHS_CATALOG during training. Single GPU training Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities: 1. Run the following without modifications bash python /path_to_maskrcnn_benchmark/tools/train_net.py config file /path/to/config/file.yaml This should work out of the box and is very similar to what we should do for multi GPU training. 
But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 8x larger, which might lead to out of memory errors. If you have a lot of memory available, this is the easiest solution. 2. Modify the cfg parameters If you experience out of memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule. Here is an example for Mask R CNN R 50 FPN with the 1x schedule: bash python tools/train_net.py config file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x. We also changed the batch size during testing, but that is generally not necessary because testing requires much less memory than training. Multi GPU training We use internally torch.distributed.launch in order to launch multi gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py config file path/to/config/file.yaml Abstractions For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md (ABSTRACTIONS.md). Adding your own dataset This implementation adds support for COCO style datasets. But adding support for training on a new dataset can be done as follows: python from maskrcnn_benchmark.structures.bounding_box import BoxList class MyDataset(object): def __init__(self, ...): as you would do normally def __getitem__(self, idx): load the image as a PIL Image image ... load the bounding boxes as a list of list of boxes in this case, for illustrative purposes, we use x1, y1, x2, y2 order. boxes 0, 0, 10, 10 , 10, 20, 50, 50 and labels labels torch.tensor( 10, 20 ) create a BoxList from the boxes boxlist BoxList(boxes, image.size, mode xyxy ) add the labels to the boxlist boxlist.add_field( labels , labels) if self.transforms: image, boxlist self.transforms(image, boxlist) return the image, the boxlist and the idx in your dataset return image, boxlist, idx def get_img_info(self, idx): get img_height and img_width. This is used if we want to split the batches according to the aspect ratio of the image, as it can be more efficient than loading the image from disk return { height : img_height, width : img_width} That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask ), or even your own instance type. For a full example of how the COCODataset is implemented, check maskrcnn_benchmark/data/datasets/coco.py (maskrcnn_benchmark/data/datasets/coco.py). Note: While the aforementioned example should work for training, we leverage the cocoApi for computing the accuracies during testing. Thus, test datasets should currently follow the cocoApi for now. Finetuning from Detectron weights on custom datasets Create a script tools/trim_detectron_model.py like here . 
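To make the finetuning step above concrete, here is a minimal sketch of what a trimming script in the spirit of tools/trim_detectron_model.py might look like: load the converted checkpoint, drop the class-specific head weights, and save the rest. The command-line flags, the key patterns, and the checkpoint layout (a "model" entry holding the state dict) are assumptions for illustration, not the exact contents of the referenced script.

```python
# Sketch of a Detectron-weight trimming script (assumed layout and key names).
import argparse
import torch

def trim(src, dst, patterns=("cls_score", "bbox_pred", "mask_fcn_logits")):
    checkpoint = torch.load(src, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)   # tolerate both layouts
    kept = {k: v for k, v in state_dict.items()
            if not any(p in k for p in patterns)}      # drop class-specific heads
    print("removed %d of %d keys" % (len(state_dict) - len(kept), len(state_dict)))
    torch.save({"model": kept}, dst)                   # then set MODEL.WEIGHT to dst

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--src", required=True, help="converted Detectron .pth")
    parser.add_argument("--dst", required=True, help="trimmed output .pth")
    args = parser.parse_args()
    trim(args.src, args.dst)
```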
You can decide which keys to remove and which to keep by modifying the script. Then you can simply point the config file at the converted model by changing MODEL.WEIGHT . For further information, please refer to 15 . Troubleshooting If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md (TROUBLESHOOTING.md). If your issue is not present there, please feel free to open a new issue. Citations Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package. @misc{massa2018mrcnn, author = {Massa, Francisco and Girshick, Ross}, title = {{maskrcnn benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch}}, year = {2018}, howpublished = {\url{ note = {Accessed: Insert date here } } Projects using maskrcnn benchmark RetinaMask: Learning to predict masks improves state of the art single shot detection for free . Cheng Yang Fu, Mykhailo Shvets, and Alexander C. Berg. Tech report, arXiv:1901.03353. License maskrcnn benchmark is released under the MIT license. See LICENSE (LICENSE) for additional details.",Object Detection,Object Detection 2704,Computer Vision,Computer Vision,Computer Vision,"NOAA Fish Finding Apply object detection techniques such as Faster RCNN and SSD to the 2018 CVPR workshop and challenge: Automated Analysis of Marine Video for Environmental Monitoring Table of Contents Introduction Prerequisites Installation Prepare Data Training Testing Demo Reference Introduction Overview This data challenge is a workshop at CVPR 2018, with large amounts of image data collected and annotated by the National Oceanic and Atmospheric Administration (NOAA) from a variety of underwater imagery and video. workshop website Datasets The data releases are comprised of images and annotations from five different data sources, with six datasets in total. HabCam: habcam_seq0 MOUSS: mouss_seq0, mouss_seq1 AFSC DropCam: afsc_seq0 MBARI: mbari_seq0 NWFSC: nwfsc_seq0 Each dataset contains different lighting conditions, camera angles, and wildlife. The data released depends on the nature of the data in the entire dataset. Datasets detail Scoring The challenge will evaluate accuracy in detection and classification , following the format in the MSCOCO Detection Challenge , for bounding box output. The annotations for scoring are bounding boxes around every animal, with a species classification label for each. Prerequisites Python 3+ Tensorflow > 1.6.0 pytorch 0.3.0 Python package cython , opencv python , easydict Installation 1. Clone the repository git clone 2. Download Pre trained model cd $NOAA fish finding sh data/scripts/setModel.sh 3. Update GPU arch Update your arch in the setup script to match your GPU check this to match GPU architecture cd lib Change the GPU architecture ( arch) if necessary vim setup.py 4. Build Faster RCNN Cython modules make clean && make cd .. 5. Install the Python COCO API. The code requires the API to access the COCO dataset. cd data git clone cd coco/PythonAPI make cd ../../.. Prepare Data 1. download training and testing data Training data (270.7 GB) Testing data (272.1 GB) 2.
unzip both tars, it should have this basic structure $annotations/ annotation root directory $annotations/habcam_seq0_training.mscoco.json $annotations/mbari_seq0_training.mscoco.json $annotations/mouss_seq0_training.mscoco.json $annotations/mouss_seq1_training.mscoco.json $annotations/... $imagery/ image root directory $imagery/habcam_seq0/ $imagery/mbari_seq0/ $imagery/mouss_seq0/ $imagery/mouss_seq1/ $imagery/... 3. Create symlinks for the NOAA dataset cd $NOAA fish finding/data/VOCdevkit2007 mkdir p DATASET cd DATASET ln s $imagery/ DATASET / PNGImages DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} 4. Prepare training images & annotations python3 preprocess/jsonParser.py dataset DATASET anno_path PATH mode MODE DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} PATH: training annotation path root directory MODE: image preprocess mode original: only conver png to jpg contrast: enhance constrst equal: preform CLAHE(Contrast Limit Adaptive Histogram Equalization) Training for Faster RCNN ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 mouss_seq1 vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 mbari_seq0 res101 for SSD 300/512 $ python3 SSD/train.py dataset DATASET ssd_size 300/512 DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} Testing before testing, remember to remove cache of last predict $ rm data/cache/ $ rm data/VOCdevkit2007/annotations_cache/ for Faster RCNN ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} is defined in test_faster_rcnn.sh for SSD 300/512 python SSD/eval.py dataset DATASET ssd_size 300/512 path PATH DATASET {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} PATH: model path Demo put the tested image in the data/demo folder for Faster RCNN python3 tools/demo.py net NET train_set TRAIN test_set TEST mode predict/ both NET {VGG16/ResNet101} TRAIN {mouss_seq0, mouss_seq1, mbari_seq0, habcam_seq0} TEST {mouss1/2/3/4/5, mbari1, habcam} MODE: predict: only plot prediction both: ground truth and prediction Result Dataset method mAP : : : mouss_seq0 Faster RCNN 0.989 mouss_seq1 Faster RCNN 0.909 mbari_seq0 Faster RCNN 0.8358 habcam_seq0 Faster RCNN 0.4752 full experiment Detection Snapshot ! Imgur ! Imgur ! Imgur ! Imgur Reference 1 Faster RCNN: 2 SSD: Single Shot MultiBox Detector: 3 tf faster RCNN: 4 ssd.pytorch:",Object Detection,Object Detection 2709,Computer Vision,Computer Vision,Computer Vision,"ResNet in Tensorflow This implementation of resnet and its variants is designed to be straightforward and friendly to new ResNet users. You can train a resnet on cifar10 by downloading and running the code. There are screen outputs, tensorboard statistics and tensorboard graph visualization to help you monitor the training process and visualize the model. Now the code works with tensorflow 1.0.0 and 1.1.0, but it's no longer compatible with earlier versions. If you like the code, please star it! You are welcome to post questions and suggestions on my github. 
Table of Contents Validation errors ( validation errors) Training curves ( training curves) User's guide ( users guide) Pre requisites ( pre requisites) Overall structure ( overall structure) Hyper parameters ( hyper parameters) Resnet Strcuture ( resnet structure) Training ( training) Test ( test) Validation errors The lowest valdiation errors of ResNet 32, ResNet 56 and ResNet 110 are 6.7%, 6.5% and 6.2% respectively. You can change the number of the total layers by changing the hyper parameter num_residual_blocks. Total layers 6 num_residual_blocks + 2 Network Lowest Validation Error ResNet 32 6.7% ResNet 56 6.5% ResNet 110 6.2% Training curves ! alt tag User's guide You can run cifar10_train.py and see how it works from the screen output (the code will download the data for you if you don't have it yet). It’s better to speicify version identifier before running, since the training logs, checkpoints, and error.csv file will be saved in the folder with name logs_$version. You can do this by command line: python cifar10_train.py version 'test' . You may also change the version number inside the hyper_parameters.py file The training and validation error will be output on the screen. They can also be viewed using tensorboard. Use tensorboard logdir 'logs_$version' command to pull them out. (For e.g. If the version is ‘test’, the logdir should be ‘logs_test’.) The relevant statistics of each layer can be found on tensorboard. Pre requisites pandas, numpy , opencv, tensorflow(1.0.0) Overall structure There are four python files in the repository. cifar10_input.py, resnet.py, cifar10_train.py, hyper_parameters.py. cifar10_input.py includes helper functions to download, extract and pre process the cifar10 images. resnet.py defines the resnet structure. cifar10_train.py is responsible for the training and validation. hyper_parameters.py defines hyper parameters related to train, resnet structure, data augmentation, etc. The following sections expain the codes in details. hyper parameters The hyper_parameters.py file defines all the hyper parameters that you may change to customize your training. You may use python cifar10_train.py hyper_parameter1 value1 hyper_parameter2 value2 to set all the hyper parameters. You may also change the default values inside the python script. There are five categories of hyper parameters. 1. Hyper parameters about saving training logs, tensorboard outputs and screen outputs, which includes: version : str. The checkpoints and output events will be saved in logs_$version/ report_freq : int. How many batches to run a full validation and print screen output once. Screen output looks like: ! alt tag train_ema_decay : float. The tensorboard will record a moving average of batch train errors, besides the original ones. This decay factor is used to define an ExponentialMovingAverage object in tensorflow with tf.train.ExponentialMovingAverage(FLAGS.train_ema_decay, global_step) . Essentially, the recorded error train_ema_decay shadowed_error + (1 train_ema_decay) current_batch_error. The larger the train_ema_decay is, the smoother the training curve will be. 2. Hyper parameters regarding the training process train_steps : int. Total training steps is_full_validation : boolean. If you want to use all the 10000 validation images to run the validation (True), or you want to randomly draw a batch of validation data (False) train_batch_size : int. Training batch size validation_batch_size : int. 
Validation batch size (which is only effective if is_full_validation False) init_lr : float. The initial learning rate. The learning rate may decay based on the settings below lr_decay_factor : float. The decay factor of the learning rate. The learning rate will become lr_decay_factor current_learning_rate every time it is decayed. decay_step0 : int. The learning rate will decay at decay_step0 for the first time decay_step1 : int. The step at which the learning rate decays for the second time 3. Hyper parameters that control the network num_residual_blocks : int. The total layers of the ResNet 6 num_residual_blocks + 2 weight_decay : float. The weight decay used to regularize the network. Total_loss train_loss + weight_decay sum of squares of the weights 4. About data augmentation padding_size : int. padding_size is the number of zero pads to add on each side of the image. Padding and random cropping during training can prevent overfitting. 5. Loading checkpoints ckpt_path : str. The path of the checkpoint that you want to load is_use_ckpt : boolean. If yes, use a checkpoint and continue the training from the checkpoint ResNet Structure Here we use the latest version of ResNet. The structure of the residual block looks like ref : The inference() function is the main function of resnet.py. It is used twice, in building both the training graph and the validation graph. Training The class Train() defines all the functions regarding the training process, with train() being the main function. The basic idea is to run train_op for FLAGS.train_steps times. If step % FLAGS.report_freq 0, it will validate once, train once and write all the summaries to tensorboard. Test The test() function in the class Train() helps you predict. It returns the softmax probability with shape num_test_images, num_labels . You need to prepare and pre process your test data and pass it to the function. You may either use your own checkpoints or the pre trained ResNet 110 checkpoint I uploaded. You may write the following lines at the end of the cifar10_train.py file train Train() test_image_array ... Better to be whitened in advance. Shape 1, img_height, img_width, img_depth predictions train.test(test_image_array) predictions is the predicted softmax array. Run the following command in the command line if you want to use my checkpoint: python cifar10_train.py test_ckpt_path 'model_110.ckpt 79999'",Object Detection,Object Detection 2715,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (Prueba NFPA) (YOLO neural network test for recognizing NFPA symbols) CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) 12. For this example ( prueba ejemplo) ! Darknet Logo !
map_fps mAP (AP50) Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: 
darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. 
If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. 
Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. 
Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. 
Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. 
Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 images for each class or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last layer region in your cfg file for training for small objects set layers 1, 11 instead of and set stride 4 instead of General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. 
After training for detection: Increase the network resolution by setting in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use the .weights file already trained for 416x416 resolution if the error Out of memory occurs then in the .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find a repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With examples of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with an example of how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, it also contains the paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id is -1 then this label has no parent: coco9k.map map of 80 categories from MSCOCO to the WordTree 9k.tree : combine9k.data data file, with paths to: 9k.labels , 9k.names , inet9k.map , (change the path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map of 200 categories from ImageNet to the WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as a C++ DLL file yolo_cpp_dll.dll open in MSVS2015 the file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do: Build > Build yolo_cpp_dll You should have CUDA 9.1 installed To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of the line: CUDNN; 2. To use Yolo as a DLL file in your C++ console application open in MSVS2015 the file build\darknet\yolo_console_dll.sln , set x64 and Release , and do: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run it from MSVS2015 (before this you should copy the 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use the simple OpenCV GUI you should uncomment the line // define OPENCV in the yolo_console_dll.cpp file: link you can see the source code of a simple example for detection on a video file: link yolo_cpp_dll.dll API: link

```cpp
class Detector {
public:
    Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0);
    ~Detector();

    std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2, bool use_mean = false);
    std::vector<bbox_t> detect(image_t img, float thresh = 0.2, bool use_mean = false);
    static image_t load_image(std::string image_filename);
    static void free_image(image_t m);

#ifdef OPENCV
    std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2, bool use_mean = false);
#endif
};
```

Test example: Run: darknet.exe detector train cfg/obj.data cfg/yolo obj.cfg darknet19_448.conv.23",Object Detection,Object Detection 2716,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4.
How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.4.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights 
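The requirements above include a GPU with CC >= 3.0. Before trying the GPU-accelerated commands that follow, you can confirm this with a small stand-alone check against the CUDA runtime; this is a generic sketch, not part of darknet itself (compile it with nvcc or link against cudart):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Generic sanity check for the "GPU with CC >= 3.0" requirement listed above.
// Not part of darknet; compile with nvcc (or link against the CUDA runtime).
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found - consider the CPU-only build.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d: %s, compute capability %d.%d\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

The specific command-line examples continue below.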
Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. 
If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. 
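Many problems with the Windows build in the steps above come down to a DLL that never made it next to darknet.exe, so a tiny hypothetical checker like the one below can save a debugging round trip. The exact file names are assumptions that depend on your OpenCV/CUDA/cuDNN versions (step 1.1 names the OpenCV DLLs; cudnn64_7.dll is the usual cuDNN 7.x runtime name):

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the repository): verify that the DLLs named in
// steps 1.1/1.3 ended up next to darknet.exe. Adjust the names to your versions.
int main() {
    namespace fs = std::filesystem;
    const fs::path dir = R"(build\darknet\x64)";
    const std::vector<std::string> required = {
        "darknet.exe",
        "opencv_world340.dll",      // or opencv_world320.dll for OpenCV 3.2
        "opencv_ffmpeg340_64.dll",  // or opencv_ffmpeg320_64.dll
        "cudnn64_7.dll",            // assumption: cuDNN 7.x runtime DLL name
    };
    for (const auto& name : required) {
        const fs::path p = dir / name;
        std::cout << (fs::exists(p) ? "OK   " : "MISS ") << p.string() << "\n";
    }
    return 0;
}
```

If something is reported as missing, copy it from the OpenCV or CUDA bin directories mentioned in steps 1.1-1.3.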
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate GPUs 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. 
Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. 
For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... 
instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. 
In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: 2. After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use .weights file already trained for 416x416 resolution if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. 
To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2718,Computer Vision,Computer Vision,Computer Vision,"This branch of Caffe extends BVLC led Caffe by adding Windows support and other functionalities commonly used by Microsoft's researchers, such as managed code wrapper, Faster RCNN , R FCN , etc. Update : this branch is not actively maintained. Please checkout this for more active Windows support. Caffe Linux (CPU) Windows (CPU) Travis Build Status AppVeyor Build Status License (LICENSE) Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center ( BVLC ) and community contributors. Check out the project site for all the details like DIY Deep Learning for Vision with Caffe Tutorial Documentation BVLC reference models and the community model zoo Installation instructions and step by step examples. Windows Setup Requirements : Visual Studio 2013 Pre Build Steps Copy .\windows\CommonSettings.props.example to .\windows\CommonSettings.props By defaults Windows build requires CUDA and cuDNN libraries. Both can be disabled by adjusting build variables in .\windows\CommonSettings.props . Python support is disabled by default, but can be enabled via .\windows\CommonSettings.props as well. 3rd party dependencies required by Caffe are automatically resolved via NuGet. CUDA Download CUDA Toolkit 7.5 from nVidia website . If you don't have CUDA installed, you can experiment with CPU_ONLY build. In .\windows\CommonSettings.props set CpuOnlyBuild to true and set UseCuDNN to false . cuDNN Download cuDNN v4 or cuDNN v5 from nVidia website . Unpack downloaded zip to %CUDA_PATH% (environment variable set by CUDA installer). Alternatively, you can unpack zip to any location and set CuDnnPath to point to this location in .\windows\CommonSettings.props . CuDnnPath defined in .\windows\CommonSettings.props . 
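Since it is easy to end up with CuDnnPath pointing at one cuDNN copy while another sits in %CUDA_PATH%, a tiny stand-alone check (not part of the Caffe build scripts) can print which cuDNN your compiler and linker actually pick up; link it against the cudnn library you intend to use:

```cpp
#include <cudnn.h>
#include <cstdio>

// Generic sanity check: print the cuDNN version the build actually resolves,
// e.g. to confirm that CuDnnPath points where you think it does.
int main() {
    std::printf("compiled against cuDNN %d, linked against cuDNN %zu\n",
                CUDNN_VERSION, cudnnGetVersion());
    return 0;
}
```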
Also, you can disable cuDNN by setting UseCuDNN to false in the property file. Python To build Caffe Python wrapper set PythonSupport to true in .\windows\CommonSettings.props . Download Miniconda 2.7 64 bit Windows installer from Miniconda website . Install for all users and add Python to PATH (through installer). Run the following commands from elevated command prompt: conda install yes numpy scipy matplotlib scikit image pip pip install protobuf Remark After you have built solution with Python support, in order to use it you have to either: set PythonPath environment variable to point to \Build\x64\Release\pycaffe , or copy folder \Build\x64\Release\pycaffe\caffe under \lib\site packages . Matlab To build Caffe Matlab wrapper set MatlabSupport to true and MatlabDir to the root of your Matlab installation in .\windows\CommonSettings.props . Remark After you have built solution with Matlab support, in order to use it you have to: add the generated matcaffe folder to Matlab search path, and add \Build\x64\Release to your system path. Build Now, you should be able to build .\windows\Caffe.sln License and Citation Caffe is released under the BSD 2 Clause license . The BVLC reference models are released for unrestricted use. Please cite Caffe in your publications if it helps your research: @article{jia2014caffe, Author {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, Journal {arXiv preprint arXiv:1408.5093}, Title {Caffe: Convolutional Architecture for Fast Feature Embedding}, Year {2014} }",Object Detection,Object Detection 2732,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
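Circling back to the Caffe branch described above: the Python and Matlab wrappers sit on top of the C++ library, so once Caffe.sln builds, a minimal smoke test is simply to load a network in CPU mode from C++. deploy.prototxt and model.caffemodel below are placeholders for your own files, and you are assumed to link against the caffe library you just built:

```cpp
#include <caffe/caffe.hpp>
#include <iostream>

// Minimal smoke test for a freshly built Caffe (CPU mode). The prototxt and
// caffemodel names are placeholders; link against the caffe library you built.
int main() {
    caffe::Caffe::set_mode(caffe::Caffe::CPU);
    caffe::Net<float> net("deploy.prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("model.caffemodel");
    std::cout << "Loaded net with " << net.layers().size() << " layers\n";
    return 0;
}
```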
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... added drawing of chart of average loss during training added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... 
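One of the items in the improvements list above is the darknet detector map command, which reports the IoU, Precision TP/(TP+FP), Recall TP/(TP+FN) and 11-point average precision defined later in this document. The short sketch below reproduces those two formulas on toy values, purely as a worked illustration, before the list continues:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Worked illustration of the metrics reported by "darknet detector map":
// IoU between two boxes and PascalVOC-style 11-point average precision.
struct Box { float x1, y1, x2, y2; };

float iou(const Box& a, const Box& b) {
    float iw = std::max(0.f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    float ih = std::max(0.f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    float inter = iw * ih;
    float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// average of the best precision at recall >= t, for t = 0.0, 0.1, ..., 1.0
float ap_11point(const std::vector<float>& recall, const std::vector<float>& precision) {
    float ap = 0.f;
    for (int k = 0; k <= 10; ++k) {
        float t = k / 10.f, best = 0.f;
        for (size_t i = 0; i < recall.size(); ++i)
            if (recall[i] >= t) best = std::max(best, precision[i]);
        ap += best;
    }
    return ap / 11.f;
}

int main() {
    std::printf("IoU  = %.3f\n", iou({10, 10, 50, 50}, {30, 30, 70, 70}));
    std::printf("AP11 = %.3f\n", ap_11point({0.2f, 0.5f, 0.9f}, {1.0f, 0.8f, 0.6f}));
    return 0;
}
```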
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. 
Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. 
The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate GPUs 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. For 4xGPUs increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . 4. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 9. 
After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. 
At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo at different resolutions: link increase the network resolution in your .cfg file ( height 608 , width 608 or any value that is a multiple of 32) it will increase precision recalculate the anchors for your dataset for the width and height from the cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of the 3 yolo layers in your cfg file check that every object in your dataset is labeled no object in your dataset should be left without a label. Most training issues are caused by wrong labels in your dataset (labels produced by some conversion script, marked with a third party tool, ...). Always check your dataset by using: it is desirable that your training dataset includes images with objects at different scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train for 2000 × classes iterations or more it is desirable that your training dataset includes images with non labeled objects that you do not want to detect negative samples without bounding boxes (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or a higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0.0615234375 × (width × height), where width and height are parameters from the net section of the cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then to disable flip data augmentation add flip 0 here: General rule your training dataset should include the same relative sizes of objects that you want to detect: train_network_width × train_obj_width / train_image_width ≈ detection_network_width × detection_obj_width / detection_image_width train_network_height × train_obj_height / train_image_height ≈ detection_network_height × detection_obj_height / detection_image_height to speed up training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set the param stopbackward 1 here: then run this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 the file yolov3.conv.81 will be created, then train by using the weights file yolov3.conv.81 instead of darknet53.conv.74 2.
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as a DLL file in your C++ console application open in MSVS2015 the file build\darknet\yolo_console_dll.sln , set x64 and Release , and do: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command: yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run it from MSVS2015 (before this you should copy the 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use the simple OpenCV GUI you should uncomment the line // #define OPENCV in the yolo_console_dll.cpp file: link you can see the source code of a simple example for detection on a video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); #ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); #endif };",Object Detection,Object Detection 2733,Computer Vision,Computer Vision,Computer Vision,Paper download link. Experiments are run on CIFAR-100; with ResNet-18 the test error is around 35% after 260 epochs and training for more epochs can improve the result further. Modify the FLAGS parameters before use.,Object Detection,Object Detection 2737,Computer Vision,Computer Vision,Computer Vision,"mini_project for CIFAR10 Using a self written ResNet18 to classify CIFAR10. It includes TensorFlow and Keras versions. Before using these codes, please create a directory ./datasets . If you are using macOS or Linux, please use the following commands to download the dataset into the ./datasets directory. wget tar xzvf cifar 10 binary.tar.gz rm cifar 10 binary.tar.gz If you are a Windows user, you can directly download it from the website. In TensorFlow there are lots of tricks to be considered, so the code I provide may reuse some code from the Keras source, especially in the data augmentation part. But those parts just contain some math and are easy to understand. It is important to mention that the test accuracy of the TensorFlow version is a little lower than that of the Keras version, so there may be some small mistakes. I would be glad if someone could point them out. I also provide a VGG16 network as a baseline for ResNet. References You can check the paper Deep Residual Learning for Image Recognition for more details. Some code draws on the work of others (like the CS231n assignments). You can check those on",Object Detection,Object Detection 2746,Computer Vision,Computer Vision,Computer Vision,"TensorFlow implementation of RFCN Paper is available on Building The ROI pooling and the MS COCO loader need to be compiled first. To do so, run make in the root directory of the project. You may need to edit BoxEngine/ROIPooling/Makefile if you need special linker/compiler options. NOTE: If you have multiple python versions on your system, and you want to use a different one than python , provide an environment variable called PYTHON before calling make. For example: PYTHON python3 make You may get undefined symbol problems while trying to load the .so file. This will be the case if you built your TensorFlow version yourself and the Makefile fails to auto detect your ABI version. 
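These symbol problems usually only surface when the compiled operator is first loaded into TensorFlow. As a quick check (a minimal sketch only, assuming the BoxEngine/ROIPooling/roi_pooling.so path from the layout above; the project wraps this loading step in its own code), you can try loading the shared object directly with TensorFlow's standard custom-op API and see whether the error described below appears:

import tensorflow as tf

# Attempt to load the compiled ROI pooling operator; a mismatched ABI or CUDA
# setting makes this call raise NotFoundError with an undefined-symbol message.
roi_pooling_module = tf.load_op_library('BoxEngine/ROIPooling/roi_pooling.so')
print('Loaded custom ops:', [name for name in dir(roi_pooling_module) if not name.startswith('_')])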
You may encounter errors like tensorflow.python.framework.errors_impl.NotFoundError: BoxEngine/ROIPooling/roi_pooling.so: undefined symbol: _ZN10tensorflow7strings6StrCatB5cxx11ERKNS0_8AlphaNumE in the log. In this case clean the project (make clean) and rebuild it with the USE_OLD_EABI 0 flag (USE_OLD_EABI 0 make). You may want to build ROI pooling without GPU support. Use the USE_GPU 0 flag to turn off the CUDA part of the code. You may want to install the python dependencies by running: pip install user r packages.txt Testing You can run trained models with test.py. The model path should be given without the file extension (without .data and .index). An example: ! preview Pretrained model You can download a pretrained model from here: Extract it to your project directory. Then you can run the network with the following command: ./test.py n export/model i \ o \ NOTE: this pretrained model was not hyperparameter optimized in any way. The model can (and will) have much better performance when optimized. Try out different learning rates and classification to regression loss balances. Optimal values are highly test dependent. Training the network For training the network you will first need to download the MS COCO dataset. Download the needed files and extract them to a directory with the following structure: ├─ annotations │ ├─ instances_train2014.json │ └─ ... ├─ train2014 └─ ... Run the following command: ./main.py dataset \ name \ \ full path to the coco root directory \ path where files will be saved. This directory and its subdirectories will be automatically created. The \ will have the following structure: ├─ preview │ └─ preview.jpg preview snapshots from the training process. ├─ save TensorFlow checkpoint directory │ ├─ checkpoint │ ├─ model_ . │ └─ ... └─ args.json saved command line arguments. You can always kill the training process and resume it later just by running ./main.py name \ without any other parameters. All command line parameters will be saved and reloaded automatically. License The software is under the Apache 2.0 license. See for further details. Notes This code requires TensorFlow > 1.0 (last known working version is 1.4.1). Tested with python3.6, but it should work with python 2.",Object Detection,Object Detection 2750,Computer Vision,Computer Vision,Computer Vision,"GHM_Loss caffe 1. The gradient harmonizing loss (GHM-C), implemented in Caffe. Modify the caffe.proto file: (1) Add message LayerParameter { optional GhmcLossParameter ghmc_loss_param 160;} (2) Add message GhmcLossParameter { optional uint32 m 1 default 30 ; optional float alpha 2 default 0.0 ; } (3) Usage: layer { name: ghmcloss type: GhmcLoss bottom: fc6 bottom: label top: ghmcloss loss_weight: 1 ghmc_loss_param { m: 30 alpha: 0.2 } } 2. Reference paper: 3. PyTorch reference code:",Object Detection,Object Detection 2752,Computer Vision,Computer Vision,Computer Vision,"v2.2 v3.0 Introduction This directory contains python software and an iOS App developed by Ultralytics LLC, and is freely available for redistribution under the GPL 3.0 license . For more information please visit Description The repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. Training is done on the COCO dataset by default: Credit to Joseph Redmon for YOLO and to Erik Lindernoren for the PyTorch implementation this work is based on . 
Requirements Python 3.7 or later with the following pip3 install U r requirements.txt packages: numpy torch > 1.0.0 opencv python Tutorials Transfer Learning Train Single Image Train Single Class Train Custom Data Training Start Training: Run train.py to begin training after downloading COCO data with data/get_coco_dataset.sh . Resume Training: Run train.py resume resumes training from the latest checkpoint weights/latest.pt . Each epoch trains on 120,000 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set. Default training settings produce loss plots below, with training speed of 0.6 s/batch on a 1080 Ti (18 epochs/day) or 0.45 s/batch on a 2080 Ti. from utils import utils; utils.plot_results() ! Alt Image Augmentation datasets.py applies random OpenCV powered augmentation to the input images in accordance with the following specifications. Augmentation is applied only during training, not during inference. Bounding boxes are automatically tracked and updated with the images. 416 x 416 examples pictured below. Augmentation Description Translation +/ 10% (vertical and horizontal) Rotation +/ 5 degrees Shear +/ 2 degrees (vertical and horizontal) Scale +/ 10% Reflection 50% probability (horizontal only) H S V Saturation +/ 50% HS V Intensity +/ 50% Speed Machine type: n1 highmem 4 (4 vCPUs, 26 GB memory) CPU platform: Intel Skylake GPUs: 1 4 x NVIDIA Tesla P100 HDD: 100 GB SSD GPUs batch_size speed COCO epoch (P100) (images) (s/batch) (min/epoch) 1 16 0.54s 66min 2 32 0.99s 61min 4 64 1.61s 49min Inference Run detect.py to apply trained weights to an image, such as zidane.jpg from the data/samples folder: YOLOv3: detect.py cfg cfg/yolov3.cfg weights weights/yolov3.pt YOLOv3 tiny: detect.py cfg cfg/yolov3 tiny.cfg weights weights/yolov3 tiny.pt Webcam Run detect.py with webcam True to show a live webcam feed. Pretrained Weights Darknet .weights format: PyTorch .pt format: mAP Use test.py weights weights/yolov3.weights to test the official YOLOv3 weights. Use test.py weights weights/latest.pt to test the latest training results. Compare to official darknet results from ultralytics/yolov3 darknet YOLOv3 320 51.3 51.5 YOLOv3 416 54.9 55.3 YOLOv3 608 57.9 57.9 bash sudo rm rf yolov3 && git clone bash yolov3/data/get_coco_dataset.sh sudo rm rf cocoapi && git clone && cd cocoapi/PythonAPI && make && cd ../.. 
&& cp r cocoapi/PythonAPI/pycocotools yolov3 cd yolov3 python3 test.py save json conf thres 0.001 img size 416 Namespace(batch_size 32, cfg 'cfg/yolov3.cfg', conf_thres 0.001, data_cfg 'cfg/coco.data', img_size 416, iou_thres 0.5, nms_thres 0.45, save_json True, weights 'weights/yolov3.weights') Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.308 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.549 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.310 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.141 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.334 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.454 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.267 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.403 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.428 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.237 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.464 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.585 python3 test.py save json conf thres 0.001 img size 608 batch size 16 Namespace(batch_size 16, cfg 'cfg/yolov3.cfg', conf_thres 0.001, data_cfg 'cfg/coco.data', img_size 608, iou_thres 0.5, nms_thres 0.45, save_json True, weights 'weights/yolov3.weights') Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.328 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.579 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.335 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.190 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.357 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.428 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.279 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.429 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.456 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.299 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.483 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.572 Contact For questions or comments please contact Glenn Jocher at glenn.jocher@ultralytics.com or visit us at",Object Detection,Object Detection 2753,Computer Vision,Computer Vision,Computer Vision,"SenceClassification AIChallenge Last modified by Xiaodong Wu 2017.12.04 Contact me by xiaodong dot wu dot c at gmail.com AI Challenge 2017 Scene Classificationa task testb_results_1203.json the final result we submitted. datasets finetuned models you can run the ./show_results/show_results.py to check the resutls Except the models in ./finetuned models I have also conduct the following trail: 1. focal loss See it in github: Know more about focal loss: 2. Truncated loss Self defined loss, the same idea as focal loss. Trying to decay the loss of well calssified samples, and try to let the model focus on the hard smaples. In this loss function, I just set the loss of samples whose max probability after softmax layer is bigger than 0.5 to be 0. The result is not as good as I thought. See it in github: 3. Combine the SDD detection resutls. In this trail, I concated the probability result of SSD detection and the original CNN features(features of 161). The result is not very good. During this experiment, I also write a script to load pictures. 4. Cascade prediction The idea is coming from the insight that maybe the false classified samples have an unconfident probability distribution (i.e. the output of softmax layer). 
I trained a model specially for all the low confident pictures(max prob<0.5). The model is the same as the main model. The loss of this model did not decay during training. So I give up this idea. (Maybe the model should have a different architecture from the main model.) 5. In the experiment of last few days before the deadline, I test the following tricks. 1) Data augmentation 2) 10 crop test 3) Model ensembling Something about the learning rate: For the finetuning process, I find that the result is very sensitive to the original learning rate. The initial learning rate can be 1e 51e 7. 1e 3 and 1e 4 did not achieve the best result in my experiment. For the learing rate decay policy, I do not recommend the 'step' policy. I think it is better to set a relative small initial value, and keep the value until the loss does not decay or the test result begin to decay. Then deacay the learning rate. Something about overfitting: Some models face problems of overfitting. The trainging loss is always decaying and the accuracy promotes first and then deacy. In my experiment, I just check the log file, and find the best model in history. And then I will decay the learning rate and continue training based on the finded the best model.",Object Detection,Object Detection 2754,Computer Vision,Computer Vision,Computer Vision,"Accurate Single Stage Detector Using Recurrent Rolling Convolution By Jimmy Ren , Xiaohao Chen, Jianbo Liu, Wenxiu Sun, Jiahao Pang, Qiong Yan, Yu Wing Tai, Li Xu. Introduction High localization accuracy is crucial in many real world applications. We propose a novel single stage end to end object detection network (RRC) to produce high accuracy detection results. You can use the code to train/evaluate a network for object detection task. For more details, please refer to our paper . method KITTI test mAP car (moderate) : : : : Mono3D 88.66% SDP+RPN 88.85% MS CNN 89.02% Sub CNN 89.04% RRC (single model) 89.85% KITTI ranking Citing RRC Please cite RRC in your publications if it helps your research: @inproceedings{Ren17CVPR, author {Jimmy Ren and Xiaohao Chen and Jianbo Liu and Wenxiu Sun and Jiahao Pang and Qiong Yan and Yu Wing Tai and Li Xu}, title {Accurate Single Stage Detector Using Recurrent Rolling Convolution}, booktitle {CVPR}, year {2017} } Contents 1. Installation ( installation) 2. Preparation ( preparation) 3. Train/Eval ( traineval) 4. Models ( models) 4. Ackonwledge ( Acknowledge) Installation 1. Get the code. We will call the directory that you cloned Caffe into $CAFFE_ROOT Shell cd rrc_detection 2. Build the code. Please follow Caffe instruction to install all necessary packages and build it. Before build it, you should install CUDA and CUDNN(v5.0). CUDA 7.5 and CUDNN v5.0 were adapted in our computer. Shell Modify Makefile.config according to your Caffe installation. cp Makefile.config.example Makefile.config make j8 Make sure to include $CAFFE_ROOT/python to your PYTHONPATH. make py make test j8 make runtest j8 Preparation 1. Download fully convolutional reduced (atrous) VGGNet . By default, we assume the model is stored in $CAFFE_ROOT/models/VGGNet/ . 2. Download the KITTI dataset . By default, we assume the data is stored in $HOME/data/KITTI/ Unzip the training images, testing images and the labels in $HOME/data/KITTI/ . 3. Create the LMDB file. For training . As only the images contain cars are adopted as training set for car detection, the labels for cars should be extracted. We have provided the list of images contain cars in $CAFFE_ROOT/data/KITTI car/ . 
Shell extract the labels for cars cd $CAFFE_ROOT/data/KITTI car/ ./extract_car_label.sh Before create the LMDB files. The labels should be converted to VOC type. We provide some matlab scripts. The scripts are in $CAFFE_ROOT/data/convert_labels/ . Just modify converlabels.m . Shell line 4: root_dir '/your/path/to/KITTI/'; VOC type labels will be generated in $KITTI_ROOT/training/labels_2car/xml/ . Shell cd $CAFFE_ROOT/data/KITTI car/ Create the trainval.txt, test.txt, and test_name_size.txt in data/KITTI car/ ./create_list.sh You can modify the parameters in create_data.sh if needed. It will create lmdb files for trainval and test with encoded original image: $HOME/data/KITTI/lmdb/KITTI car_training_lmdb/ $HOME/data/KITTI/lmdb/KITTI car_testing_lmdb/ and make soft links at data/KITTI car/lmdb ./data/KITTI car/create_data.sh Train/Eval 1. Train your model and evaluate the model. Shell It will create model definition files and save snapshot models in: $CAFFE_ROOT/models/VGGNet/KITTI/RRC_2560x768_kitti_car/ and job file, log file in: $CAFFE_ROOT/jobs/VGGNet/KITIIT/RRC_2560x768_kitti_car/ After 60k iterations, we can get the model as we said in the paper (mAP 89. % in KITTI). python examples/car/rrc_kitti_car.py Before run the testing script. You should modify line 10: img_dir to your path to kitti testing images . python examples/car/rrc_test.py We train our models in a computer with 4 TITAN X(Maxwell) GPU cards. By default, we assume you train the models on mechines with 4 TITAN X GPUs. If you only have one TITAN X card, you should modify the script rrc_kitti.py . Shell line 118: gpus 0,1,2,3 > gpus 0 line 123: batch_size 4 > batch_size 1 If you have two TITAN X cards, you should modify the script rrc_kitti.py as follow. Shell line 118: gpus 0,1,2,3 > gpus 0,1 line 123: batch_size 4 > batch_size 2 You can submit the result at kitti submit . If you don't have time to train your model, you can download a pre trained model from the link as follow. Google Drive Baidu Cloud Unzip the files in $caffe_root/models/VGGNet/KITTI/ , and run the testing script rrc_test.py , you will get the same result as the single model result we showed in the paper. Shell before run the script, you should modify the kitti_root at line 10. Make sure that the work directory is caffe_root cd $caffe_root python models/VGGNet/KITTI/RRC_2560x768_kitti_4r4b_max_size/rrc_test.py 2. Evaluate the most recent snapshot. For testing a model you trained, you show modify the path in rrc_test.py . Acknowledge Thanks to Wei Liu, we have benifited a lot from his previous work SSD (Single Shot Multibox Detector) and his code .",Object Detection,Object Detection 2755,Computer Vision,Computer Vision,Computer Vision,"A Fast RCNN: Hard Positive Generation via Adversary for Object Detection By Xiaolong Wang, Abhinav Shrivastava, and Abhinav Gupta Introduction This is a Caffe based version of A Fast RCNN ( arxiv_link ). Although we originally implement it on torch, this Caffe re implementation is much simpler, faster and easier to use. We release the code for training A Fast RCNN with Adversarial Spatial Dropout Network. License This code is released under the MIT License (refer to the LICENSE file for details). 
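For intuition about the Adversarial Spatial Dropout Network mentioned in the introduction above: the adversary learns to blank out part of an object's pooled feature region so the detector is trained on harder, partially occluded examples. Below is a toy NumPy sketch of the masking step only (an illustration under assumptions, not the authors' Caffe implementation; here the dropped block is placed at random rather than predicted by a learned adversary):

import numpy as np

def drop_spatial_block(roi_feature, block_frac=0.5, rng=None):
    # roi_feature: (C, H, W) activations pooled from one object proposal.
    # Zero out a block whose height and width are block_frac of the RoI grid.
    rng = rng if rng is not None else np.random.default_rng()
    c, h, w = roi_feature.shape
    bh, bw = max(1, int(h * block_frac)), max(1, int(w * block_frac))
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    dropped = roi_feature.copy()
    dropped[:, top:top + bh, left:left + bw] = 0.0
    return dropped

# Example: a 512-channel 7x7 RoI feature map, as produced by RoI pooling.
hard_example = drop_spatial_block(np.random.rand(512, 7, 7).astype(np.float32))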
Citing If you find this useful in your research, please consider citing: @inproceedings{WangCVPR17afrcnn, Author {Xiaolong Wang and Abhinav Shrivastava and Abhinav Gupta}, Title {A Fast RCNN: Hard Positive Generation via Adversary for Object Detection}, Booktitle {Conference on Computer Vision and Pattern Recognition ({CVPR})}, Year {2017} } Disclaimer This implementation is built on a fork of the OHEM code ( here ), which in turn builds on the Faster R CNN Python code ( here ) and Fast R CNN ( here ). Please cite the appropriate papers depending on which part of the code and/or model you are using. Results Approach training data test data mAP Fast R CNN (FRCN) VOC 07 trainval VOC 07 test 67.6 FRCN with adversary VOC 07 trainval VOC 07 test 70.8 Note : The reported results are based on the VGG16 network. Installation Please follow the exact installation and download the VOC data as the Faster R CNN Python code ( here ). Usage To run the code, one can simply do, Shell ./train.sh It includes 3 stage of training: Shell ./experiments/scripts/fast_rcnn_std.sh GPU_ID VGG16 pascal_voc which is used for training a standard Fast RCNN for 10K iterations, you can download my model and logs for this step. Shell ./experiments/scripts/fast_rcnn_adv_pretrain.sh GPU_ID VGG16 pascal_voc which is a pre training stage for the adversarial network, you can download my model and logs for this step. Shell ./copy_model.h which is used to copy the weights of the above two models to initialize the joint model. Shell ./experiments/scripts/fast_rcnn_adv.sh GPU_ID VGG16 pascal_voc which is joint training of the detector and the adversarial network, you can download my model and logs for this step.",Object Detection,Object Detection 2758,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks The major contributors of this repository include Yuwen Xiong , Haozhi Qi , Guodong Zhang , Yi Li , Jifeng Dai , Bin Xiao , Han Hu and Yichen Wei . We released training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an Apache 2.0 license. 
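As a rough intuition for the deformable convolution operator this repository implements: each output position samples the input not only at the regular kernel grid but at the grid plus learned 2D offsets, using bilinear interpolation. Below is a single-location, single-channel NumPy sketch of the sampling idea (an illustration only, with assumed shapes and random offsets; the real operator is the CUDA/MXNet kernel shipped with the code, and the offsets are predicted by an extra convolution):

import numpy as np

def bilinear(img, y, x):
    # Bilinear interpolation of a 2D array at a fractional (y, x) location.
    h, w = img.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = img[y0, x0] * (1 - dx) + img[y0, x1] * dx
    bottom = img[y1, x0] * (1 - dx) + img[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def deformable_conv_at(img, kernel, offsets, cy, cx):
    # kernel: (3, 3) weights; offsets: (3, 3, 2) learned (dy, dx) per kernel tap.
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i, j]
            out += kernel[i, j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

feature = np.random.rand(16, 16)
kernel = np.random.rand(3, 3)
offsets = 0.5 * np.random.randn(3, 3, 2)
print(deformable_conv_at(feature, kernel, offsets, cy=8, cx=8))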
Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). 
Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and checkout to MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if it's the first time to checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python packge into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the argumented VOC 2012 dataset. The argumented annotations are provided by SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download manually from OneDrive or BaiduYun , and put it under folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and gives several prediction results, to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and gives several prediction results, to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offset of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. Please download Cityscapes and VOC 2012 datasets and make sure it looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. Please download argumented VOC 2012 annotations/image lists, and put the argumented annotations and the argumented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , Respectively. 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU , dataset, etc.) 
are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder would be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This is because either you forget to copy the operators to your MXNet folder or you copy to the wrong path or you forget to re compile or you install the wrong MXNet Please print mxnet.__path__ to make sure you use correct MXNet Q: I encounter segment fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 first before import mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. A: It has been identified that MXNet on Windows has this problem. So we recommend to run this program on Linux. You could also stop it and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: Due to several reasons (code is based on a old, internal Caffe, port to public Caffe needs extra work, time limit, etc.). We do not plan to release our Caffe code. Since current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself, the core CUDA code could be kept unchanged. Anyone who wish to do it is welcome to make a pull request.",Object Detection,Object Detection 2763,Computer Vision,Computer Vision,Computer Vision,"Objects as Points Object detection, 3D detection, and pose estimation using center point detection: ! (readme/fig2.png) > Objects as Points , > Xingyi Zhou, Dequan Wang, Philipp Krähenbühl, > arXiv technical report ( arXiv 1904.07850 ) Contact: zhouxy@cs.utexas.edu (mailto:zhouxy@cs.utexas.edu). Any questions or discussions are welcomed! Abstract Detection identifies objects as axis aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post processing. In this paper, we take a different approach. We model an object as a single point the center point of its bounding box. 
Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end to end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed accuracy trade off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi stage methods and runs in real time. Highlights Simple: One sentence method summary: use keypoint detection technic to detect the bounding box center point and regress to all other object properties like bounding box size, 3d information, and pose. Versatile: The same framework works for object detection, 3d bounding box estimation, and multi person pose estimation with minor modification. Fast: The whole process in a single network feedforward. No NMS post processing is needed. Our DLA 34 model runs at 52 FPS with 37.4 COCO AP. Strong : Our best single model achieves 45.1 AP on COCO test dev. Easy to use: We provide user friendly testing API and webcam demos. Main results Object Detection on COCO validation Backbone AP / FPS Flip AP / FPS Multi scale AP / FPS Hourglass 104 40.3 / 14 42.2 / 7.8 45.1 / 1.4 DLA 34 37.4 / 52 39.2 / 28 41.7 / 4 ResNet 101 34.6 / 45 36.2 / 25 39.3 / 4 ResNet 18 28.1 / 142 30.0 / 71 33.2 / 12 Keypoint detection on COCO validation Backbone AP FPS Hourglass 104 64.0 6.6 DLA 34 58.9 23 3D bounding box detection on KITTI validation Backbone FPS AP E AP M AP H AOS E AOS M AOS H BEV E BEV M BEV H DLA 34 32 96.9 87.8 79.2 93.9 84.3 75.7 34.0 30.5 26.8 All models and details are available in our Model zoo (readme/MODEL_ZOO.md). Installation Please refer to INSTALL.md (readme/INSTALL.md) for installation instructions. Use CenterNet We support demo for image/ image folder, video, and webcam. First, download the models (By default, ctdet_coco_dla_2x for detection and multi_pose_dla_3x for human pose estimation) from the Model zoo (readme/MODEL_ZOO.md) and put them in CenterNet_ROOT/models/ . For object detection on images/ video, run: python demo.py ctdet demo /path/to/image/or/folder/or/video load_model ../models/ctdet_coco_dla_2x.pth We provide example images in CenterNet_ROOT/images/ (from Detectron ). If set up correctly, the output should look like For webcam demo, run python demo.py ctdet demo webcam load_model ../models/ctdet_coco_dla_2x.pth Similarly, for human pose estimation, run: python demo.py multi_pose demo /path/to/image/or/folder/or/video/or/webcam load_model ../models/multi_pose_dla_3x.pth The result for the example images should look like: You can add debug 2 to visualize the heatmap outputs. You can add flip_test for flip test. To use this CenterNet in your own project, you can import sys CENTERNET_PATH /path/to/CenterNet/src/lib/ sys.path.insert(0, CENTERNET_PATH) from detectors.detector_factory import detector_factory from opts import opts MODEL_PATH /path/to/model TASK 'ctdet' or 'multi_pose' for human pose estimation opt opts().init('{} load_model {}'.format(TASK, MODEL_PATH).split(' ')) detector detector_factory opt.task (opt) img image/or/path/to/your/image/ ret detector.run(img) 'results' ret will be a python dict: {category_id : x1, y1, x2, y2, score , ... 
, } Benchmark Evaluation and Training After installation (readme/INSTALL.md), follow the instructions in DATA.md (readme/DATA.md) to setup the datasets. Then check GETTING_STARTED.md (readme/GETTING_STARTED.md) to reproduce the results in the paper. We provide scripts for all the experiments in the experiments (experiments) folder. Develop If you are interested in training CenterNet in a new dataset, use CenterNet in a new task, or use a new network architecture for CenterNet, please refer to DEVELOP.md (readme/DEVELOP.md). Also feel free to send us emails for discussions or suggestions. Third party implementation Keras: keras centernet from see . License CenterNet itself is released under the MIT License (refer to the LICENSE file for details). Portions of the code are borrowed from human pose estimation.pytorch (image transform, resnet), CornerNet (hourglassnet, loss functions), dla (DLA network), DCNv2 (deformable convolutions), tf faster rcnn (Pascal VOC evaluation) and kitti_eval (KITTI dataset evaluation). Please refer to the original License of these projects (See NOTICE (NOTICE)). Citation If you find this project useful for your research, please use the following BibTeX entry. @inproceedings{zhou2019objects, title {Objects as Points}, author {Zhou, Xingyi and Wang, Dequan and Kr{\ a}henb{\ u}hl, Philipp}, booktitle {arXiv preprint arXiv:1904.07850}, year {2019} }",Object Detection,Object Detection 2764,Computer Vision,Computer Vision,Computer Vision,"ExtremeNet: Training and Evaluation Code Code for bottom up object detection by grouping extreme and center points: ! (readme/teaser.png) > Bottom up Object Detection by Grouping Extreme and Center Points , > Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl, > CVPR 2019 (arXiv 1901.08043) This project is developed upon the CornerNet code and contains the code from Deep Extreme Cut(DEXTR) . Thanks to the original authors! Contact: zhouxy2017@gmail.com (mailto:zhouxy2017@gmail.com). Any questions or discussions are welcomed! Abstract With the advent of deep learning, object detection drifted from a bottom up to a top down recognition problem. State of the art algorithms enumerate a near exhaustive list of object locations and classify each into: object or not. In this paper, we show that bottom up approaches still perform competitively. We detect four extreme points (top most, left most, bottom most, right most) and one center point of objects using a standard keypoint estimation network. We group the five keypoints into a bounding box if they are geometrically aligned. Object detection is then a purely appearance based keypoint estimation problem, without region classification or implicit feature learning. The proposed method performs on par with the state of the art region based detection methods, with a bounding box AP of 43.2% on COCO test dev. In addition, our estimated extreme points directly span a coarse octagonal mask, with a COCO Mask AP of 18.9%, much better than the Mask AP of vanilla bounding boxes. Extreme point guided segmentation further improves this to 34.6% Mask AP. Installation The code was tested with Anaconda Python 3.6 and PyTorch ( ) v0.4.1. After install Anaconda: 1. Clone this repo: ExtremeNet_ROOT /path/to/clone/ExtremeNet git clone recursive $ExtremeNet_ROOT 2. Create an Anaconda environment using the provided package list from Cornernet . conda create name CornerNet file conda_packagelist.txt source activate CornerNet 3. Compiling NMS (originally from Faster R CNN and Soft NMS ). 
cd $ExtremeNet_ROOT/external make Demo Download our pre trained model and put it in cache/ . Optionally, if you want to test instance segmentation with Deep Extreme Cut , download their PASCAL + SBD pertained model and put it in cache/ . Run the demo python demo.py demo /path/to/image/or/folder show_mask Contents in are optional. By default, it runs the sample images provided in $ExtremeNet_ROOT/images/ (from Detectron ). We show the predicted extreme point heatmaps (combined four heatmaps and overlaid on the input image), the predicted center point heatmap, and the detection and octagon mask results. If setup correctly, the output will look like: If show_mask is turned on, it further pipelined with DEXTR for instance segmentation. The output will look like: Data preparation If you want to reproduce the results in the paper for benchmark evaluation and training, you will need to setup dataset. Installing MS COCO APIs cd $ExtremeNet_ROOT/data git clone coco cd $ExtremeNet_ROOT/data/coco/PythonAPI make python setup.py install user Downloading MS COCO Data Download the images (2017 Train, 2017 Val, 2017 Test) from coco website . Download annotation files (2017 train/val and test image info) from coco website . Place the data (or create symlinks) to make the data folder like: ${ExtremeNet_ROOT} data coco annotations instances_train2017.json instances_val2017.json image_info_test dev2017.json images train2017 val2017 test2017 Generate extreme point annotation from segmentation: cd $ExtremeNet_ROOT/tools/ python gen_coco_extreme_points.py It generates instances_extreme_train2017.json and instances_extreme_val2017.json in data/coco/annotations/ . Benchmark Evaluation After downloading our pre trained model and the dataset, Run the following command to evaluate object detection: python test.py ExtremeNet suffix multi_scale The results on COCO validation set should be 40.3 box AP without suffix multi_scale and 43.3 box AP with suffix multi_scale . After obtaining the detection results, run the following commands for instance segmentation: python eval_dextr_mask.py results/ExtremeNet/250000/validation/multi_scale/results.json The results on COCO validation set should be 34.6 mask AP (The evaluation will be slow). You can test with other hyper parameters by creating a new config file ( ExtremeNet .json ) in config/ . Training You will need 5x 12GB GPUs to reproduce our training. Our model is fine tuned on the 10 GPU pre trained CornerNet model . After downloading the CornerNet model and put it in cache/ , run python train.py ExtremeNet You can resume a half trained model by python train.py ExtremeNet iter xxxx Notes: Training takes about 10 days in our Titan V GPUs. Train with 150000 iterations (about 6 days) will be 0.5 AP lower. Training from scratch for the same iteration (250000) may result in 2 AP lower than fintuning from CornerNet, but can get higher performance (43.9AP on COCO val w/ multi scale testing) if trained for 500000 iterations Changing the focal loss implementation to this can accelerate training, but costs more GPU memory. Citation If you find this model useful for your research, please use the following BibTeX entry. 
@inproceedings{zhou2019bottomup, title {Bottom up Object Detection by Grouping Extreme and Center Points}, author {Zhou, Xingyi and Zhuo, Jiacheng and Kr{\ a}henb{\ u}hl, Philipp}, booktitle {CVPR}, year {2019} } Please also considering citing the CornerNet paper (where this code is heavily borrowed from) and Deep Extreme Cut paper (if you use the instance segmentation part). @inproceedings{law2018cornernet, title {CornerNet: Detecting Objects as Paired Keypoints}, author {Law, Hei and Deng, Jia}, booktitle {Proceedings of the European Conference on Computer Vision (ECCV)}, pages {734 750}, year {2018} } @Inproceedings{Man+18, Title {Deep Extreme Cut: From Extreme Points to Object Segmentation}, Author {K.K. Maninis and S. Caelles and J. Pont Tuset and L. {Van Gool}}, Booktitle {Computer Vision and Pattern Recognition (CVPR)}, Year {2018} }",Object Detection,Object Detection 2765,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn A TensorFlow implementation of Faster R CNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of Faster R CNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the Faster R CNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster R CNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 70.8 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.7 . Train on COCO 2014 trainval35k and test on minival ( Iterations : 900k/1190k), 30.2 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.7 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.8 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 35.4 . More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.8 . Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.1 . Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training with TensorFlow especially for VOC, the best numbers are reported (with 2 3 attempts) here. 
According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. We do not set a threshold (instead of 0.05) for a detection to be included in the final result, which increases recall. Weight decay is set to 1e 4. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations, and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Learning rate for biases is not doubled. For Mobilenets, we fix the first five layers when fine tuning the network. All batch normalization parameters are fixed. Weight decay for Mobilenet layers is set to 4e 5. For approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. ! (data/imgs/gt.png) ! (data/imgs/pred.png) : : : : Displayed Ground Truth on Tensorboard Displayed Predictions on Tensorboard Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempts is made to overfit on testing set. Support for resuming training . I tried to store as much information as possible when snapshoting, with the purpose to resume training from the latest snapshot properly. The meta information includes current image index, permutation of images, and random state of numpy. However, when you resume training the random seed for TensorFlow will be reset (not sure how to save the random state of TensorFlow now), so it will result in a difference. Note that, the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization . The current implementation will summarize ground truth boxes, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic TensorFlow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim Resnet block issue. 
If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. While it is not required, for experimenting the original RoI pooling (which requires modification of the C++ code in TensorFlow), you can check out my TensorFlow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6. Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using dockerfile located at docker folder (cuda 8 version, as it is required by TensorFlow r1.0.) And make sure following TensorFlow installation to install and use nvidia docker Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 GTX 960M sm_50 GTX 1080 (Ti) sm_61 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. Also even if you are only using CPU TensorFlow, GPU based code (for NMS) will be used by default, so please set USE_GPU_NMS False to get the correct output. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating soft links in the data folder. Since Faster R CNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. 
Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original Faster R CNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. 
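As an aside on the resume-training support described earlier in this README (storing the current image index, the image permutation, and numpy's random state alongside each snapshot), here is a minimal sketch of that kind of bookkeeping in plain Python/numpy. The file name and pickle layout are illustrative assumptions, not the repository's actual snapshot format. Python
import pickle
import numpy as np

def save_meta(path, image_index, permutation):
    """Save the bookkeeping needed to resume data loading deterministically.
    NOTE: file name and layout are illustrative, not the repo's actual format."""
    meta = {
        "image_index": image_index,            # position inside the current epoch
        "permutation": permutation,            # shuffled order of image ids
        "numpy_random_state": np.random.get_state(),
    }
    with open(path, "wb") as f:
        pickle.dump(meta, f)

def load_meta(path):
    """Restore the bookkeeping saved by save_meta and reseat numpy's RNG."""
    with open(path, "rb") as f:
        meta = pickle.load(f)
    np.random.set_state(meta["numpy_random_state"])
    return meta["image_index"], meta["permutation"]

if __name__ == "__main__":
    perm = np.random.permutation(10)           # pretend dataset of 10 images
    save_meta("snapshot_iter_100.meta.pkl", image_index=3, permutation=perm)
    idx, perm_restored = load_meta("snapshot_iter_100.meta.pkl")
    assert idx == 3 and np.array_equal(perm, perm_restored)
As the README notes, TensorFlow's own random state is not captured this way, so a resumed run can still diverge slightly from an uninterrupted one.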
Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } Or for a formal paper, Spatial Memory Network : @article{chen2017spatial, title {Spatial Memory for Context Reasoning in Object Detection}, author {Chen, Xinlei and Gupta, Abhinav}, journal {arXiv preprint arXiv:1704.04224}, year {2017} } For convenience, here is the Faster R CNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} }",Object Detection,Object Detection 2772,Computer Vision,Computer Vision,Computer Vision,"_SqueezeSeg_: Convolutional Neural Nets with Recurrent CRF for Real Time Road Object Segmentation from 3D LiDAR Point Cloud By Bichen Wu, Alvin Wan, Xiangyu Yue, Kurt Keutzer (UC Berkeley) This repository contains a tensorflow implementation of SqueezeSeg, a convolutional neural network model for LiDAR segmentation. A demonstration of SqueezeSeg can be found below: Please refer to our video for a high level introduction of this work: For more details, please refer to our paper: If you find this work useful for your research, please consider citing: @article{wu2017squeezeseg, title {Squeezeseg: Convolutional neural nets with recurrent crf for real time road object segmentation from 3d lidar point cloud}, author {Wu, Bichen and Wan, Alvin and Yue, Xiangyu and Keutzer, Kurt}, journal {arXiv preprint arXiv:1710.07368}, year {2017} } License SqueezeSeg is released under the BSD license (See LICENSE for details). The dataset used for training, evaluation, and demonstration of SqueezeSeg is modified from the KITTI raw dataset. For your convenience, we provide links to download the converted dataset, which is distributed under the Creative Commons Attribution NonCommercial ShareAlike 3.0 License . Installation: The instructions are tested on Ubuntu 16.04 with python 2.7 and tensorflow 1.0 with GPU support. Clone the SqueezeSeg repository: Shell git clone We name the root directory as $SQSG_ROOT . Setup virtual environment: 1. By default we use Python2.7. Create the virtual environment Shell virtualenv env 2. Activate the virtual environment Shell source env/bin/activate Use pip to install required Python packages: Shell pip install r requirements.txt Demo: To run the demo script: Shell cd $SQSG_ROOT/ python ./src/demo.py If the installation is correct, the detector should write the detection results as well as 2D label maps to $SQSG_ROOT/data/samples_out . Here are examples of the output label map overlapped with the projected LiDAR signal. Green masks indicate clusters corresponding to cars and blue masks indicate cyclists. Training/Validation First, download training and validation data (3.9 GB) from this link . This dataset contains LiDAR point clouds projected onto a 2D spherical surface. Refer to our paper for details of the data conversion procedure (a rough sketch of such a spherical projection is given below). This dataset is converted from the KITTI raw dataset and is distributed under the Creative Commons Attribution NonCommercial ShareAlike 3.0 License.
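To make the "projected onto a 2D spherical surface" step above concrete, here is a rough sketch of how a LiDAR point cloud can be binned into an azimuth/elevation grid. It only illustrates the general idea; the grid size, field of view, and channel layout are assumptions and not necessarily what the SqueezeSeg conversion scripts actually use. Python
import numpy as np

def spherical_project(points, H=64, W=512, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 4) array of LiDAR points (x, y, z, intensity) onto an
    H x W grid indexed by elevation (rows) and azimuth (columns).
    Returns an (H, W, 5) tensor holding x, y, z, intensity, range.
    All parameters here are illustrative assumptions."""
    x, y, z, intensity = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    r = np.sqrt(x**2 + y**2 + z**2) + 1e-12

    azimuth = np.arctan2(y, x)          # angle around the sensor, in [-pi, pi]
    elevation = np.arcsin(z / r)        # angle above/below the horizontal plane

    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    # Normalize the angles and convert to integer grid coordinates.
    col = ((azimuth + np.pi) / (2.0 * np.pi) * W).astype(np.int32)
    row = ((fov_up_rad - elevation) / fov * H).astype(np.int32)
    col = np.clip(col, 0, W - 1)
    row = np.clip(row, 0, H - 1)

    # Points falling into the same cell simply overwrite each other in this sketch.
    grid = np.zeros((H, W, 5), dtype=np.float32)
    grid[row, col] = np.stack([x, y, z, intensity, r], axis=1)
    return grid

if __name__ == "__main__":
    pts = np.random.randn(1000, 4).astype(np.float32)
    print(spherical_project(pts).shape)    # (64, 512, 5)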
Shell cd $SQSG_ROOT/data/ wget tar xzvf lidar_2d.tgz rm lidar_2d.tgz Now we can start training by Shell cd $SQSG_ROOT/ ./scripts/train.sh gpu 0 image_set train log_dir ./log/ Training logs and model checkpoints will be saved in the log directory. We can launch the evaluation script simultaneously with training: Shell cd $SQSG_ROOT/ ./scripts/eval.sh gpu 1 image_set val log_dir ./log/ We can monitor the training process using tensorboard. Shell tensorboard logdir $SQSG_ROOT/log/ Tensorboard displays information such as training loss, evaluation accuracy, and visualization of detection results in the training process, which are helpful for debugging and tuning models, as shown below: (Tensorboard screenshots.)",Object Detection,Object Detection 2774,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3.
Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... 
GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2781,Computer Vision,Computer Vision,Computer Vision,"A Fast RCNN: Hard Positive Generation via Adversary for Object Detection By Xiaolong Wang, Abhinav Shrivastava, and Abhinav Gupta Introduction This is a Caffe based version of A Fast RCNN ( arxiv_link ). Although we originally implement it on torch, this Caffe re implementation is much simpler, faster and easier to use. We release the code for training A Fast RCNN with Adversarial Spatial Dropout Network. License This code is released under the MIT License (refer to the LICENSE file for details). Citing If you find this useful in your research, please consider citing: @inproceedings{WangCVPR17afrcnn, Author {Xiaolong Wang and Abhinav Shrivastava and Abhinav Gupta}, Title {A Fast RCNN: Hard Positive Generation via Adversary for Object Detection}, Booktitle {Conference on Computer Vision and Pattern Recognition ({CVPR})}, Year {2017} } Disclaimer This implementation is built on a fork of the OHEM code ( here ), which in turn builds on the Faster R CNN Python code ( here ) and Fast R CNN ( here ). Please cite the appropriate papers depending on which part of the code and/or model you are using. Results Approach training data test data mAP Fast R CNN (FRCN) VOC 07 trainval VOC 07 test 67.6 FRCN with adversary VOC 07 trainval VOC 07 test 70.8 Note : The reported results are based on the VGG16 network. Installation Please follow the exact installation and download the VOC data as the Faster R CNN Python code ( here ). Usage To run the code, one can simply do, Shell ./train.sh It includes 3 stage of training: Shell ./experiments/scripts/fast_rcnn_std.sh GPU_ID VGG16 pascal_voc which is used for training a standard Fast RCNN for 10K iterations, you can download my model and logs for this step. Shell ./experiments/scripts/fast_rcnn_adv_pretrain.sh GPU_ID VGG16 pascal_voc which is a pre training stage for the adversarial network, you can download my model and logs for this step. Shell ./copy_model.h which is used to copy the weights of the above two models to initialize the joint model. 
Shell ./experiments/scripts/fast_rcnn_adv.sh GPU_ID VGG16 pascal_voc which is joint training of the detector and the adversarial network, you can download my model and logs for this step.",Object Detection,Object Detection 2782,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. 
Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . 
Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2783,Computer Vision,Computer Vision,Computer Vision,"Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. 
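Since the unofficial Python evaluation is mentioned just above: at its core, VOC-style scoring ranks detections, builds precision/recall curves, and integrates them into an average precision. The function below is a generic, hedged illustration of the all-point interpolated AP given precision and recall arrays; it is meant to convey the idea, not to reproduce the repository's evaluation code. Python
import numpy as np

def voc_ap(recall, precision):
    """Compute average precision from recall/precision arrays using the
    all-points interpolation (VOC2010+ style). Generic illustration only."""
    # Append sentinel values at both ends of the curves.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # Make the precision envelope monotonically decreasing from right to left.
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])

    # Sum the areas of the rectangles where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

if __name__ == "__main__":
    rec = np.array([0.1, 0.2, 0.4, 0.6])
    prec = np.array([1.0, 0.8, 0.7, 0.5])
    print(voc_ap(rec, prec))    # 0.42 for this toy curve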
Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. 
set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2791,Computer Vision,Computer Vision,Computer Vision,"Thanks for Xiaoxi Wang providing contributions. Important notice: If you used the master branch before Sep. 26 2017 and its corresponding pretrained model, PLEASE PAY ATTENTION : The old master branch in now under old_master, you can still run the code and download the pretrained model, but the pretrained model for that old master is not compatible to the current master! The main differences between new and old master branch are in this two commits: 9d4c24e , c899ce7 The change is related to this issue ; master now matches all the details in tf faster rcnn so that we can now convert pretrained tf model to pytorch model. pytorch faster rcnn A pytorch implementation of faster RCNN detection framework based on Xinlei Chen's tf faster rcnn . Xinlei Chen's repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 71.22 (from scratch) 70.75 (converted) ( 70.8 for tf faster rcnn). Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.33 (from scratch) 75.27 (converted) ( 75.7 for tf faster rcnn). Train on COCO 2014 trainval35k and test on minival (900k/1190k) 29.2 (from scratch) 30.1 (converted) ( 30.2 for tf faster rcnn). 
With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.29 (from scratch) 75.76 (converted) ( 75.7 for tf faster rcnn). Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.26 (from scratch) 79.78 (converted) ( 79.8 for tf faster rcnn). Train on COCO 2014 trainval35k and test on minival (800k/1190k), 35.1 (from scratch) 35.4 (converted) ( 35.4 for tf faster rcnn). More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.9 (converted) ( 21.8 for tf faster rcnn). Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 (converted) ( 32.4 for tf faster rcnn). Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.7 (converted) ( 36.1 for tf faster rcnn). Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to Xinlei's experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\ Another server here . Google drive here . (Optional) Instead of downloading my pretrained or converted model, you can also convert from tf faster rcnn model. You can download the tensorflow pretrained model from tf faster rcnn . Then run: Shell python tools/convert_from_tensorflow.py tensorflow_model resnet_model.ckpt python tools/convert_from_tensorflow_vgg.py tensorflow_model vgg_model.ckpt This script will create a .pth file with the same name in the same folder as the tensorflow model. 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by pytorch vgg and pytorch resnet (the ones with caffe in the name), you can download the pre trained models and set them in the data/imagenet_weights folder. 
For example for the VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights python open python in terminal and run the following Python code Python
import torch
from torch.utils.model_zoo import load_url
from torchvision import models

sd = load_url( )  # URL of the torchvision vgg16 checkpoint (elided in the original)
sd['classifier.0.weight'] = sd['classifier.1.weight']
sd['classifier.0.bias'] = sd['classifier.1.bias']
del sd['classifier.1.weight']
del sd['classifier.1.bias']

sd['classifier.3.weight'] = sd['classifier.4.weight']
sd['classifier.3.bias'] = sd['classifier.4.bias']
del sd['classifier.4.weight']
del sd['classifier.4.bias']

torch.save(sd, "vgg16.pth")
(A small sanity check for the resulting vgg16.pth is given at the end of this section.) Shell cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights download from my gdrive (link in pytorch resnet) mv resnet101 caffe.pth res101.pth cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to train on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted the soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same as the original faster RCNN for VOC 2007; however, Xinlei finds it beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to an 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome.
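As a quick sanity check on the VGG16 weight surgery shown earlier in this section, you can reload the saved file and confirm that the remapped classifier keys exist and the dropped ones are gone. This is only a suggested check, assuming the vgg16.pth produced by that snippet. Python
import torch

# Assumes vgg16.pth was produced by the conversion snippet above.
sd = torch.load("vgg16.pth", map_location="cpu")

# The fc layers should now live under classifier.0 and classifier.3 ...
assert "classifier.0.weight" in sd and "classifier.3.weight" in sd
# ... and the original classifier.1 / classifier.4 entries should be gone.
assert "classifier.1.weight" not in sd and "classifier.4.weight" not in sd

for k in ("classifier.0.weight", "classifier.3.weight"):
    print(k, tuple(sd[k].shape))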
Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Detailed numbers from COCO server (not supported) All the models are trained on COCO 2014 trainval35k . VGG16 COCO 2015 test dev (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.297 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.504 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.128 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.325 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.421 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.272 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.399 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.187 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.451 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.591 VGG16 COCO 2015 test std (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.295 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.501 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.119 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.327 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.418 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.273 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.400 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.179 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.455 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.586",Object Detection,Object Detection 2800,Computer Vision,Computer Vision,Computer Vision,"SSD: Single Shot MultiBox Object Detector, in PyTorch Table of Contents Introduction Installation Datasets Train Test Introduction This is the SSD model based on project by Max DeGroot . I corrected some bugs in the code and successfully run the code on GPUs at Google Cloud. SSD (Single Shot MultiBox Object Detector) is able to detect objects in an image with bounding boxes. The method is faster than faster RCNN and mask RCNN and still yield a good accuracy. Installation Install PyTorch by selecting your environment on the website and running the appropriate command. Clone this repository. Note: We currently only support Python 3+. Then download the dataset by following the instructions ( datasets) below. We support Visdom for real time loss visualization during training! To use Visdom in the browser: Shell First install Python server and client pip install visdom Start the server (probably in a screen or tmux) python m visdom.server Then (during training) navigate to (see the Train section below for training details). 
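To give a flavour of what the Visdom integration does, here is a small, hedged sketch of appending training-loss points to a Visdom line plot. It assumes a Visdom server is already running as described above; the window title, loss values, and update cadence are made up for illustration and are not taken from train.py. Python
import numpy as np
import visdom

viz = visdom.Visdom()  # connects to the server started with: python -m visdom.server

win = None
for iteration in range(1, 101):
    loss = 1.0 / iteration + np.random.rand() * 0.05   # fake loss, just for the demo
    X = np.array([iteration])
    Y = np.array([loss])
    if win is None:
        win = viz.line(X=X, Y=Y, opts={"title": "train loss (demo)"})
    else:
        viz.line(X=X, Y=Y, win=win, update="append")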
Note: For training, we currently support VOC , and aim to add and COCO ImageNet support in the future. Datasets To make things easy, we provide bash scripts to handle the dataset downloads and setup for you. We also provide simple dataset loaders that inherit torch.utils.data.Dataset , making them fully compatible with the torchvision.datasets API . VOC Dataset PASCAL VOC: Visual Object Classes Download VOC2007 trainval & test Shell git clone navigate to the home directory of SSD model, dataset will be downloaded into data folder cd SSD_resnet_pytorch specify a directory for dataset to be downloaded into, else default is /data/ sh data/scripts/VOC2007.sh Download VOC2012 trainval Shell specify a directory for dataset to be downloaded into, else default is /data/ sh data/scripts/VOC2012.sh COCO(not fully implemented yet) Microsoft COCO: Common Objects in Context Download COCO 2014 Shell specify a directory for dataset to be downloaded into, else default is /data/ sh data/scripts/COCO2014.sh Training SSD First download the fc reduced VGG 16 PyTorch base network weights at: By default, we assume you have downloaded the file in the ssd.pytorch/weights dir: Shell cd weights wget adjust the keys in the weights file to fit for current model python3 vggweights.py cd .. To train SSD using the train script simply specify the parameters listed in train.py as a flag or manually change them. Shell use vgg python3 train.py If use resNet python3 train.py model 'resnet' basenet 'resnet50.pth' if you don't want the training to stop after you log out nohup python3 u train.py model 'resnet' basenet 'resnet50.pth' > r1.log &1 Note: For training, an NVIDIA GPU is strongly recommended for speed. It takes about two days to iterate over 120000x24 images for using Tesla K80 GPU. resNet50 takes a little bit longer than VGG16. I guess the time would be within one day, if you use Tesla P4 or P100. For instructions on Visdom usage/installation, see the Installation section. You can pick up training from a checkpoint by specifying the path as one of the training parameters (again, see train.py for options) Test Use a pre trained SSD network for detection Download a pre trained network We are trying to provide PyTorch state_dicts (dict of weight tensors) of the latest SSD model definitions trained on different datasets. Currently, we provide the following PyTorch models: SSD300 trained on VOC0712 (newest PyTorch weights) SSD300 trained on VOC0712 (original Caffe weights) Shell cd weights wget adjust the keys in the weights file to fit for current model python3 ssdweights.py Test and evaluate mean AP (average precision) To test a trained network: Shell use vgg python3 test.py If use resNet python3 test.py model 'resnet' trained_model 'weights/ssd300_resnet.pth' Currently, we got mAP 86% for VGG16 and %67 for resNet50. Display images Shell use vgg python3 demo.py The output images are shown in demo folder ! test example 1 (demo/output72.png) ! test example 2 (demo/output1229.png) References Wei Liu, et al. SSD: Single Shot MultiBox Detector. ECCV2016 ( ). SSD model in PyTorch by Max DeGroot Original Implementation (CAFFE) A huge thank you to Alex Koltun and his team at Webyclip (webyclip.com) for their help in finishing the data augmentation portion. 
A list of other great SSD ports that were sources of inspiration (especially the Chainer repo): Chainer , Keras , MXNet , Tensorflow",Object Detection,Object Detection 2803,Computer Vision,Computer Vision,Computer Vision,"CAFFE for YOLO Reference > You Only Look Once: Unified, Real Time Object detection > > Usage Data preparation Shell cd data/yolo ln s /your/path/to/VOCdevkit/ . python ./get_list.py change related path in script convert.sh ./convert.sh Train Shell cd examples/yolo change related path in script train.sh mkdir models nohup ./train.sh & Test Shell if everything goes well, the map of gnet_yolo_iter_32000.caffemodel may reach 56. cd examples/yolo ./test.sh model_path gpu_id The model is here (link: password: kvee)",Object Detection,Object Detection 2809,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 71.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.3 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 29.5 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.3 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 34.1 . More Resnets: Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 31.6 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 35.2 . Approximate baseline setup from FPN : Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 33.4 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.3 . Train Resnet152 on COCO 2014 trainval35k and test on minival (1000k/1390k), 37.2 . Note : Due to the randomness in GPU training with Tensorflow espeicially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. 
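Given the note above about run-to-run randomness, one small mitigation is to seed every random source explicitly; as the later resuming-training note admits, this still does not make GPU training fully deterministic. A hedged sketch for the TensorFlow 1.x era this README targets, with an arbitrary seed value: Python
import random

import numpy as np
import tensorflow as tf

SEED = 3  # arbitrary choice; the repository's own default seed may differ

random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # graph-level seed in TensorFlow 1.x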
All the numbers are obtained with a different testing scheme, without selecting region proposals using non maximal suppression (TEST.MODE top); the default and original testing scheme (TEST.MODE nms) will likely result in slightly worse performance (see report , for COCO it drops 0.X AP). Since we keep the small proposals (< 16 pixels width/height), our performance is especially good for small objects. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations (VGG16 350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.5), and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Weight decay is set to the Resnet101 default of 1e 4. Learning rate for biases is not doubled. For the approximate FPN baseline setup we simply resize the image to 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempt is made to overfit on the testing set. Support for resuming training . I tried to store as much information as possible when snapshotting, with the purpose of resuming training from the latest snapshot properly. The meta information includes the current image index, the permutation of images, and the random state of numpy. However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestions/solutions are welcome and much appreciated. Support for visualization . The current implementation will summarize ground truth detections, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim resnet block issue. If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. While it is not required, for experimenting with the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6.
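Since easydict version mismatches trip people up (hence the note above), here is a tiny illustration of how such attribute-style configs are typically used; the keys shown are made-up examples, not the repository's actual config fields. Python
from easydict import EasyDict as edict

cfg = edict()
cfg.TRAIN = edict()
cfg.TRAIN.LEARNING_RATE = 0.001      # example key, not necessarily the repo's
cfg.TRAIN.SNAPSHOT_ITERS = 5000

# Attribute access and plain dict access are interchangeable.
assert cfg.TRAIN.LEARNING_RATE == cfg["TRAIN"]["LEARNING_RATE"]
print(cfg.TRAIN)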
Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using the dockerfile located in the docker folder (cuda 8 version, as it is required by Tensorflow r1.0). Also make sure to follow the Tensorflow installation instructions to install and use nvidia docker. Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access the COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating soft links in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (78.7 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code supports VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for the VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2.
Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted softlink to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Detailed numbers from COCO server All the models are trained on COCO 2014 trainval35k . 
VGG16 COCO 2015 test dev (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.297 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.504 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.128 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.325 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.421 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.272 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.399 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.187 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.451 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.591 VGG16 COCO 2015 test std (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.295 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.501 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.119 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.327 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.418 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.273 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.400 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.179 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.455 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.586",Object Detection,Object Detection 2810,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 71.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.3 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 29.5 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.2 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.3 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 34.1 . 
More Resnets: Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 31.6 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 35.2 . Approximate baseline setup from FPN : Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 33.4 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.3 . Train Resnet152 on COCO 2014 trainval35k and test on minival (1000k/1390k), 37.2 . Note : Due to the randomness in GPU training with Tensorflow espeicially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. All the numbers are obtained with a different testing scheme without selecting region proposals using non maximal suppression (TEST.MODE top), the default and original testing scheme (TEST.MODE nms) will likely result in slightly worse performance (see report , for COCO it drops 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations (VGG16 350k/490k: 26.9, 600k/790k: 28.3, 900k/1190k: 29.5), and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Weight decay is set to Renset101 default 1e 4. Learning rate for biases is not doubled. For approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded everytime to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempts is made to overfit on testing set. Support for resuming training . I tried to store as much information as possible when snapshoting, with the purpose to resume training from the lateset snapshot properly. The meta information includes current image index, permutation of images, and random state of numpy. However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that, the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization . 
The current implementation will summarize ground truth detections, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim resnet block issue. If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. While it is not required, for experimenting the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6. Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using dockerfile located at docker folder (cuda 8 version, as it is required by Tensorflow r1.0.) And make sure following Tensorflow installation to install and use nvidia docker Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating softlinks in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a softlink to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at reposistory root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (78.7 on my side), then probabaly the NMS function is compiled improperly, refer to Issue 5 . 
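The note above suggests that failing to reach the reported numbers usually means the NMS extension was compiled improperly. As a quick sanity check, the compiled NMS can be compared against a small pure NumPy IoU / greedy NMS reference such as the sketch below (an illustration only, not the repository's Cython/CUDA implementation):

import numpy as np

def iou(box, boxes):
    # box and boxes use [x1, y1, x2, y2] coordinates
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.3):
    order = scores.argsort()[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= thresh]  # drop boxes that overlap too much
    return keep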
Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted softlink to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Detailed numbers from COCO server All the models are trained on COCO 2014 trainval35k . 
VGG16 COCO 2015 test dev (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.297 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.504 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.128 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.325 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.421 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.272 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.399 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.187 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.451 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.591 VGG16 COCO 2015 test std (900k/1190k): Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.295 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.501 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.312 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.119 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.327 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.418 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.273 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.400 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.409 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.179 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.455 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.586",Object Detection,Object Detection 2813,Computer Vision,Computer Vision,Computer Vision,"keras spp Spatial pyramid pooling layers for keras, based on . This code requires Keras version 2.0 or greater. ! spp (Image credit: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, K. He, X. Zhang, S. Ren, J. Sun) Three types of pooling layers are currently available: SpatialPyramidPooling: apply the pooling procedure on the entire image, given an image batch. This is especially useful if the image input can have varying dimensions, but needs to be fed to a fully connected layer. For example, this trains a network on images of both 32x32 and 64x64 size: import numpy as np from keras.models import Sequential from keras.layers import Convolution2D, Activation, MaxPooling2D, Dense from spp.SpatialPyramidPooling import SpatialPyramidPooling batch_size 64 num_channels 3 num_classes 10 model Sequential() uses theano ordering. Note that we leave the image size as None to allow multiple image sizes model.add(Convolution2D(32, 3, 3, border_mode 'same', input_shape (3, None, None))) model.add(Activation('relu')) model.add(Convolution2D(32, 3, 3)) model.add(Activation('relu')) model.add(MaxPooling2D(pool_size (2, 2))) model.add(Convolution2D(64, 3, 3, border_mode 'same')) model.add(Activation('relu')) model.add(Convolution2D(64, 3, 3)) model.add(Activation('relu')) model.add(SpatialPyramidPooling( 1, 2, 4 )) model.add(Dense(num_classes)) model.add(Activation('softmax')) model.compile(loss 'categorical_crossentropy', optimizer 'sgd') train on 64x64x3 images model.fit(np.random.rand(batch_size, num_channels, 64, 64), np.zeros((batch_size, num_classes))) train on 32x32x3 images model.fit(np.random.rand(batch_size, num_channels, 32, 32), np.zeros((batch_size, num_classes))) RoiPooling: extract multiple rois from a single image. 
In roi pooling, the spatial pyramid pooling is applied at the specified subregions of the image. This is useful for object detection, and is used in fast RCNN and faster RCNN. Note that the batch_size is limited to 1 currently.

pooling_regions = [1, 2, 4]
num_rois = 2
num_channels = 3

if dim_ordering == 'tf':
    in_img = Input(shape=(None, None, num_channels))
elif dim_ordering == 'th':
    in_img = Input(shape=(num_channels, None, None))

in_roi = Input(shape=(num_rois, 4))

out_roi_pool = RoiPooling(pooling_regions, num_rois)([in_img, in_roi])

model = Model([in_img, in_roi], out_roi_pool)

if dim_ordering == 'th':
    X_img = np.random.rand(1, num_channels, img_size, img_size)
    row_length = [float(X_img.shape[2]) / i for i in pooling_regions]
    col_length = [float(X_img.shape[3]) / i for i in pooling_regions]
elif dim_ordering == 'tf':
    X_img = np.random.rand(1, img_size, img_size, num_channels)
    row_length = [float(X_img.shape[1]) / i for i in pooling_regions]
    col_length = [float(X_img.shape[2]) / i for i in pooling_regions]

X_roi = np.array([[0, 0, img_size / 1, img_size / 1],
                  [0, 0, img_size / 2, img_size / 2]])
X_roi = np.reshape(X_roi, (1, num_rois, 4))

Y = model.predict([X_img, X_roi])

RoiPoolingConv: like RoiPooling, but maintains spatial information. Thank you to @jlhbaseball15 for his contribution",Object Detection,Object Detection 2814,Computer Vision,Computer Vision,Computer Vision,"pytorch retinanet ! img3 ! img5 Pytorch implementation of RetinaNet object detection as described in Focal Loss for Dense Object Detection by Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár. This implementation is primarily designed to be easy to read and simple to modify. Results Currently, this repo achieves 33.7% mAP at 600px resolution with a Resnet 50 backbone. The published result is 34.0% mAP. The difference is likely due to the use of the Adam optimizer instead of SGD with weight decay. Installation 1) Clone this repo 2) Install the required packages: apt-get install tk-dev python-tk 3) Install the python packages: pip install cffi pip install pandas pip install pycocotools pip install cython pip install opencv-python pip install requests 4) Build the NMS extension. cd pytorch-retinanet/lib bash build.sh cd ../ Note that you may have to edit line 14 of build.sh if you want to change which version of python you are building the extension for. Training The network can be trained using the train.py script. Currently, two dataloaders are available: COCO and CSV. For training on coco, use python train.py --dataset coco --coco_path ../coco --depth 50 For training using a custom dataset, with annotations in CSV format (see below), use python train.py --dataset csv --csv_train --csv_classes --csv_val Note that the csv_val argument is optional, in which case no validation will be performed. Pre trained model A pre trained model is available at: (this is a pytorch state dict) (this is a pytorch model serialized via torch.save() ) The state dict model can be loaded using: retinanet = model.resnet50(num_classes=dataset_train.num_classes(),) retinanet.load_state_dict(torch.load(PATH_TO_WEIGHTS)) The pytorch model can be loaded directly using: retinanet = torch.load(PATH_TO_MODEL) Visualization To visualize the network detection, use visualize.py : python visualize.py --dataset coco --coco_path ../coco --model This will visualize bounding boxes on the validation set. To visualise with a CSV dataset, use: python visualize.py --dataset csv --csv_classes --csv_val --model Model The retinanet model uses a resnet backbone. You can set the depth of the resnet model using the depth argument.
Depth must be one of 18, 34, 50, 101 or 152. Note that deeper models are more accurate but are slower and use more memory. CSV datasets The CSVGenerator provides an easy way to define your own datasets. It uses two CSV files: one file containing annotations and one file containing a class name to ID mapping. Annotations format The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is: path/to/image.jpg,x1,y1,x2,y2,class_name Some images may not contain any labeled objects. To add these images to the dataset as negative examples, add an annotation where x1 , y1 , x2 , y2 and class_name are all empty: path/to/image.jpg,,,,, A full example: /data/imgs/img_001.jpg,837,346,981,456,cow /data/imgs/img_002.jpg,215,312,279,391,cat /data/imgs/img_002.jpg,22,5,89,84,bird /data/imgs/img_003.jpg,,,,, This defines a dataset with 3 images. img_001.jpg contains a cow. img_002.jpg contains a cat and a bird. img_003.jpg contains no interesting objects/animals. Class mapping format The class name to ID mapping file should contain one mapping per line. Each line should use the following format: class_name,id Indexing for classes starts at 0. Do not include a background class as it is implicit. For example: cow,0 cat,1 bird,2 Acknowledgements Significant amounts of code are borrowed from the keras retinanet implementation The NMS module used is from the pytorch faster rcnn implementation Examples ! img1 ! img2 ! img4 ! img6 ! img7 ! img8",Object Detection,Object Detection 2817,Computer Vision,Computer Vision,Computer Vision,"Bounding Box Regression with Uncertainty for Accurate Object Detection CVPR 2019 Yihui He , Chenchen Zhu , Jianren Wang , Marios Savvides , Xiangyu Zhang , Carnegie Mellon University & Megvii Inc. Large scale object detection datasets (e.g., MS COCO) try to define the ground truth bounding boxes as clear as possible. However, we observe that ambiguities are still introduced when labeling the bounding boxes. In this paper, we propose a novel bounding box regression loss for learning bounding box transformation and localization variance together. Our loss greatly improves the localization accuracies of various architectures with nearly no additional computation. The learned localization variance allows us to merge neighboring bounding boxes during non maximum suppression (NMS), which further improves the localization performance. On MS COCO, we boost the Average Precision (AP) of VGG 16 Faster R CNN from 23.6% to 29.1%. More importantly, for ResNet 50 FPN Mask R CNN, our method improves the AP and AP90 by 1.8% and 6.2% respectively, which significantly outperforms previous state of the art bounding box refinement methods. Citation If you find the code useful in your research, please consider citing: @inproceedings{klloss, title {Bounding Box Regression with Uncertainty for Accurate Object Detection}, author {He, Yihui and Zhu, Chenchen and Wang, Jianren and Savvides, Marios and Zhang, Xiangyu }, booktitle {2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2019}, organization {IEEE} } Installation Please find installation instructions for Caffe2 and Detectron in INSTALL.md (INSTALL.md). When installing cocoapi, please use my fork to get AP80 and AP90 scores. 
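The CSV annotation format described above for the CSVGenerator (one bounding box per row, with empty x1,y1,x2,y2,class_name fields marking negative examples) could be read with a short helper like the following. This is only a sketch of the format, not the repository's actual loader:

import csv
from collections import defaultdict

def read_annotations(path):
    boxes = defaultdict(list)
    with open(path, newline="") as f:
        for img_path, x1, y1, x2, y2, cls in csv.reader(f):
            if cls == "":
                boxes[img_path]  # negative example: register the image with no boxes
                continue
            boxes[img_path].append((int(x1), int(y1), int(x2), int(y2), cls))
    return dict(boxes)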
Testing Inference without Var Voting (8 GPUs): python2 tools/test_net.py c configs/e2e_faster_rcnn_R 50 FPN_2x.yaml You will get: Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.385 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.578 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.412 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.209 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.412 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.515 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.323 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.499 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.522 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.321 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.553 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.680 Average Precision (AP) @ IoU 0.60 area all maxDets 100 0.533 Average Precision (AP) @ IoU 0.70 area all maxDets 100 0.461 Average Precision (AP) @ IoU 0.80 area all maxDets 100 0.350 Average Precision (AP) @ IoU 0.85 area all maxDets 100 0.269 Average Precision (AP) @ IoU 0.90 area all maxDets 100 0.154 Average Precision (AP) @ IoU 0.95 area all maxDets 100 0.032 Inference with Var Voting: python2 tools/test_net.py c configs/e2e_faster_rcnn_R 50 FPN_2x.yaml STD_NMS True You will get: Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.392 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.576 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.425 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.212 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.417 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.526 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.324 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.528 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.564 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.346 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.594 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.736 Average Precision (AP) @ IoU 0.60 area all maxDets 100 0.536 Average Precision (AP) @ IoU 0.70 area all maxDets 100 0.472 Average Precision (AP) @ IoU 0.80 area all maxDets 100 0.363 Average Precision (AP) @ IoU 0.85 area all maxDets 100 0.281 Average Precision (AP) @ IoU 0.90 area all maxDets 100 0.165 Average Precision (AP) @ IoU 0.95 area all maxDets 100 0.037 Training python2 tools/train_net.py c configs/e2e_faster_rcnn_R 50 FPN_2x.yaml FAQ Please create a new issue . Detectron Detectron is Facebook AI Research's software system that implements state of the art object detection algorithms, including Mask R CNN . It is written in Python and powered by the Caffe2 deep learning framework. At FAIR, Detectron has enabled numerous research projects, including: Feature Pyramid Networks for Object Detection , Mask R CNN , Detecting and Recognizing Human Object Interactions , Focal Loss for Dense Object Detection , Non local Neural Networks , Learning to Segment Every Thing , Data Distillation: Towards Omni Supervised Learning , DensePose: Dense Human Pose Estimation In The Wild , and Group Normalization . Example Mask R CNN output. Introduction The goal of Detectron is to provide a high quality, high performance codebase for object detection research . It is designed to be flexible in order to support rapid implementation and evaluation of novel research. 
Detectron includes implementations of the following object detection algorithms: Mask R CNN Marr Prize at ICCV 2017 RetinaNet Best Student Paper Award at ICCV 2017 Faster R CNN RPN Fast R CNN R FCN using the following backbone network architectures: ResNeXt{50,101,152} ResNet{50,101,152} Feature Pyramid Networks (with ResNet/ResNeXt) VGG16 Additional backbone architectures may be easily implemented. For more details about these models, please see References ( references) below. Update 4/2018: Support Group Normalization see GN/README.md (./projects/GN/README.md) License Detectron is released under the Apache 2.0 license . See the NOTICE file for additional details. Citing Detectron If you use Detectron in your research or wish to refer to the baseline results published in the Model Zoo (MODEL_ZOO.md), please use the following BibTeX entry. @misc{Detectron2018, author {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and Piotr Doll\'{a}r and Kaiming He}, title {Detectron}, howpublished {\url{ year {2018} } Model Zoo and Baselines We provide a large set of baseline results and trained models available for download in the Detectron Model Zoo (MODEL_ZOO.md). Installation Please find installation instructions for Caffe2 and Detectron in INSTALL.md (INSTALL.md). Quick Start: Using Detectron After installation, please see GETTING_STARTED.md (GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron. Getting Help To start, please check the troubleshooting (INSTALL.md troubleshooting) section of our installation instructions as well as our FAQ (FAQ.md). If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems. If bugs are found, we appreciate pull requests (including adding Q&A's to FAQ.md and improving our installation instructions and troubleshooting documents). Please see CONTRIBUTING.md (CONTRIBUTING.md) for more information about contributing to Detectron. References Data Distillation: Towards Omni Supervised Learning . Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He. Tech report, arXiv, Dec. 2017. Learning to Segment Every Thing . Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. Tech report, arXiv, Nov. 2017. Non Local Neural Networks . Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Tech report, arXiv, Nov. 2017. Mask R CNN . Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2017. Focal Loss for Dense Object Detection . Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE International Conference on Computer Vision (ICCV), 2017. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour . Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Tech report, arXiv, June 2017. Detecting and Recognizing Human Object Interactions . Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He. Tech report, arXiv, Apr. 2017. Feature Pyramid Networks for Object Detection . Tsung Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. Aggregated Residual Transformations for Deep Neural Networks . Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. R FCN: Object Detection via Region based Fully Convolutional Networks . Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2016. Deep Residual Learning for Image Recognition . Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2015. Fast R CNN . Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2015.",Object Detection,Object Detection 2818,Computer Vision,Computer Vision,Computer Vision,"Deprecated. Please see our CVPR 2019 paper: Bounding Box Regression with Uncertainty for Accurate Object Detection Softer NMS: Rethinking Bounding Box Regression for Accurate Object Detection Yihui He , Xiangyu Zhang , Kris Kitani and Marios Savvides , Carnegie Mellon University We introduce a novel bounding box regression loss for learning bounding box transformation and localization variance together. The resulting localization variance is utilized in our new non maximum suppression method to improve localization accuracy for object detection. On MS COCO, we boost the AP of VGG 16 faster R CNN from 23.6% to 29.1% with a single model and nearly no additional computational overhead. More importantly, our method improves the AP of ResNet 50 FPN fast R CNN from 36.8% to 37.8% , which achieves state of the art bounding box refinement result. Citation If you find the code useful in your research, please consider citing: @article{softernms, title {Softer NMS: Rethinking Bounding Box Regression for Accurate Object Detection}, author {He, Yihui and Zhang, Xiangyu and Kitani, Kris and Savvides, Marios}, journal {arXiv preprint arXiv:1809.08545}, year {2018} } Installation Please find installation instructions for Caffe2 and Detectron in INSTALL.md (INSTALL.md). Testing Running inference using $N GPUs (e.g., N 8 ). python2 tools/test_net.py \ cfg configs/12_2017_baselines/fast_rcnn_R 50 FPN_2x_our.yaml \ multi gpu testing \ NUM_GPUS $N Training Training with 8 GPUs: python2 tools/train_net.py \ cfg configs/12_2017_baselines/fast_rcnn_R 50 FPN_2x_our.yaml \ OUTPUT_DIR /tmp/detectron output to train the initialization model: python2 tools/train_net.py \ cfg configs/12_2017_baselines/fast_rcnn_R 50 FPN_2x_init.yaml \ OUTPUT_DIR /tmp/detectron output Customization If you want to integrate softer NMS into your own code, you can find all modifications to detectron/ at lines with flags: XYXY, PRED_STD, STD_NMS . If you want to train your own model, Create two configs similar to these two: First, in _init.yaml config, you need to add: XYXY: True PRED_STD: False STD_NMS: False Other staffs remain the same. Second, in _our.yaml config, you need: XYXY: True PRED_STD: True STD_NMS: True Learning rate should be changed accordingly (see here ). TRAIN.WEIGHTS should be the path to the output of _init.yaml . FAQ Please create a new issue . Detectron Detectron is Facebook AI Research's software system that implements state of the art object detection algorithms, including Mask R CNN . It is written in Python and powered by the Caffe2 deep learning framework. 
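The abstracts above describe merging neighbouring bounding boxes during NMS by using the predicted localization variance. A rough, illustrative reading of that idea, not the authors' exact var voting formulation and not the code path enabled by the STD_NMS flag, is a variance weighted average of the boxes that overlap a selected box:

import numpy as np

def variance_weighted_merge(boxes, variances, neighbour_mask):
    # boxes: (N, 4); variances: (N, 4) predicted per-coordinate variances;
    # neighbour_mask: (N,) bool selecting boxes that overlap the chosen box.
    w = 1.0 / np.clip(variances[neighbour_mask], 1e-6, None)  # lower variance -> higher weight
    return (boxes[neighbour_mask] * w).sum(axis=0) / w.sum(axis=0)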
At FAIR, Detectron has enabled numerous research projects, including: Feature Pyramid Networks for Object Detection , Mask R CNN , Detecting and Recognizing Human Object Interactions , Focal Loss for Dense Object Detection , Non local Neural Networks , Learning to Segment Every Thing , Data Distillation: Towards Omni Supervised Learning , DensePose: Dense Human Pose Estimation In The Wild , and Group Normalization . Example Mask R CNN output. Introduction The goal of Detectron is to provide a high quality, high performance codebase for object detection research . It is designed to be flexible in order to support rapid implementation and evaluation of novel research. Detectron includes implementations of the following object detection algorithms: Mask R CNN Marr Prize at ICCV 2017 RetinaNet Best Student Paper Award at ICCV 2017 Faster R CNN RPN Fast R CNN R FCN using the following backbone network architectures: ResNeXt{50,101,152} ResNet{50,101,152} Feature Pyramid Networks (with ResNet/ResNeXt) VGG16 Additional backbone architectures may be easily implemented. For more details about these models, please see References ( references) below. Update 4/2018: Support Group Normalization see GN/README.md (./projects/GN/README.md) License Detectron is released under the Apache 2.0 license . See the NOTICE file for additional details. Citing Detectron If you use Detectron in your research or wish to refer to the baseline results published in the Model Zoo (MODEL_ZOO.md), please use the following BibTeX entry. @misc{Detectron2018, author {Ross Girshick and Ilija Radosavovic and Georgia Gkioxari and Piotr Doll\'{a}r and Kaiming He}, title {Detectron}, howpublished {\url{ year {2018} } Model Zoo and Baselines We provide a large set of baseline results and trained models available for download in the Detectron Model Zoo (MODEL_ZOO.md). Installation Please find installation instructions for Caffe2 and Detectron in INSTALL.md (INSTALL.md). Quick Start: Using Detectron After installation, please see GETTING_STARTED.md (GETTING_STARTED.md) for brief tutorials covering inference and training with Detectron. Getting Help To start, please check the troubleshooting (INSTALL.md troubleshooting) section of our installation instructions as well as our FAQ (FAQ.md). If you couldn't find help there, try searching our GitHub issues. We intend the issues page to be a forum in which the community collectively troubleshoots problems. If bugs are found, we appreciate pull requests (including adding Q&A's to FAQ.md and improving our installation instructions and troubleshooting documents). Please see CONTRIBUTING.md (CONTRIBUTING.md) for more information about contributing to Detectron. References Data Distillation: Towards Omni Supervised Learning . Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, and Kaiming He. Tech report, arXiv, Dec. 2017. Learning to Segment Every Thing . Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. Tech report, arXiv, Nov. 2017. Non Local Neural Networks . Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. Tech report, arXiv, Nov. 2017. Mask R CNN . Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2017. Focal Loss for Dense Object Detection . Tsung Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE International Conference on Computer Vision (ICCV), 2017. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour . 
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Tech report, arXiv, June 2017. Detecting and Recognizing Human Object Interactions . Georgia Gkioxari, Ross Girshick, Piotr Dollár, and Kaiming He. Tech report, arXiv, Apr. 2017. Feature Pyramid Networks for Object Detection . Tsung Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. Aggregated Residual Transformations for Deep Neural Networks . Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. R FCN: Object Detection via Region based Fully Convolutional Networks . Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2016. Deep Residual Learning for Image Recognition . Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2015. Fast R CNN . Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2015.",Object Detection,Object Detection 2819,Computer Vision,Computer Vision,Computer Vision,"Honey Bee Images Classification Author: Xiaochi Ge, Phoebe Wu, Ruyue Zhang, Yijia Chen The dataset is from Kaggle, The BeeImage Dataset: Annotated Honey Bee Images: It contains 5,172 bee images annotated with location, date, time, subspecies, health condition, caste, and pollen. Getting Started: 1. Download the Data Please download the dataset here: Please check the path before running the code. The dataset should be in the right path, '../input', after unzipping the file. To view an image, run show_img.py 2. EDA Please run EDA.py 3. Modelling We use CNNs to classify bee subspecies and hive health with two frameworks, Keras and PyTorch. Keras: Subspecies_Keras.py (20 CNN models included with different hyperparameters) Focal loss paper HiveHealth_Keras.py (comment out lines 135 and 136 to run training1; at the same time, comment lines 138 and 139) Pytorch: Subspecies_Torch.py (you can try a different number of channels in the CNN module, and change the batch size to experiment) HiveHealth_Torch.py",Object Detection,Object Detection 2827,Computer Vision,Computer Vision,Computer Vision,"deep nn examples This repository contains toy examples of shallow and deep neural networks along with convolutional and recurrent neural networks. It also contains a (not optimal) implementation of base neural networks. Have fun! :) Usage You can either run the examples from the command line or from an IDE. The examples below are for command line usage. Setup First set up the environment by running the setup.py file for installation. It will download all the necessary packages to run the examples. > python setup.py install Run example > python NeuralNetworks/logisticregression_vs_shallownn.py List of examples: This is the list of finished examples; others will follow! Neural Networks and Regression Simple linear regression NeuralNetworks/simple_linear_regression.py Plain and simple implementation of linear regression aiming to demonstrate how you can approximate data points that are close to a linear function (in this example y = 2*x + 4).
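As a companion to the simple linear regression example above, here is a minimal NumPy sketch of fitting y ≈ w*x + b to noisy samples of y = 2*x + 4 by gradient descent (independent of the repository's NeuralNetworks/simple_linear_regression.py):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = 2 * x + 4 + rng.normal(scale=0.5, size=200)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    err = (w * x + b) - y            # prediction error
    w -= lr * (err * x).mean()       # gradient of 0.5*mean(err**2) w.r.t. w
    b -= lr * err.mean()             # gradient w.r.t. b
print(w, b)                          # should approach 2 and 4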
(Figures: training cost; fitted values vs. data points.) Document classification with word embedding NeuralNetworks/doc_classification_apple.py An example of how to learn word embeddings using a neural network. The training data contains text about both Apple Inc. and the apple fruit, and the goal is to categorize new text into one of these classes. There is a lot of room for improvement, like getting more training data, filtering stop words better or restricting the vocabulary... Feel free to play around! Predicting for a sample sentence about the apple fruit: The world crop of apples averages more than 60 million metric tons a year. Of the American crop, more than half is normally used as fresh fruit. Prediction: 0.8536878228187561, actual value: 1 Logistic regression vs shallow neural networks NeuralNetworks/logisticregression_vs_shallownn.py In this example, the aim is to classify a linearly NOT separable dataset. You can see how much better you can do with a neural network with 1 hidden layer vs. a simple logistic regression model. It also demonstrates the increase in accuracy when we increase the size of the hidden layer. (Figures: original dataset; fit with logistic regression; fit with different numbers of layers.) Shallow neural networks vs deeper neural networks NeuralNetworks/shallownn_vs_deepnn.py Classic image binary classification problem: cat vs non cat. Two neural networks are trained for the same number of iterations, but one with 3 hidden layers and the other with only 1. You can observe that, although the simpler model can reach the same train accuracy, there is a significant difference on the test set. 2 layer model: train accuracy: 100 % test accuracy: 70 % 4 layer model: train accuracy: 100.0 % test accuracy: 80.0 % (Figures: 2 layer network cost; 4 layer network cost; prediction.) Hand (number) sign classification with tensorflow NeuralNetworks/tf_sign_classification.py A 1 hidden layer neural network is used to classify hand signs into numbers (0 9). It is an example of how to implement a simple model using tensorflow, instead of coding the backpropagation/optimization yourself. (Figures: cost function; prediction.) Convolutional Neural Networks Hand sign classification with convolutional networks ConvolutionalNeuralNetworks/cnn_sign_classification.py This demo uses convolutional (and pooling) layers to address the same problem as in the example above ( Hand (number) sign classification with tensorflow ). The main advantage of using convolutional layers on images is that you have far fewer parameters than with a fully connected layer. For example: if the images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), a single fully connected neuron in the first hidden layer would have 32*32*3 = 3072 weights, whereas a convolutional layer with one 4x4 filter has only 4*4*3 = 48. (Figure: architecture.) The RESNET50 ConvolutionalNeuralNetworks/resnet_mnist.py In this example, the famous MNIST handwritten digit dataset is used to train the ResNet50 network, which uses residual blocks. It is a bit overkill to use such a big network for this task, but the goal here is to learn a bit about deep residual networks. So what is a residual network? The main idea is to extend the mapping with an identity function, such that: y = f(x) + x = f(x) + id(x). Identity connections enable the layers to learn incremental, or residual, representations. The layers can start as the identity function and gradually transform to be more complex.
This significantly helps deeper networks in the training process: the gradient signal vanishes with increasing network depth, but the identity connections in ResNets propagate the gradient throughout the model. (Figures: accurately classified examples; misclassified examples.) YOLO (you only look once) ConvolutionalNeuralNetworks/yolo_car_detection.py YOLO is an approach to object detection with great real time performance. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. The classic model can process 45 frames per second, which explains its popularity. (Figure: architecture.) The model has 2 main steps: Detection: A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. Filtering: Boxes with a probability lower than the threshold are disregarded. On the remaining boxes, a simple non max suppression function prunes away boxes that have high intersection over union (IOU) overlap with the maximum probability box. References Python setup: Tensorflow: Deep learning course: Intuitive explanation of ConvNets: ConvNet CIFAR 10: Word embeddings: Deep Residual Networks: YOLO original paper:
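To make the y = f(x) + x idea above concrete, here is a generic identity shortcut block in Keras. It is a sketch of the pattern only, not the exact blocks used in ConvolutionalNeuralNetworks/resnet_mnist.py:

from tensorflow.keras import layers

def identity_block(x, filters, kernel_size=3):
    # assumes x already has `filters` channels so the Add() shapes match
    shortcut = x                                   # the id(x) branch
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                # f(x) + x
    return layers.Activation("relu")(y)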
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 3.x and OpenCV 2.4.13 both cuDNN 5 and cuDNN 6 CUDA > 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 8.0 : OpenCV 3.x : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 2.0 if you use CUDA, or GPU CC > 3.0 if you use cuDNN + CUDA: Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolo.cfg (194 MB COCO model) require 4 GB GPU RAM: yolo voc.cfg (194 MB VOC model) require 4 GB GPU RAM: tiny yolo.cfg (60 MB COCO model) require 1 GB GPU RAM: tiny yolo voc.cfg (60 MB VOC model) require 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) require 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolo.cfg ./yolo.weights 194 MB COCO model image: darknet.exe detector test data/coco.data yolo.cfg yolo.weights i 0 thresh 0.2 Alternative method 194 MB COCO model image: darknet.exe detect yolo.cfg yolo.weights i 0 thresh 0.2 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB COCO model video: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB COCO model save result to the file res.avi : darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 out_filename res.avi 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 60 MB VOC model for video: darknet.exe detector demo data/voc.data tiny yolo voc.cfg tiny yolo voc.weights test.mp4 i 0 194 MB COCO model for net videocam Smart WebCam: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo 
data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights 186 MB Yolo9000 video: darknet.exe detector demo cfg/combine9k.data yolo9000.cfg yolo9000.weights test.mp4 To process a list of images image_list.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights result.txt You can comment this line so that each image does not require pressing the button ESC: For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /use/local/cuda ) CUDNN 1 to build with cuDNN v5/v6 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 8.0 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release , and do the: Build > Build darknet 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 2. If you have other version of CUDA (not 8.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 8.0 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you want to build with CUDNN to speed up then: download and install cuDNN 6.0 for CUDA 8.0 : add Windows system variable cudnn with path to CUDNN: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 8.0 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 8.0 or what version you have for example as here: add to project all .c & .cu files from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) open file: \src\detector.c and check lines pragma and inclue for OpenCV. compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_80.dll, curand64_80.dll, cudart64_80.dll, cublas64_80.dll 80 for CUDA 8.0 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin For OpenCV 3.0: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core249.dll , opencv_highgui249.dll and opencv_ffmpeg249_64.dll from C:\opencv_2.4.9\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (76 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolo voc.2.0.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data yolo voc.2.0.cfg darknet19_448.conv.23 If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data yolo voc.2.0.cfg darknet19_448.conv.23 2. Then stop and by using partially trained model /backup/yolo voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data yolo voc.2.0.cfg /backup/yolo voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): 1. 
Create file yolo obj.cfg with the same content as in yolo voc.2.0.cfg (or copy yolo voc.2.0.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 20 to your number of objects change line 237 from filters 125 to filters (classes + 5) 5 (generally this depends on the num and coords , i.e. equal to (classes + coords + 1) num ) For example, for 2 objects, your file yolo obj.cfg should differ from yolo voc.2.0.cfg in such lines: convolutional filters 35 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. Create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer number of object from 0 to (classes 1) float values relative to width and height of image, it can be equal from 0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you should create img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (76 MB): and put to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet19_448.conv.23 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations until 1000 iterations has been reached, and after for each 1000 iterations) 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 1000 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights Also you can get result earlier than all 45000 iterations. When should I stop training: Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.060730 avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. 
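(Point 2 of this checklist, choosing the best weights, continues right after this aside.) To make the console line above easier to monitor programmatically, here is a minimal C++ sketch that pulls the iteration number and the avg loss out of a line with the layout shown in the excerpt; the exact spacing of darknet's output may differ between builds, so treat the format string as an assumption.

```cpp
#include <cstdio>
#include <iostream>
#include <string>

// Extracts the iteration number and the "avg" loss from one training-log line
// of the form shown above, e.g.
//   "9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images"
// Returns true on success. The spacing is an assumption about the console output.
bool parse_train_line(const std::string& line, int& iteration, float& avg_loss) {
    float total_loss = 0.0f;
    // %*c skips the ':' after the iteration number.
    return std::sscanf(line.c_str(), "%d%*c %f, %f avg",
                       &iteration, &total_loss, &avg_loss) == 3;
}

int main() {
    const std::string example =
        "9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images";
    int it = 0;
    float avg = 0.0f;
    if (parse_train_line(example, it, avg))
        std::cout << "iteration " << it << ", avg loss " << avg << "\n";
    return 0;
}
```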
Once training is stopped, you should take some of the last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result may come from one of the previous weights (7000, 8000, 9000). This can happen due to overfitting. Overfitting is the case when you can detect objects on images from the training dataset, but can't detect objects on any other images. You should get weights from the Early Stopping Point : ! Overfitting To get weights from the Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of the previous weights use these commands: darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And compare the last output lines for each weights file (7000, 8000, 9000): > 7586 7612 7689 RPs/Img: 68.23 IOU: 77.86% Recall:99.00% IOU the bigger, the better (indicates accuracy) better to use Recall the bigger, the better (indicates accuracy) though Yolo actually calculates true positives here, so it shouldn't be relied on For example, the bigger IOU is given by weights yolo obj_8000.weights then use these weights for detection . ! precision_recall_iou Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo at different resolutions: link it is desirable that your training dataset includes images with objects at different: scales, rotations, lightings, from different sides 2. After training for detection: Increase network resolution by setting in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use the .weights file already trained for 416x416 resolution if the error Out of memory occurs then in the .cfg file you should increase subdivisions to 16, 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find a repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2: With examples of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with an example of how to train this image set with Yolo v2 How to use Yolo as DLL 1. To compile Yolo as a C++ DLL file yolo_cpp_dll.dll open in MSVS2015 the file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do: Build > Build yolo_cpp_dll You should have CUDA 8.0 installed To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of the line: CUDNN; 2.
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2834,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
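The class Detector interface listed a little earlier (from yolo_cpp_dll ) can be driven by a very small console program. The sketch below assumes the detect overloads return std::vector<bbox_t> and that bbox_t exposes x, y, w, h, prob and obj_id fields, as in the repository's yolo_v2_class.hpp ; field and header names may differ between versions.

```cpp
// Minimal console sketch using the yolo_cpp_dll Detector interface shown above.
#include <iostream>
#include <string>
#include <vector>
#include "yolo_v2_class.hpp"   // header exported by the yolo_cpp_dll project (assumed name)

int main(int argc, char* argv[]) {
    if (argc < 4) {
        std::cerr << "usage: demo <cfg> <weights> <image>\n";
        return 1;
    }
    Detector detector(argv[1], argv[2] /*, gpu_id = 0 */);

    // detect() returns one bbox_t per found object; 0.2 is the default threshold.
    std::vector<bbox_t> boxes = detector.detect(std::string(argv[3]), 0.2f);

    for (const bbox_t& b : boxes)
        std::cout << "class id " << b.obj_id << "  prob " << b.prob
                  << "  box [" << b.x << ", " << b.y << ", "
                  << b.w << " x " << b.h << "]\n";
    return 0;
}
```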
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 9.1 : OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... added drawing of chart of average loss during training added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... 
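(The list of improvements continues right after this aside.) One item above is fusing each Convolutional + Batch norm pair into a single layer. The sketch below shows the usual folding arithmetic behind that kind of optimisation; the function and variable names are illustrative, not darknet's internal ones.

```cpp
// Illustrative sketch of folding a batch-norm layer into the preceding
// convolution: y = gamma*(conv(x) - mean)/sqrt(var + eps) + beta
//            = (gamma/sqrt(var+eps)) * conv(x) + (beta - gamma*mean/sqrt(var+eps))
#include <cmath>
#include <cstddef>
#include <vector>

struct FusedConv {
    std::vector<float> weights;  // [out_ch * weights_per_filter], rescaled
    std::vector<float> biases;   // [out_ch], absorbed batch-norm shift
};

FusedConv fuse_conv_batchnorm(const std::vector<float>& weights,
                              std::size_t out_ch, std::size_t weights_per_filter,
                              const std::vector<float>& gamma,     // BN scale
                              const std::vector<float>& beta,      // BN shift
                              const std::vector<float>& mean,      // running mean
                              const std::vector<float>& variance,  // running variance
                              float eps = 1e-5f) {
    FusedConv fused;
    fused.weights = weights;
    fused.biases.resize(out_ch);
    for (std::size_t f = 0; f < out_ch; ++f) {
        const float scale = gamma[f] / std::sqrt(variance[f] + eps);
        for (std::size_t i = 0; i < weights_per_filter; ++i)
            fused.weights[f * weights_per_filter + i] *= scale;   // rescale filter weights
        fused.biases[f] = beta[f] - mean[f] * scale;              // absorb the shift
    }
    return fused;
}
```

After folding, inference runs a single convolution per such pair instead of a convolution followed by a normalisation pass, which is where the speed-up comes from.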
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights thresh 0.25 dog.jpg ext_output 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 43 MB VOC model for video: darknet.exe detector demo data/coco.data cfg/yolov2 tiny.cfg yolov2 tiny.weights test.mp4 i 0 Yolo v3 236 MB COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. 
Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 How to compile on Windows: 1. If you have MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN 7.0 for CUDA 9.1 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 9.1) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 9.1 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Adjust the learning rate ( cfg/yolov3 voc.cfg ) to fit the amount of GPUs. 
The learning rate should be equal to 0.001 , regardless of how many GPUs are used for training. So learning_rate multiplied by the number of GPUs should equal 0.001 . For 4 GPUs adjust the value to learning_rate 0.00025 . 3. For 4 GPUs also increase burn_in and max_batches 4x in your cfg file. I.e. use burn_in 4000 instead of 1000 . 4. Then stop and, by using the partially trained model /backup/yolov3 voc_1000.weights , run training with multiple GPUs (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of the 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional layers before each yolo layer So if classes 1 then it should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and the number of mask entries, i.e. filters (classes + coords + 1) multiplied by the number of mask entries, where mask is the indices of anchors. If mask is absent, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of the 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with object names, each on a new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on the images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create a .txt file for each .jpg image file in the same directory and with the same name, but with the .txt extension, and put into the file: the object number and object coordinates on this image, one object per line: Where: the object class is an integer from 0 to (classes 1) the coordinates are float values relative to the width and height of the image, in the range (0.0 to 1.0 , for example x absolute_x / image_width or height absolute_height / image_height attention: x and y are the center of the rectangle (not the top left corner) For example, for img1.jpg the file img1.txt will be created containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in the directory build\darknet\x64\data\ , with the filenames of your images, each filename on a new line, with the path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put them into the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 (the file yolo obj_xxx.weights will be saved to build\darknet\x64\backup\ every 100 iterations) (To disable the Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on a computer without a monitor, like a cloud Amazon EC2 instance) A quick check of the filters arithmetic from step 1 is sketched below. 9.
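(Step 9, collecting the trained weights, continues below.) A minimal sketch of the filters arithmetic from step 1, assuming the usual 4 box coordinates and 3 mask entries per yolo layer:

```cpp
// Each yolo layer predicts `masks` anchor boxes per cell, and every box needs
// (classes + coords + 1) numbers: class scores, box coordinates and objectness.
#include <iostream>

int yolo_filters(int classes, int masks = 3, int coords = 4) {
    return (classes + coords + 1) * masks;
}

int main() {
    std::cout << "1 class   -> filters " << yolo_filters(1)  << "\n";  // 18
    std::cout << "2 classes -> filters " << yolo_filters(2)  << "\n";  // 21
    std::cout << "80 (COCO) -> filters " << yolo_filters(80) << "\n";  // 255
    return 0;
}
```

This reproduces the values quoted above: 18 for one class, 21 for two classes, and 255 for the 80 COCO classes.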
After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. 
At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest IoU (intersect of union) and mAP (mean average precision) For example, bigger IOU gives weights yolo obj_8000.weights then use this weights for detection . Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect of union) average instersect of union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. 
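(The "How to improve object detection" checklist continues right after this aside.) The 11 point average precision mentioned above can be written down in a few lines; the sketch below uses hypothetical (recall, precision) pairs and only illustrates the PascalVOC definition, it is not darknet's own mAP code.

```cpp
// 11-point interpolated AP: average, over recall levels 0.0, 0.1, ..., 1.0,
// of the maximum precision achieved at a recall >= that level.
#include <iostream>
#include <utility>
#include <vector>

double eleven_point_ap(const std::vector<std::pair<double, double>>& pr) {  // (recall, precision)
    double sum = 0.0;
    for (int i = 0; i <= 10; ++i) {
        const double r = i / 10.0;
        double best = 0.0;
        for (const auto& p : pr)
            if (p.first >= r && p.second > best) best = p.second;
        sum += best;
    }
    return sum / 11.0;
}

int main() {
    // Hypothetical precision-recall samples for one class.
    std::vector<std::pair<double, double>> curve = {
        {0.1, 1.0}, {0.3, 0.9}, {0.5, 0.8}, {0.7, 0.6}, {0.9, 0.4}};
    std::cout << "11-point AP = " << eleven_point_ap(curve) << "\n";
    return 0;
}
```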
Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: then do this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 will be created file yolov3.conv.81 , then train by using weights file yolov3.conv.81 instead of darknet53.conv.74 2. 
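(Point 2, what to do after training, continues below.) Two of the rules of thumb above can be checked numerically: reading 0,0615234375 (width height) as 0.0615234375 x width x height gives the global detection limit, and the relative size rule compares network_width x obj_width / image_width between training and detection. A small sketch, with hypothetical example numbers:

```cpp
#include <iostream>

// 0.0615234375 = 63/1024, which matches 3 boxes per cell summed over the three
// YOLOv3 output grids (strides 8, 16 and 32): 3*(1/64 + 1/256 + 1/1024) = 63/1024.
int max_detections(int width, int height) {
    return static_cast<int>(0.0615234375 * width * height);
}

// Relative-size rule: an object at training time and at detection time should
// cover roughly the same fraction of the network input.
double relative_size(double obj_px, double image_px, double network_px) {
    return network_px * obj_px / image_px;
}

int main() {
    std::cout << "416x416 -> at most " << max_detections(416, 416) << " detections\n";  // 10647
    // Hypothetical: a 100 px object in a 1000 px training image corresponds to
    // about 41.6 px at a 416 px network resolution.
    std::cout << relative_size(100.0, 1000.0, 416.0) << " px\n";
    return 0;
}
```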
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 9.1 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2850,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI Requirements ( requirements) Pre trained models ( pre trained models) Explanations in issues 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows Using vcpkg ( how to compile on windows using vcpkg) Legacy way ( how to compile on windows legacy way) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) How to train with multi GPU: ( how to train with multi gpu) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) How to train tiny yolo (to detect your custom objects) ( how to train tiny yolo to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. How to use Yolo as DLL and SO libraries ( how to use yolo as dll and so libraries) ! Darknet Logo ! map_time mAP@0.5 (AP50) YOLOv3 spp better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV v7 CUDA > 7.5 also create SO library on Linux and DLL library on Windows Requirements CMake > 3.8 for modern CUDA support: CUDA 10.0 : (on Linux do Post installation Actions ) OpenCV 7.0 for CUDA 10.0 (set system variable CUDNN C:\cudnn where did you unpack cuDNN. 
On Linux in .bashrc file, on Windows see the image ) GPU with CC > 3.0 : on Linux GCC or Clang , on Windows MSVS 2017 (v15) Pre trained models There are weights file for different cfg files (smaller size > faster speed & lower accuracy: yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... added drawing of chart of average Loss and accuracy mAP ( map flag) during training run ./darknet detector demo ... json_port 8070 mjpeg_port 8090 as JSON and MJPEG server to get results online over the network by using your soft or Web browser added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... 
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use on the command line On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights On Linux find executable file ./darknet in the root directory, while on Windows find it in the directory \build\darknet\x64 Yolo v3 COCO image : darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights ext_output dog.jpg Yolo v3 COCO video : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights ext_output test.mp4 Yolo v3 COCO WebCam 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights c 0 Yolo v3 COCO for net videocam Smart WebCam: darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights Yolo v3 save result videofile res.avi : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 test.mp4 out_filename res.avi Yolo v3 Tiny COCO video: darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights test.mp4 JSON and MJPEG server that allows multiple connections from your soft or Web browser ip address:8070 and 8090: ./darknet detector demo ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights test50.mp4 json_port 8070 mjpeg_port 8090 ext_output Yolo v3 Tiny on GPU 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights i 0 test.mp4 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Train on Amazon EC2 , to see mAP & Loss chart using URL like: in the Chrome/Firefox: ./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights dont_show ext_output result.txt Pseudo lableing to process a list of images data/new_train.txt and save results of detection in Yolo training format for each image as label .txt (in this way you can increase the amount of training data) use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights thresh 0.25 dont_show save_labels cd $env:VCPKG_ROOT PS Code\vcpkg> .\vcpkg install pthreads opencv replace with opencv cuda in case you want to use cuda accelerated openCV 8. necessary only with CUDA Customize the CMakeLists.txt with the preferred compute capability 9. Build with the Powershell script build.ps1 or use the Open Folder functionality of Visual Studio 2017. In the first option, if you want to use Visual Studio, you will find a custom solution created for you by CMake after the build containing all the appropriate config flags for your system. How to compile on Windows (legacy way) 1. If you have MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. 
Also add the Windows system variable CUDNN with the path to cuDNN: NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is due to a bug in the OpenCV 3.4.1 C API (see 500 ). 1.1. Find the files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put them next to darknet.exe 1.2 Check that there are bin and include folders in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 if they aren't there, then copy them to this folder from the path where CUDA is installed 1.3. To install cuDNN (to speed up the neural network), do the following: download and install cuDNN v7.4.1 for CUDA 10.0 : add the Windows system variable CUDNN with the path to cuDNN: copy the file cudnn64_7.dll to the folder \build\darknet\x64 next to darknet.exe 1.4. If you want to build without cuDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have another version of CUDA (not 10.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 10.0 and change them to your CUDA version. Then open \darknet.sln > (right click on project) > properties > CUDA C/C++ > Device and remove ;compute_75,sm_75 there. Then do step 1 3. If you don't have a GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change the paths after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have a GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after MSVS2015 has been installed.
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(CUDNN)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project: all .c files all .cu files file from \src directory file darknet.h from \include directory (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(CUDNN)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 Only for small datasets sometimes better to decrease learning rate, for 4 GPUs set learning_rate 0.00025 (i.e. 
learning_rate 0.001 / GPUs). In this case also increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
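(Step 8, starting the training run, continues below.) A small sketch of the coordinate conversion described in step 5: it turns an absolute pixel box into the relative, centre based values written to the per image .txt file. The example numbers are hypothetical.

```cpp
// Converts an absolute pixel box (left, top, width, height) into the relative
// "<class> <x_center> <y_center> <width> <height>" line used in the .txt labels.
// Values are fractions of the image size; x/y are the box centre, not the corner.
#include <cstdio>

void print_yolo_label(int class_id,
                      double left, double top, double box_w, double box_h,
                      double image_w, double image_h) {
    const double x_center = (left + box_w / 2.0) / image_w;
    const double y_center = (top + box_h / 2.0) / image_h;
    std::printf("%d %.6f %.6f %.6f %.6f\n",
                class_id, x_center, y_center, box_w / image_w, box_h / image_h);
}

int main() {
    // Hypothetical example: a 208x96 px box at (300, 150) in a 640x480 image.
    print_yolo_label(1, 300, 150, 208, 96, 640, 480);
    return 0;
}
```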
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 To train on Linux use command: ./darknet detector train data/obj.data yolo obj.cfg darknet53.conv.74 (just use ./darknet instead of darknet.exe ) (file yolo obj_last.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (file yolo obj_xxxx.weights will be saved to the build\darknet\x64\backup\ for each 1000 iterations) (to disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazon EC2) (to see the mAP & Loss chart during training on remote server without GUI, use command darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map then open URL in Chrome/Firefox browser) 8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set valid valid.txt or train.txt in obj.data file) and run: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
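(Before the stopping criteria below: the train.txt / valid.txt lists mentioned above can be generated with a few lines of Python. This is only a sketch under assumptions that Darknet itself does not fix: the data/obj folder, the output file names and the 90/10 split are placeholders.)

```python
# Sketch: build data/train.txt and data/valid.txt from the images in data/obj.
import glob
import random

images = sorted(glob.glob("data/obj/*.jpg"))
images = [p.replace("\\", "/") for p in images]   # keep forward slashes, as in the example lists
random.seed(0)
random.shuffle(images)

split = int(0.9 * len(images))                    # assumed 90/10 train/valid split
with open("data/train.txt", "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open("data/valid.txt", "w") as f:
    f.write("\n".join(images[split:]) + "\n")
```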
During training you will see varying error indicators, and you should stop when the 0.XXXXXXX avg value no longer decreases: > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 is the iteration number (number of batches), 0.060730 avg is the average loss (error); the lower, the better. When you see that the average loss 0.xxxxxx avg no longer decreases over many iterations, you should stop training (a small log parsing sketch for tracking avg is shown below). 2. Once training is stopped, take some of the last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, if you stopped training after 9000 iterations, the best result may come from one of the earlier weights (7000, 8000, 9000). This can happen due to overfitting. Overfitting is the case where you can detect objects on images from the training dataset, but cannot detect objects on any other images. You should take the weights from the Early Stopping Point : ! Overfitting To get weights from the Early Stopping Point: 2.1. First, in your file obj.data you must specify the path to the validation dataset valid valid.txt (the format of valid.txt is the same as train.txt ); if you have no validation images, just copy data\train.txt to data\valid.txt . 2.2 If training was stopped after 9000 iterations, validate some of the previous weights with these commands: (If you use another GitHub repository, use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights Then compare the last output lines for each set of weights (7000, 8000, 9000): Choose the weights file with the highest mAP (mean average precision) or IoU (intersection over union). For example, if yolo obj_8000.weights gives the highest mAP, then use these weights for detection . Or just train with the map flag: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map So you will see the mAP chart (red line) in the Loss chart window. mAP will be calculated every 4 epochs using the valid valid.txt file specified in the obj.data file (1 epoch = images_in_train_txt / batch iterations). ! loss_chart_map_chart Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersection over union): the average intersection over union between objects and detections for a certain threshold (0.24). mAP (mean average precision): the mean of the average precisions over all classes, where average precision is the average value of 11 points on the PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision = TP/(TP+FP) and Recall = TP/(TP+FN) ), page 11: mAP is the default precision metric of the PascalVOC competition and is the same as the AP50 metric in the MS COCO competition. In terms of Wiki, the indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1.
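(A hedged sketch for the stopping rule in point 1 above: if you redirect the Darknet training output to a file, say train.log, the avg loss values can be extracted and inspected with a small parser. It assumes the "9002 : 0.211667, 0.060730 avg , ..." line format shown above; the log file name is an assumption.)

```python
# Sketch: pull (iteration, average loss) pairs out of a saved Darknet training log,
# so you can see where "avg" stops decreasing.
import re

LINE = re.compile(r"^\s*(\d+)\s*:\s*([\d.]+),\s*([\d.]+)\s+avg")

def parse_avg_loss(log_path="train.log"):   # placeholder name; darknet prints to stdout
    points = []
    with open(log_path) as f:
        for line in f:
            m = LINE.match(line)
            if m:
                points.append((int(m.group(1)), float(m.group(3))))
    return points

if __name__ == "__main__":
    for iteration, avg in parse_avg_loss()[-10:]:
        print(iteration, avg)
```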
To calculate mAP (mean average precision) on the PascalVOC 2007 test set: Download the PascalVOC dataset, install Python 3.x and get the file 2007_test.txt as described here: Then download the file to the dir build\darknet\x64\data\ , then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove the symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get the mAP for the yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get the mAP for the yolo voc.cfg model, mAP 75.8% (The article specifies a value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: we get lower values, perhaps because the model was trained on slightly different source code than the code with which the detection was done.) If you want to get the mAP for the tiny yolo voc.cfg model, un comment the line for tiny yolo voc.cfg and comment the line for yolo voc.cfg in the .cmd file. If you have Python 2.x instead of Python 3.x, and you use the Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo at different resolutions: link increase the network resolution in your .cfg file ( height 608 , width 608 or any value that is a multiple of 32) it will increase precision check that every object in your dataset is labeled; no object in your dataset should be left without a label. In most training issues there are wrong labels in the dataset (labels produced by some conversion script, or marked with a third party tool, ...). Always check your dataset by using: for each object that you want to detect there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. It is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, and on different backgrounds you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more it is desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounding boxes (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or a higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0.0615234375*(width*height), where width and height are parameters from the net section of the cfg file) for training on small objects (smaller than 16x16 after the image is resized to 416x416) set layers 1, 11 instead of and set stride 4 instead of for training on both small and large objects use modified models: Full model: 5 yolo layers: Tiny model: 3 yolo layers: Spatial full model: 3 yolo layers: If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...)
then for disabling flip data augmentation add flip 0 here: General rule: your training dataset should include the same range of relative object sizes that you want to detect: train_network_width * train_obj_width / train_image_width ~ detection_network_width * detection_obj_width / detection_image_width and train_network_height * train_obj_height / train_image_height ~ detection_network_height * detection_obj_height / detection_image_height. I.e. for each object in the Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size: object width in percent of the Training image ~ object width in percent of the Test image (a small sketch of this check follows below). That is, if only objects that occupied 80 90% of the image were present in the training set, the trained network will not be able to detect objects that occupy 1 10% of the image. to speed up training (at the cost of detection accuracy) do Fine Tuning instead of Transfer Learning: set the param stopbackward 1 here: then run this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 the file yolov3.conv.81 will be created; then train using the weights file yolov3.conv.81 instead of darknet53.conv.74 The more different objects you want to detect, the more complex the network model you should use. But each model of object, each side, illumination, scale, and every 30 degrees of turn and inclination angle is a different object from a neural network perspective. recalculate the anchors for your dataset for the width and height from the cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of the 3 yolo layers in your cfg file. But you should change the indexes of the anchor masks for each yolo layer, so that the 1st yolo layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. If many of the calculated anchors do not fit under the appropriate layers, just try using all the default anchors. 2.
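(The relative size rule above can be turned into a quick sanity check. The sketch below is illustrative only; the 40% tolerance and the helper names are arbitrary assumptions, not part of Darknet.)

```python
# Sketch of the "relative size" rule: an object in the test images should have roughly
# the same relative size as some object of the same class in the training images.
def relative_size(obj_w, obj_h, img_w, img_h, net_w=416, net_h=416):
    """Object size as seen by the network, i.e. net_size * obj_size / image_size."""
    return (net_w * obj_w / img_w, net_h * obj_h / img_h)

def is_covered(test_size, train_sizes, tolerance=0.4):   # tolerance is an arbitrary choice
    tw, th = test_size
    return any(abs(tw - w) <= tolerance * tw and abs(th - h) <= tolerance * th
               for w, h in train_sizes)

train = [relative_size(120, 80, 1280, 720), relative_size(300, 200, 1280, 720)]
print(is_covered(relative_size(100, 70, 1920, 1080), train))
```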
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights data/dog.jpg yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL and SO libraries on Linux set LIBSO 1 in the Makefile and do make on Windows compile build\darknet\yolo_cpp_dll.sln or build\darknet\yolo_cpp_dll_no_gpu.sln solution There are 2 APIs: C API: Python examples using the C API:: C++ API: C++ example that uses C++ API: 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 10.0 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
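(Returning to the resolution note at the start of this block: bumping width/height in the cfg before detection can also be scripted. A minimal sketch, assuming plain "width=..." / "height=..." lines in the net section of the cfg; the file names are placeholders, and the value must stay a multiple of 32.)

```python
# Sketch: rewrite width/height in a Darknet .cfg to run detection at higher resolution.
import re

def set_resolution(cfg_in, cfg_out, size=608):
    assert size % 32 == 0, "network size must be a multiple of 32"
    text = open(cfg_in).read()
    text = re.sub(r"(?m)^width\s*=.*$", f"width={size}", text)
    text = re.sub(r"(?m)^height\s*=.*$", f"height={size}", text)
    open(cfg_out, "w").write(text)

set_resolution("yolo-obj.cfg", "yolo-obj-608.cfg", 608)   # placeholder file names
```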
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link struct bbox_t { unsigned int x, y, w, h; // (x,y) top left corner, (w, h) width & height of bounded box float prob; // confidence probability that the object was found correctly unsigned int obj_id; // class of object from range 0, classes 1 unsigned int track_id; // tracking id for video (0 untracked, 1 inf tracked object) unsigned int frames_counter;// counter of frames on which the object was detected }; class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); std::shared_ptr mat_to_image_resize(cv::Mat mat) const; endif };",Object Detection,Object Detection 2851,Computer Vision,Computer Vision,Computer Vision,"tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 70.8 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.7 . Train on COCO 2014 trainval35k and test on minival ( Iterations : 900k/1190k), 30.2 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.7 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.8 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 35.4 . 
More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.8 . Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.1 . Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training with Tensorflow especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. We do not set a threshold (instead of 0.05) for a detection to be included in the final result, which increases recall. Weight decay is set to 1e 4. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations, and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Learning rate for biases is not doubled. For Mobilenets, we fix the first five layers when fine tuning the network. All batch normalization parameters are fixed. Weight decay for Mobilenet layers is set to 4e 5. For approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. ! (data/imgs/gt.png) ! (data/imgs/pred.png) : : : : Displayed Ground Truth on Tensorboard Displayed Predictions on Tensorboard Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempts is made to overfit on testing set. Support for resuming training . I tried to store as much information as possible when snapshoting, with the purpose to resume training from the latest snapshot properly. The meta information includes current image index, permutation of images, and random state of numpy. 
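(To make the resuming idea above concrete, here is an illustrative sketch, not the repository's actual mechanism, of storing the data side bookkeeping next to a snapshot: the current image index, the image permutation, and numpy's RNG state. The file name and numbers are placeholders.)

```python
# Sketch: persist and restore the meta information mentioned above alongside a snapshot.
import pickle
import numpy as np

def save_meta(path, cur_index, permutation):
    with open(path, "wb") as f:
        pickle.dump({"cur": cur_index,
                     "perm": permutation,
                     "np_state": np.random.get_state()}, f)

def load_meta(path):
    with open(path, "rb") as f:
        meta = pickle.load(f)
    np.random.set_state(meta["np_state"])   # data-side randomness resumes where it left off
    return meta["cur"], meta["perm"]

save_meta("snapshot_meta.pkl", 120, np.random.permutation(1000))
print(load_meta("snapshot_meta.pkl")[0])
```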
However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that, the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization . The current implementation will summarize ground truth boxes, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim Resnet block issue. If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. While it is not required, for experimenting the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6. Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using dockerfile located at docker folder (cuda 8 version, as it is required by Tensorflow r1.0.) And make sure following Tensorflow installation to install and use nvidia docker Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 GTX 960M sm_50 GTX 1080 (Ti) sm_61 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. Also even if you are only using CPU tensorflow, GPU based code (for NMS) will be used by default, so please set USE_GPU_NMS False to get the correct output. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating soft links in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. 
Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. 
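(A small convenience sketch for the output layout listed above: picking the newest snapshot under output/NET/DATASET/default/. The "_iter_n.ckpt" naming is an assumption here, so adjust the pattern to whatever your snapshots are actually called.)

```python
# Sketch: find the latest TensorFlow checkpoint under output/<NET>/<DATASET>/default/.
import glob
import os
import re

def latest_snapshot(net="res101", imdb="voc_2007_trainval+voc_2012_trainval"):
    folder = os.path.join("output", net, imdb, "default")
    candidates = []
    for path in glob.glob(os.path.join(folder, "*.ckpt.meta")):
        m = re.search(r"_iter_(\d+)\.ckpt\.meta$", path)   # assumed naming scheme
        if m:
            candidates.append((int(m.group(1)), path[:-len(".meta")]))
    return max(candidates)[1] if candidates else None

print(latest_snapshot())
```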
Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } Or for a formal paper, Spatial Memory Network : @article{chen2017spatial, title {Spatial Memory for Context Reasoning in Object Detection}, author {Chen, Xinlei and Gupta, Abhinav}, journal {arXiv preprint arXiv:1704.04224}, year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} }",Object Detection,Object Detection 2853,Computer Vision,Computer Vision,Computer Vision,"faster rcnn论文连接: 按照论文思路并参考了其他版本的源码,这版代码是我加过注释的tf版本,在我自己的电脑上能跑通。 本机实验环境:py3.6, CUDA9.0, tf1.7 另外还有faster rcnn总结网络整体流程",Object Detection,Object Detection 2859,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks The major contributors of this repository include Yuwen Xiong , Haozhi Qi , Guodong Zhang , Yi Li , Jifeng Dai , Bin Xiao , Han Hu and Yichen Wei . We released training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an MIT license. 
Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). 
Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and checkout to MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if it's the first time to checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python packge into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the argumented VOC 2012 dataset. The argumented annotations are provided by SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download manually from OneDrive or BaiduYun , and put it under folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and gives several prediction results, to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and gives several prediction results, to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offset of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. Please download Cityscapes and VOC 2012 datasets and make sure it looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. Please download argumented VOC 2012 annotations/image lists, and put the argumented annotations and the argumented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , Respectively. 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU , dataset, etc.) 
are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder would be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This is because either you forget to copy the operators to your MXNet folder or you copy to the wrong path or you forget to re compile or you install the wrong MXNet Please print mxnet.__path__ to make sure you use correct MXNet Q: I encounter segment fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 first before import mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. A: It has been identified that MXNet on Windows has this problem. So we recommend to run this program on Linux. You could also stop it and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: Due to several reasons (code is based on a old, internal Caffe, port to public Caffe needs extra work, time limit, etc.). We do not plan to release our Caffe code. Since current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself, the core CUDA code could be kept unchanged. Anyone who wish to do it is welcome to make a pull request.",Object Detection,Object Detection 2862,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks The major contributors of this repository include Yuwen Xiong , Haozhi Qi , Guodong Zhang , Yi Li , Jifeng Dai , Bin Xiao , Han Hu and Yichen Wei . We released training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. 
There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an MIT license. Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. 
Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and checkout to MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if it's the first time to checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python packge into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the argumented VOC 2012 dataset. The argumented annotations are provided by SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download manually from OneDrive or BaiduYun , and put it under folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and gives several prediction results, to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and gives several prediction results, to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offset of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. 
Please download Cityscapes and VOC 2012 datasets and make sure it looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. Please download argumented VOC 2012 annotations/image lists, and put the argumented annotations and the argumented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , Respectively. 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU , dataset, etc.) are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder would be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This is because either you forget to copy the operators to your MXNet folder or you copy to the wrong path or you forget to re compile or you install the wrong MXNet Please print mxnet.__path__ to make sure you use correct MXNet Q: I encounter segment fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 first before import mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. A: It has been identified that MXNet on Windows has this problem. So we recommend to run this program on Linux. You could also stop it and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: Due to several reasons (code is based on a old, internal Caffe, port to public Caffe needs extra work, time limit, etc.). We do not plan to release our Caffe code. Since current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself, the core CUDA code could be kept unchanged. Anyone who wish to do it is welcome to make a pull request.",Object Detection,Object Detection 2863,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks Update 12/01/2018 We updated the deformable convolution operator to be the same as those utilized in the Deformale ConvNets v2 paper. 
A possible issue when the sampling location is outside of image boundary is solved. The issue may cause deteriated performance on ImageNet classification. Note that the current deformable conv layers in both the official MXNet and the PyTorch codebase still have the issue. So if you want to reproduce the results in Deformable ConvNets v2, please utilize the updated layer provided here. The efficiency at large image batch size is also improved. See more details in DCNv2_op/README.md . The full codebase of Deformable ConvNets v2 would be available later. But it should be easy to reproduce the results with the updated operator. 10/2017 We released the training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an MIT license. 
Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). 
Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and check out MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if this is your first checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python package into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the augmented VOC 2012 dataset. The augmented annotations are provided by the SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download them manually from OneDrive or BaiduYun , and put them under the folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and give several prediction results; to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and give several prediction results; to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offsets of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download the COCO and VOC 2007+2012 datasets, and make sure the layout looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download the ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under the folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. Please download the Cityscapes and VOC 2012 datasets and make sure the layout looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. Please download the augmented VOC 2012 annotations/image lists, and put the augmented annotations and the augmented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , respectively. 3. Please download the ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under the folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU, dataset, etc.)
are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder will be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in the config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This happens because you either forgot to copy the operators to your MXNet folder, copied them to the wrong path, forgot to re compile, or installed the wrong MXNet. Please print mxnet.__path__ to make sure you are using the correct MXNet. Q: I encounter a segmentation fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 before importing mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. A: It has been identified that MXNet on Windows has this problem. So we recommend running this program on Linux. You could also stop and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: For several reasons (the code is based on an old, internal Caffe; porting to the public Caffe needs extra work; time limits; etc.), we do not plan to release our Caffe code. Since the current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself, and the core CUDA code can be kept unchanged. Anyone who wishes to do it is welcome to make a pull request.",Object Detection,Object Detection 2865,Computer Vision,Computer Vision,Computer Vision,"Airbus Ships detection problem This is a computer vision object detection and segmentation problem on Kaggle . In this problem, I build a model that detects all ships in satellite images and generates a mask for each ship. There are several deep learning models that work for object detection, such as YOLO, R CNN, Fast R CNN and Faster R CNN. For object segmentation, Unet is a great tool. Recently there is a nice paper on object instance segmentation called Mask R CNN. In this problem, most images (80%) contain no ships. So my strategy is the following: 1. I build a classifier to detect if an image has any ships. 2. Feed the images that the classifier flags as containing ships to Mask R CNN (a rough sketch of this two stage pipeline is given below, after the acknowledgement). Results Acknowledgement This code is implemented on the maskrcnn framework.
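A rough sketch of the two stage strategy described above; the classifier and Mask R CNN objects here are hypothetical placeholders rather than code from this repository:

def detect_ships(image, has_ship_classifier, mask_rcnn_model, threshold=0.5):
    # Stage 1: a binary classifier filters out the ~80% of images with no ships.
    if has_ship_classifier.predict(image) < threshold:
        return []
    # Stage 2: only images flagged as containing ships are passed to Mask R-CNN,
    # which returns bounding boxes plus a segmentation mask for each ship.
    return mask_rcnn_model.detect(image)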
Thanks for their great work! Prerequisites Python 3.6 Jupyter Notebook Meta Chi Zhang – @LinkedIn – c.zhang@neu.edu Distributed under the MIT license. See LICENSE for more information. mit url : mit image : npm image : npm url : npm downloads : travis image : travis url : wiki :",Object Detection,Object Detection 2867,Computer Vision,Computer Vision,Computer Vision,"Introduction Fork from This directory contains python software and an iOS App developed by Ultralytics LLC, and is freely available for redistribution under the GPL 3.0 license . For more information please visit Description The repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. Training is done on the COCO dataset by default: Credit to Joseph Redmon for YOLO: Requirements Python 3.6 or later with the following pip3 install U r requirements.txt packages: numpy torch > 1.0.0 opencv python tqdm Tutorials GCP Quickstart Transfer Learning Train Single Image Train Single Class Train Custom Data Training VOC dataset download pretrained model from baiducloud password: 98k3 or googlecloud mkdir Train put .xml and .jpg dataset in Train cd myscripts vim class_path.py and modify classes , ie: classes 3 , 4 , 5 , 6 , 7 , 8 sh process_data.sh cd .. sh train.sh Training coco dataset Start Training: Run train.py to begin training after downloading COCO data with data/get_coco_dataset.sh . Resume Training: Run train.py resume resumes training from the latest checkpoint weights/latest.pt . Each epoch trains on 120,000 images from the train and validate COCO sets, and tests on 5000 images from the COCO validate set. Default training settings produce loss plots below, with training speed of 0.6 s/batch on a 1080 Ti (18 epochs/day) or 0.45 s/batch on a 2080 Ti. from utils import utils; utils.plot_results() ! Alt Image Augmentation datasets.py applies random OpenCV powered augmentation to the input images in accordance with the following specifications. Augmentation is applied only during training, not during inference. Bounding boxes are automatically tracked and updated with the images. 416 x 416 examples pictured below. Augmentation Description Translation +/ 10% (vertical and horizontal) Rotation +/ 5 degrees Shear +/ 2 degrees (vertical and horizontal) Scale +/ 10% Reflection 50% probability (horizontal only) H S V Saturation +/ 50% HS V Intensity +/ 50% Speed Machine type: n1 standard 8 (8 vCPUs, 30 GB memory) CPU platform: Intel Skylake GPUs: K80 ($0.198/hr), P4 ($0.279/hr), T4 ($0.353/hr), P100 ($0.493/hr), V100 ($0.803/hr) HDD: 100 GB SSD Dataset: COCO train 2014 GPUs batch_size batch time epoch time epoch cost (images) (s/batch) 1 K80 16 1.43s 175min $0.58 1 P4 8 0.51s 125min $0.58 1 T4 16 0.78s 94min $0.55 1 P100 16 0.39s 48min $0.39 2 P100 32 0.48s 29min $0.47 4 P100 64 0.65s 20min $0.65 1 V100 16 0.25s 31min $0.41 2 V100 32 0.29s 18min $0.48 4 V100 64 0.41s 13min $0.70 8 V100 128 0.49s 7min $0.80 Inference Run detect.py to apply trained weights to an image, such as zidane.jpg from the data/samples folder: YOLOv3: python3 detect.py cfg cfg/yolov3.cfg weights weights/yolov3.weights YOLOv3 tiny: python3 detect.py cfg cfg/yolov3 tiny.cfg weights weights/yolov3 tiny.weights YOLOv3 SPP: python3 detect.py cfg cfg/yolov3 spp.cfg weights weights/yolov3 spp.weights Webcam Run detect.py with webcam True to show a live webcam feed. Pretrained Weights Darknet .weights format: PyTorch .pt format: mAP Use test.py weights weights/yolov3.weights to test the official YOLOv3 weights. 
Use test.py weights weights/latest.pt to test the latest training results. Compare to darknet published results ultralytics/yolov3 OR NMS 5:52@416 ( pycocotools ) darknet YOLOv3 320 51.9 (51.4) 51.5 YOLOv3 416 55.0 (54.9) 55.3 YOLOv3 608 57.5 (57.8) 57.9 ultralytics/yolov3 MERGE NMS 7:15@416 ( pycocotools ) darknet YOLOv3 320 52.3 (51.7) 51.5 YOLOv3 416 55.4 (55.3) 55.3 YOLOv3 608 57.9 (58.1) 57.9 ultralytics/yolov3 MERGE+earlier_pred4 8:34@416 ( pycocotools ) darknet YOLOv3 320 52.3 (51.8) 51.5 YOLOv3 416 55.5 (55.4) 55.3 YOLOv3 608 57.9 (58.2) 57.9 > ultralytics/yolov3 darknet YOLOv3 320 51.8 51.5 YOLOv3 416 55.4 55.3 YOLOv3 608 58.2 57.9 YOLOv3 spp 320 52.4 YOLOv3 spp 416 56.5 YOLOv3 spp 608 60.7 60.6 bash sudo rm rf yolov3 && git clone bash yolov3/data/get_coco_dataset.sh sudo rm rf cocoapi && git clone && cd cocoapi/PythonAPI && make && cd ../.. && cp r cocoapi/PythonAPI/pycocotools yolov3 cd yolov3 python3 test.py save json conf thres 0.001 img size 608 batch size 16 Namespace(batch_size 16, cfg 'cfg/yolov3.cfg', conf_thres 0.001, data_cfg 'cfg/coco.data', img_size 608, iou_thres 0.5, nms_thres 0.5, save_json True, weights 'weights/yolov3.weights') Using cuda _CudaDeviceProperties(name 'Tesla V100 SXM2 16GB', major 7, minor 0, total_memory 16130MB, multi_processor_count 80) Image Total P R mAP Calculating mAP: 100% █████████████████████████████████ 313/313 08:54<00:00, 1.55s/it 5000 5000 0.0966 0.786 0.579 Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.331 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.582 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.344 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.198 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.362 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.427 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.281 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.437 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.463 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.309 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.494 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.577 python3 test.py weights weights/yolov3 spp.weights cfg cfg/yolov3 spp.cfg save json img size 608 batch size 8 Namespace(batch_size 8, cfg 'cfg/yolov3 spp.cfg', conf_thres 0.001, data_cfg 'data/coco.data', img_size 608, iou_thres 0.5, nms_thres 0.5, save_json True, weights 'weights/yolov3 spp.weights') Using cuda _CudaDeviceProperties(name 'Tesla V100 SXM2 16GB', major 7, minor 0, total_memory 16130MB, multi_processor_count 80) Image Total P R mAP Calculating mAP: 100% █████████████████████████████████ 625/625 07:01<00:00, 1.56it/s 5000 5000 0.12 0.81 0.611 Average Precision (AP) @ IoU 0.50:0.95 area all maxDets 100 0.366 Average Precision (AP) @ IoU 0.50 area all maxDets 100 0.607 Average Precision (AP) @ IoU 0.75 area all maxDets 100 0.386 Average Precision (AP) @ IoU 0.50:0.95 area small maxDets 100 0.207 Average Precision (AP) @ IoU 0.50:0.95 area medium maxDets 100 0.391 Average Precision (AP) @ IoU 0.50:0.95 area large maxDets 100 0.485 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 1 0.296 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 10 0.464 Average Recall (AR) @ IoU 0.50:0.95 area all maxDets 100 0.494 Average Recall (AR) @ IoU 0.50:0.95 area small maxDets 100 0.331 Average Recall (AR) @ IoU 0.50:0.95 area medium maxDets 100 0.517 Average Recall (AR) @ IoU 0.50:0.95 area large maxDets 100 0.618 
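The AP/AR blocks above are the standard COCO style summary printed by pycocotools, which the test.py save json runs above rely on. As a reference, here is a minimal sketch of scoring a saved detections JSON against the ground truth annotations yourself (the file paths are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('annotations/instances_val2014.json')  # ground-truth annotations
coco_dt = coco_gt.loadRes('results.json')             # detections saved as JSON
coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')        # bounding-box evaluation
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR lines in the format shown above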
Citation DOI Contact Issues should be raised directly in the repository. For additional questions or comments please email Glenn Jocher at glenn.jocher@ultralytics.com or visit us at",Object Detection,Object Detection 2870,Computer Vision,Computer Vision,Computer Vision,"Yolo v2 Windows and Linux version CircleCI 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to improve object detection ( how to improve object detection) 8. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 9. Using Yolo9000 ( using yolo9000) 10. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps ! Darknet Logo ! map_fps You Only Look Once: Unified, Real Time Object Detection (version 2) A Yolo cross platform Windows and Linux version (for object detection). Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 3.x and OpenCV 2.4.13 both cuDNN 5 and cuDNN 6 CUDA > 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 8.0 : OpenCV 3.x : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 2.0 if you use CUDA, or GPU CC > 3.0 if you use cuDNN + CUDA: Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolo.cfg (194 MB COCO model) require 4 GB GPU RAM: yolo voc.cfg (194 MB VOC model) require 4 GB GPU RAM: tiny yolo.cfg (60 MB COCO model) require 1 GB GPU RAM: tiny yolo voc.cfg (60 MB VOC model) require 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) require 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolo.cfg ./yolo.weights 194 MB COCO model image: darknet.exe detector test data/coco.data 
yolo.cfg yolo.weights i 0 thresh 0.2 Alternative method 194 MB COCO model image: darknet.exe detect yolo.cfg yolo.weights i 0 thresh 0.2 194 MB VOC model image: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB COCO model video: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 194 MB VOC model video: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 194 MB COCO model save result to the file res.avi : darknet.exe detector demo data/coco.data yolo.cfg yolo.weights test.mp4 i 0 out_filename res.avi 194 MB VOC model save result to the file res.avi : darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights test.mp4 i 0 out_filename res.avi Alternative method 194 MB VOC model video: darknet.exe yolo demo yolo voc.cfg yolo voc.weights test.mp4 i 0 60 MB VOC model for video: darknet.exe detector demo data/voc.data tiny yolo voc.cfg tiny yolo voc.weights test.mp4 i 0 194 MB COCO model for net videocam Smart WebCam: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model for net videocam Smart WebCam: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 194 MB VOC model WebCamera 0: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights c 0 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights 186 MB Yolo9000 video: darknet.exe detector demo cfg/combine9k.data yolo9000.cfg yolo9000.weights test.mp4 To process a list of images image_list.txt and save results of detection to result.txt use: darknet.exe detector test data/voc.data yolo voc.cfg yolo voc.weights result.txt You can comment this line so that each image does not require pressing the button ESC: For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: 194 MB COCO model: darknet.exe detector demo data/coco.data yolo.cfg yolo.weights i 0 194 MB VOC model: darknet.exe detector demo data/voc.data yolo voc.cfg yolo voc.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5/v6 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: How to compile on Windows: 1. If you have MSVS 2015, CUDA 8.0 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release , and do the: Build > Build darknet 1.1. 
Find files opencv_world320.dll and opencv_ffmpeg320_64.dll in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 2. If you have other version of CUDA (not 8.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 8.0 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you want to build with CUDNN to speed up then: download and install cuDNN 6.0 for CUDA 8.0 : add Windows system variable cudnn with path to CUDNN: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 8.0 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 8.0 or what version you have for example as here: add to project all .c & .cu files from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_80.dll, curand64_80.dll, cudart64_80.dll, cublas64_80.dll 80 for CUDA 8.0 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin For OpenCV 3.X: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (76 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. 
Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolo voc.2.0.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data yolo voc.2.0.cfg darknet19_448.conv.23 If required, change the paths in the file build\darknet\x64\data\voc.data More information about training is available at the link: How to train with multi GPU: 1. Train it first on 1 GPU for about 1000 iterations: darknet.exe detector train data/voc.data yolo voc.2.0.cfg darknet19_448.conv.23 2. Then stop, and using the partially trained model /backup/yolo voc_1000.weights run training with multiple GPUs (up to 4 GPUs): darknet.exe detector train data/voc.data yolo voc.2.0.cfg /backup/yolo voc_1000.weights gpus 0,1,2,3 How to train (to detect your custom objects): 1. Create file yolo obj.cfg with the same content as in yolo voc.2.0.cfg (or copy yolo voc.2.0.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 20 to your number of objects change line 237 from filters 125 to: filters (classes + 5)x5, so if classes 2 then it should be filters 35 . Or if you use classes 1 then write filters 30 , do not write in the cfg file: filters (classes + 5)x5 . (Generally filters depends on the classes , num and coords , i.e. equal to (classes + coords + 1)*num , where num is the number of anchors) So for example, for 2 objects, your file yolo obj.cfg should differ from yolo voc.2.0.cfg in such lines: convolutional filters 35 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with object names, each on a new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. Create a .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put into the file the object number and object coordinates on this image, one object per line: <object-class> <x> <y> <width> <height> Where: <object-class> integer number of object from 0 to (classes - 1) <x> <y> <width> <height> float values relative to width and height of the image, they can be from 0.0 to 1.0, for example: <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height> attention: <x> <y> are the center of the rectangle (not the top left corner) For example for img1.jpg you should create img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename on a new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (76 MB): and put them to the directory build\darknet\x64 8. Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet19_448.conv.23 (file yolo obj_xxx.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations until 1000 iterations have been reached, and after that for each 1000 iterations) 9.
After training is complete, get the result yolo obj_final.weights from the path build\darknet\x64\backup\ After each 1000 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights Also you can get a result earlier than all 45000 iterations. When should I stop training: Usually 2000 iterations for each class (object) are sufficient. But for a more precise definition of when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when the average loss ( 0.060730 avg in the example below) no longer decreases: > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that the average loss 0.xxxxxx avg no longer decreases over many iterations then you should stop training. 2. Once training is stopped, you should take some of the last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, if you stopped training after 9000 iterations, the best result may be given by one of the previous weights (7000, 8000, 9000). This can happen due to overfitting. Overfitting is the case when you can detect objects on images from the training dataset, but can't detect objects on any other images. You should get weights from the Early Stopping Point : ! Overfitting To get weights from the Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you don't have validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of the previous weights use these commands: darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector recall data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And compare the last output lines for each set of weights (7000, 8000, 9000): > 7586 7612 7689 RPs/Img: 68.23 IOU: 77.86% Recall:99.00% IOU the bigger, the better (indicates accuracy) better to use Recall the bigger, the better (indicates accuracy) actually Yolo calculates true positives, so it shouldn't be used For example, if the bigger IOU is given by the weights yolo obj_8000.weights , then use these weights for detection . ! precision_recall_iou How to calculate mAP voc_eval.py or datascience.stackexchange link Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo at different resolutions: link it is desirable that your training dataset include images with objects at different scales, rotations, lightings, and from different sides 2.
After training for detection: Increase the network resolution by setting in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value that is a multiple of 32) this increases the precision and makes it possible to detect small objects: link you do not need to train the network again, just use the .weights file already trained for 416x416 resolution if the error Out of memory occurs then in the .cfg file you should increase subdivisions to 16, 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find a repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2: With examples of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with an example of how to train this image set with Yolo v2 Using Yolo9000 Simultaneous detection and classification of 9000 objects: yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, it also contains paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id is -1 then this label has no parent: coco9k.map map of 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, with paths to: 9k.labels , 9k.names , inet9k.map , (change the path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map of 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as a C++ DLL file yolo_cpp_dll.dll open in MSVS2015 the file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have CUDA 8.0 installed To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of the line: CUDNN; 2. To use Yolo as a DLL file in your C++ console application open in MSVS2015 the file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe or you can run it from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use the simple OpenCV GUI you should uncomment the line //#define OPENCV in the yolo_console_dll.cpp file: link you can see the source code of a simple example for detection on a video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id = 0); ~Detector(); std::vector<bbox_t> detect(std::string image_filename, float thresh = 0.2, bool use_mean = false); std::vector<bbox_t> detect(image_t img, float thresh = 0.2, bool use_mean = false); static image_t load_image(std::string image_filename); static void free_image(image_t m); #ifdef OPENCV std::vector<bbox_t> detect(cv::Mat mat, float thresh = 0.2, bool use_mean = false); #endif };",Object Detection,Object Detection 2872,Computer Vision,Computer Vision,Computer Vision,"Cascade R CNN: Delving into High Quality Object Detection by Zhaowei Cai and Nuno Vasconcelos This repository is written by Zhaowei Cai at UC San Diego. Introduction This repository implements multiple popular object detection algorithms, including Faster R CNN, R FCN, FPN, and our recently proposed Cascade R CNN, on the MS COCO and PASCAL VOC datasets.
Multiple choices are available for backbone network, including AlexNet, VGG Net and ResNet. It is written in C++ and powered by Caffe deep learning toolbox. Cascade R CNN is a multi stage extension of the popular two stage R CNN object detection framework. The goal is to obtain high quality object detection, which can effectively reject close false positives. It consists of a sequence of detectors trained end to end with increasing IoU thresholds, to be sequentially more selective against close false positives. The output of a previous stage detector is forwarded to a later stage detector, and the detection results will be improved stage by stage. This idea can be applied to any detector based on the two stage R CNN framework, including Faster R CNN, R FCN, FPN, Mask R CNN, etc, and reliable gains are available independently of baseline strength. A vanilla Cascade R CNN on FPN detector of ResNet 101 backbone network, without any training or inference bells and whistles, achieved state of the art results on the challenging MS COCO dataset. Update The re implementation of Cascade R CNN in Detectron has been released. See Detectron Cascade RCNN . Very consistent improvements are available for all tested models, independent of baseline strength. It is also recommended to use the third party implementation, mmdetection based on PyTorch and tensorpack based on TensorFlow. Citation If you use our code/model/data, please cite our paper: @inproceedings{cai18cascadercnn, author {Zhaowei Cai and Nuno Vasconcelos}, Title {Cascade R CNN: Delving into High Quality Object Detection}, booktitle {CVPR}, Year {2018} } Benchmarking We benchmark mulitple detector models on the MS COCO and PASCAL VOC datasets in the below tables. 1. MS COCO (Train/Test: train2017/val2017, shorter size: 800 for FPN and 600 for the others) model GPUs bs lr iter train time test time AP AP50 AP75 VGG RPN baseline 2 4 3e 3 100k 12.5 hr 0.075s 23.6 43.9 23.0 VGG RPN Cascade 2 4 3e 3 100k 15.5 hr 0.115s 27.0 44.2 27.7 Res50 RFCN baseline 4 1 3e 3 280k 19 hr 0.07s 27.0 48.7 26.9 Res50 RFCN Cascade 4 1 3e 3 280k 22.5 hr 0.075s 31.1 49.8 32.8 Res101 RFCN baseline 4 1 3e 3 280k 29 hr 0.075s 30.3 52.2 30.8 Res101 RFCN Cascade 4 1 3e 3 280k 30.5 hr 0.085s 33.3 52.0 35.2 Res50 FPN baseline 8 1 5e 3 280k 32 hr 0.095s 36.5 58.6 39.2 Res50 FPN Cascade 8 1 5e 3 280k 36 hr 0.115s 40.3 59.4 43.7 Res101 FPN baseline 8 1 5e 3 280k 37 hr 0.115s 38.5 60.6 41.7 Res101 FPN Cascade 8 1 5e 3 280k 46 hr 0.14s 42.7 61.6 46.6 2. PASCAL VOC 2007 (Train/Test: 2007+2012trainval/2007test, shorter size: 600) model GPUs bs lr iter train time AP AP50 AP75 Alex RPN baseline 2 4 1e 3 45k 2.5 hr 29.4 63.2 23.7 Alex RPN Cascade 2 4 1e 3 45k 3 hr 38.9 66.5 40.5 VGG RPN baseline 2 4 1e 3 45k 6 hr 42.9 76.4 44.1 VGG RPN Cascade 2 4 1e 3 45k 7.5 hr 51.2 79.1 56.3 Res50 RFCN baseline 2 2 2e 3 90k 8 hr 44.8 77.5 46.8 Res50 RFCN Cascade 2 2 2e 3 90k 9 hr 51.8 78.5 57.1 Res101 RFCN baseline 2 2 2e 3 90k 10.5 hr 49.4 79.8 53.2 Res101 RFCN Cascade 2 2 2e 3 90k 12 hr 54.2 79.6 59.2 NOTE . In the above tables, all models have been run at least two times with close results. The training is relatively stable. RPN means Faster R CNN. The annotations of PASCAL VOC are transformed to COCO format, and COCO API was used for evaluation. The results are different from the official VOC evaluation. If you want to compare the VOC results in publication, please use the official VOC code for evaluation. Requirements 1. NVIDIA GPU and cuDNN are required to have fast speeds. 
For now, CUDA 8.0 with cuDNN 6.0.20 has been tested. The other versions should be working. 2. Caffe MATLAB wrapper is required to run the detection/evaluation demo. Installation 1. Clone the Cascade RCNN repository, and we'll call the directory that you cloned Cascade RCNN into CASCADE_ROOT Shell git clone 2. Build Cascade RCNN Shell cd $CASCADE_ROOT/ Follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make all j 16 If you want to run Cascade RCNN detection/evaluation demo, build MATLAB wrapper as well make matcaffe Datasets If you already have a COCO/VOC copy but not as organized as below, you can simply create Symlinks to have the same directory structure. MS COCO In all MS COCO experiments, we use train2017 for training, and val2017 (a.k.a. minival ) for validation. Follow MS COCO website to download images/annotations, and set up the COCO API. Assumed that your local COCO dataset copy is at /your/path/to/coco , make sure it has the following directory structure: coco _ images _ train2017 _ .jpg _ ... _ .jpg _ val2017 _ ... _ annotations _ instances_train2017.json _ instances_val2017.json _ ... _ MatlabAPI PASCAL VOC In all PASCAL VOC experiments, we use VOC2007+VOC2012 trainval for training, and VOC2007 test for validation. Follow PASCAL VOC website to download images/annotations, and set up the VOCdevkit. Assumed that your local VOCdevkit copy is at /your/path/to/VOCdevkit , make sure it has the following directory structure: VOCdevkit _ VOC2007 _ JPEGImages _ .jpg _ ... _ .jpg _ Annotations _ .xml _ ... _ .xml _ ... _ VOC2012 _ JPEGImages _ .jpg _ ... _ .jpg _ Annotations _ .xml _ ... _ .xml _ ... _ VOCcode Training Cascade RCNN 1. Get the training data Shell cd $CASCADE_ROOT/data/ sh get_coco_data.sh This will download the window files required for the experiments. You can also use the provided MATLAB scripts coco_window_file.m under $CASCADE_ROOT/data/coco/ to generate your own window files. 2. Download the pretrained models on ImageNet. For AlexNet and VGG Net, the FC layers are pruned and 2048 units per FC layer are remained. In addition, the two FC layers are copied three times for Cascade R CNN training. For ResNet, the BatchNorm layers are merged into Scale layers and frozen during training as common practice. Shell cd $CASCADE_ROOT/models/ sh fetch_vggnet.sh 3. Multiple shell scripts are provided to train Cascade RCNN on different baseline detectors as described in our paper. Under each model folder, you need to change the root_folder of the data layer in train.prototxt and test.prototxt to your COCO path. After that, you can start to train your own Cascade RCNN models. Take vgg 12s 600 rpn cascade for example. Shell cd $CASCADE_ROOT/examples/coco/vgg 12s 600 rpn cascade/ sh train_detection.sh Log file will be generated along the training procedure. The total training time depends on the complexity of models and datasets. If you want to quickly check if the training works well, try the light AlexNet model on VOC dataset. NOTE . Occasionally, the training of the Res101 FPN Cascade will be out of memory. Just resume the training from the latest solverstate. Pretrained Models We only provide the Res50 FPN baseline, Res50 FPN Cascade and Res101 FPN Cascade models for COCO dataset, and Res101 RFCN Cascade for VOC dataset. 
Download pre trained models Shell cd $CASCADE_ROOT/examples/coco/ sh fetch_cascadercnn_models.sh The pretrained models produce exactly the same results as described in our paper. Testing/Evaluation Demo Once the models pretrained or trained by yourself are available, you can use the MATLAB script run_cascadercnn_coco.m to obtain the detection and evaluation results. Set the right dataset path and choose the model of your interest to test in the demo script. The default setting is for the pretrained model. The final detection results will be saved under $CASCADE_ROOT/examples/coco/detections/ and the evaluation results will be saved under the model folder. You also can run the shell script test_coco_detection.sh under each model folder for evalution, but it is not identical to the official evaluation. For publication, use the MATLAB script. Disclaimer 1. When we were re implementing the FPN framework and roi_align layer, we only referred to their published papers. Thus, our implementation details could be different from the official Detectron . If you encounter any issue when using our code or model, please let me know.",Object Detection,Object Detection 2879,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. 
Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. 
Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2884,Computer Vision,Computer Vision,Computer Vision,"SimpleDet A Simple and Versatile Framework for Object Detection and Instance Recognition Major Features ! (./doc/image/diagram.png) FP16 training for memory saving and up to 2.5X acceleration Highly scalable distributed training available out of box Full coverage of state of the art models including FasterRCNN, MaskRCNN, CascadeRCNN, RetinaNet and TridentNet Extensive feature set including large batch BN , deformable convolution, soft NMS, multi scale train/test Modular design for coding free exploration of new experiment settings Setup Install SimpleDet contains a lot of C++ operators not in MXNet offical repo, so one has to build MXNet from scratch. Please refer to INSTALL.md (./doc/INSTALL.md) more details Preparing Data SimpleDet requires groundtruth annotation organized as following format { gt_class : (nBox, ), gt_bbox : (nBox, 4), flipped : bool, h : int, w : int, image_url : str, im_id : int, this fields are generated on the fly during test rec_id : int, resize_h : int, resize_w : int, ... }, ... Especially, for experimenting on coco datatet, one can organize coco data in data/ coco/ annotations/ instances_train2014.json instances_valminusminival2014.json instances_minival2014.json image_info_test dev2017.json images/ train2014 val2014 test2017 and run the helper script to generate roidb bash python3 utils/generate_roidb.py dataset coco dataset split train2014 python3 utils/generate_roidb.py dataset coco dataset split valminusminival2014 python3 utils/generate_roidb.py dataset coco dataset split minival2014 python3 utils/generate_roidb.py dataset coco dataset split test dev2017 Deploy dependency and compile extension 1. setup mxnext, a wrapper of mxnet symbolic API bash pip3 install 'git+ 2. 
run make in simpledet directory to install cython extensions Quick Start bash train python3 detection_train.py config config/detection_config.py test python3 detection_test.py config config/detection_config.py Project Design Model Zoo Please refer to MODEL_ZOO.md (./MODEL_ZOO.md) for available models Code Structure detection_train.py detection_test.py config/ detection_config.py core/ detection_input.py detection_metric.py detection_module.py models/ FPN/ tridentnet/ maskrcnn/ cascade_rcnn/ retinanet/ mxnext/ symbol/ builder.py Config Everything is configurable from the config file, all the changes should be out of source . Experiments One experiment is a directory in experiments folder with the same name as the config file. > E.g. r50_fixbn_1x.py is the name of a config file config/ r50_fixbn_1x.py experiments/ r50_fixbn_1x/ checkpoint.params log.txt coco_minival2014_result.json Models The models directory contains SOTA models implemented in SimpletDet. How is Faster RCNN built Simpledet supports many popular detection methods and here we take Faster RCNN as a typical example to show how a detector is built. Preprocessing . The preprocessing methods of the detector is implemented through DetectionAugmentation . Image/bbox related preprocessing, such as Norm2DImage and Resize2DImageBbox . Anchor generator AnchorTarget2D , which generates anchors and corresponding anchor targets for training RPN. Network Structure . The training and testing symbols of Faster RCNN detector is defined in FasterRcnn . The key components are listed as follow: Backbone . Backbone provides interfaces to build backbone networks, e.g. ResNet and ResNext. Neck . Neck provides interfaces to build complementary feature extraction layers for backbone networks, e.g. FPNConvTopDown builds Top down pathway for Feature Pyramid Network . RPN head . RpnHead aims to build classification and regression layers to generate proposal outputs for RPN. Meanwhile, it also provides interplace to generate sampled proposals for the subsequent R CNN. Roi Extractor . RoiExtractor extracts features for each roi (proposal) based on the R CNN features generated by Backbone and Neck . Bounding Box Head . BboxHead builds the R CNN layers for proposal refinement. How to build a custom detector The flexibility of simpledet framework makes it easy to build different detectors. We take TridentNet as an example to demonstrate how to build a custom detector simply based on the Faster RCNN framework. Preprocessing . The additional processing methods could be provided accordingly by inheriting from DetectionAugmentation . In TridentNet, a new TridentAnchorTarget2D is implemented to generate anchors for multiple branches and filter anchors for scale aware training scheme. Network Structure . The new network structure could be constructed easily for a custom detector by modifying some required components as needed and For TridentNet, we build trident blocks in the Backbone according to the descriptions in the paper. We also provide a TridentRpnHead to generate filtered proposals in RPN to implement the scale aware scheme. Other components are shared the same with original Faster RCNN. Distributed Training Please refer to DISTRIBUTED.md (./doc/DISTRIBUTED.md) Contributors Yuntao Chen, Chenxia Han, Yanghao Li, Zehao Huang, Yi Jiang, Naiyan Wang License and Citation This project is release under the Apache 2.0 license for non commercial usage. For commercial usage, please contact us for another license. 
If you find our project helpful, please consider cite our tech report. @article{chen2019simpledet, title {SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition}, author {Chen, Yuntao and and Han, Chenxia and Li, Yanghao and Huang, Zehao and Jiang, Yi and Wang, Naiyan and Zhang, Zhaoxiang}, journal {arXiv preprint arXiv:1903.05831}, year {2019} }",Object Detection,Object Detection 2885,Computer Vision,Computer Vision,Computer Vision,"YOLO v2: Real Time Object Detection Still under development. 71 mAP(darknet) and 74mAP(resnet50) on VOC2007 achieved so far. This is a pre released version. What's new This repo is now deprecated, I am migrating to the latest Gluon CV which is more user friendly and has a lot more algorithms in development. Pretrained YOLOv3 models which achiveve 81%+ mAP on VOC and near 37% mAP on COCO: Model Zoo . Object Detection model tutorials . This repo will not receive active development, however, you can continue use it with the mxnet 1.1.0(probably 1.2.0). Disclaimer This is a re implementation of original yolo v2 which is based on darknet . The arXiv paper is available here . Demo ! demo1 Getting started Build from source, this is required because this example is not merged, some custom operators are not presented in official MXNet. Instructions Install required packages: cv2 , matplotlib Try the demo Download the pretrained model (darknet as backbone), or this model (resnet50 as backbone) and extract to model/ directory. Run cd /path/to/mxnet yolo python demo.py cpu available options python demo.py h Train the model Grab a pretrained model, e.g. darknet19 (optional) Grab a pretrained resnet50 model, resnet 50 0000.params , resnet 50 symbol.json , this will produce slightly better mAP than darknet in my experiments. Download PASCAL VOC dataset. cd /path/to/where_you_store_datasets/ wget wget wget Extract the data. tar xvf VOCtrainval_11 May 2012.tar tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar ln s /path/to/VOCdevkit /path/to/mxnet yolo/data/VOCdevkit Create packed binary file for faster training cd /path/to/mxnet ssd bash tools/prepare_pascal.sh or if you are using windows python tools/prepare_dataset.py dataset pascal year 2007,2012 set trainval target ./data/train.lst python tools/prepare_dataset.py dataset pascal year 2007 set test target ./data/val.lst shuffle False Start training python train.py gpus 0,1,2,3 epoch 0 choose different networks, such as resnet50_yolo python train.py gpus 0,1,2,3 network resnet50_yolo data shape 416 pretrained model/resnet 50 epoch 0 see advanced arguments for training python train.py h",Object Detection,Object Detection 2888,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI Requirements ( requirements) Pre trained models ( pre trained models) Explanations in issues 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use on the command line) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows Using vcpkg ( how to compile on windows using vcpkg) Legacy way ( how to compile on windows legacy way) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train with multi GPU: ( how to train with multi gpu) 6. How to train (to detect your custom objects) ( how to train to detect your custom objects) 7. 
How to train tiny yolo (to detect your custom objects) ( how to train tiny yolo to detect your custom objects) 8. When should I stop training ( when should i stop training) 9. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 10. How to improve object detection ( how to improve object detection) 11. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 12. How to use Yolo as DLL and SO libraries ( how to use yolo as dll and so libraries) ! Darknet Logo ! map_time mAP@0.5 (AP50) YOLOv3 spp better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). Contributors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV v7 CUDA > 7.5 also create SO library on Linux and DLL library on Windows Requirements CMake > 3.8 for modern CUDA support: CUDA 10.0 : (on Linux do Post installation Actions ) OpenCV 7.0 for CUDA 10.0 (set system variable CUDNN C:\cudnn where did you unpack cuDNN. On Linux in .bashrc file, on Windows see the image ) GPU with CC > 3.0 : on Linux GCC or Clang , on Windows MSVS 2017 (v15) Pre trained models There are weights file for different cfg files (smaller size > faster speed & lower accuracy: yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... 
added drawing of chart of average Loss and accuracy mAP ( map flag) during training run ./darknet detector demo ... json_port 8070 mjpeg_port 8090 as JSON and MJPEG server to get results online over the network by using your soft or Web browser added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use on the command line On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights On Linux find executable file ./darknet in the root directory, while on Windows find it in the directory \build\darknet\x64 Yolo v3 COCO image : darknet.exe detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights ext_output dog.jpg Yolo v3 COCO video : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights ext_output test.mp4 Yolo v3 COCO WebCam 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights c 0 Yolo v3 COCO for net videocam Smart WebCam: darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights Yolo v3 save result videofile res.avi : darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 test.mp4 out_filename res.avi Yolo v3 Tiny COCO video: darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights test.mp4 JSON and MJPEG server that allows multiple connections from your soft or Web browser ip address:8070 and 8090: ./darknet detector demo ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights test50.mp4 json_port 8070 mjpeg_port 8090 ext_output Yolo v3 Tiny on GPU 0 : darknet.exe detector demo cfg/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights i 0 test.mp4 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Train on Amazon EC2 , to see mAP & Loss chart using URL like: in the Chrome/Firefox: ./darknet detector train cfg/coco.data yolov3.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights dont_show ext_output result.txt Pseudo lableing to process a list of images data/new_train.txt and save results of detection in Yolo training format for each image as label .txt (in this way you can increase the amount of training data) use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights thresh 0.25 dont_show save_labels cd $env:VCPKG_ROOT PS Code\vcpkg> .\vcpkg install pthreads opencv replace with opencv cuda in case you want to use cuda accelerated openCV 8. necessary only with CUDA Customize the CMakeLists.txt with the preferred compute capability 9. Build with the Powershell script build.ps1 or use the Open Folder functionality of Visual Studio 2017. 
In the first option, if you want to use Visual Studio, you will find a custom solution created for you by CMake after the build containing all the appropriate config flags for your system. How to compile on Windows (legacy way) 1. If you have MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. Also add Windows system variable CUDNN with path to CUDNN: NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN v7.4.1 for CUDA 10.0 : add Windows system variable CUDNN with path to CUDNN: copy file cudnn64_7.dll to the folder \build\darknet\x64 near with darknet.exe 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 10.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 10.0 and change it to your CUDA version. Then open \darknet.sln > (right click on project) > properties > CUDA C/C++ > Device and remove there ;compute_75,sm_75 . Then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. 
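As an optional aid to the legacy Windows build steps above, the sketch below is a hypothetical pre-flight check; it is not part of this repository. It only looks for the items those steps mention explicitly: darknet.exe, the OpenCV and cuDNN DLLs that are supposed to be copied next to it (steps 1.1 and 1.3), and the CUDNN system variable from step 1.3. Adjust the DLL names to your installed OpenCV and cuDNN versions.

```python
# Hypothetical pre-flight check for the legacy Windows build described above.
# NOT part of the darknet repository; DLL names depend on your OpenCV/cuDNN versions.
import os
from pathlib import Path

# DLLs the steps above say should end up next to darknet.exe.
EXPECTED_DLLS = ["opencv_world340.dll", "opencv_ffmpeg340_64.dll", "cudnn64_7.dll"]

def check_build_dir(build_dir):
    build = Path(build_dir)
    problems = []
    if not (build / "darknet.exe").exists():
        problems.append("darknet.exe not found - has the solution been built?")
    for dll in EXPECTED_DLLS:
        if not (build / dll).exists():
            problems.append(f"{dll} is missing next to darknet.exe")
    if "CUDNN" not in os.environ:
        problems.append("CUDNN system variable is not set (see step 1.3)")
    return problems

if __name__ == "__main__":
    for message in check_build_dir(r"build\darknet\x64") or ["looks OK"]:
        print(message)
```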
How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(CUDNN)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project: all .c files all .cu files file from \src directory file darknet.h from \include directory (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(CUDNN)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train cfg/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 Only for small datasets sometimes better to decrease learning rate, for 4 GPUs set learning_rate 0.00025 (i.e. 
learning_rate 0.001 / GPUs). In this case also increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 To train on Linux use command: ./darknet detector train data/obj.data yolo obj.cfg darknet53.conv.74 (just use ./darknet instead of darknet.exe ) (file yolo obj_last.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (file yolo obj_xxxx.weights will be saved to the build\darknet\x64\backup\ for each 1000 iterations) (to disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazon EC2) (to see the mAP & Loss chart during training on remote server without GUI, use command darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show mjpeg_port 8090 map then open URL in Chrome/Firefox browser) 8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set valid valid.txt or train.txt in obj.data file) and run: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. The final avgerage loss can be from 0.05 (for a small model and easy dataset) to 3.0 (for a big model and a difficult dataset). 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest mAP (mean average precision) or IoU (intersect over union) For example, bigger mAP gives weights yolo obj_8000.weights then use this weights for detection . Or just train with map flag: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map So you will see mAP chart (red line) in the Loss chart Window. mAP will be calculated for each 4 Epochs using valid valid.txt file that is specified in obj.data file ( 1 Epoch images_in_train_txt / batch iterations) ! loss_chart_map_chart Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect over union) average instersect over union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. 
To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: for each object which you want to detect there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects (smaller than 16x16 after the image is resized to 416x416) set layers 1, 11 instead of and set stride 4 instead of for training for both small and large objects use modified models: Full model: 5 yolo layers: Tiny model: 3 yolo layers: Spatial full model: 3 yolo layers: If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) 
then for disabling flip data augmentation add flip 0 here: General rule your training dataset should include such a set of relative sizes of objects that you want to detect: train_network_width train_obj_width / train_image_width detection_network_width detection_obj_width / detection_image_width train_network_height train_obj_height / train_image_height detection_network_height detection_obj_height / detection_image_height I.e. for each object from Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size: object width in percent from Training dataset object width in percent from Test dataset That is, if only objects that occupied 80 90% of the image were present in the training set, then the trained network will not be able to detect objects that occupy 1 10% of the image. to speedup training (with decreasing detection accuracy) do Fine Tuning instead of Transfer Learning, set param stopbackward 1 here: then do this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 will be created file yolov3.conv.81 , then train by using weights file yolov3.conv.81 instead of darknet53.conv.74 each: model of object, side, illimination, scale, each 30 grad of the turn and inclination angles these are different objects from an internal perspective of the neural network. So the more different objects you want to detect, the more complex network model should be used. recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file. But you should change indexes of anchors masks for each yolo layer, so that 1st yolo layer has anchors larger than 60x60, 2nd larger than 30x30, 3rd remaining. Also you should change the filters (classes + 5) before each yolo layer. If many of the calculated anchors do not fit under the appropriate layers then just try using all the default anchors. 2. 
After training for detection: Increase network resolution by set in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to mark bounded boxes of objects and create annotation files: Here you can find repository with GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: With example of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with example how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights data/dog.jpg yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of the Yolo9000, also there are paths to the 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories , if parent_id 1 then this label hasn't parent: coco9k.map map 80 categories from MSCOCO to WordTree 9k.tree : combine9k.data data file, there are paths to: 9k.labels , 9k.names , inet9k.map , (change path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map map 200 categories from ImageNet to WordTree 9k.tree : How to use Yolo as DLL and SO libraries on Linux set LIBSO 1 in the Makefile and do make on Windows compile build\darknet\yolo_cpp_dll.sln or build\darknet\yolo_cpp_dll_no_gpu.sln solution There are 2 APIs: C API: Python examples using the C API:: C++ API: C++ example that uses C++ API: 1. To compile Yolo as C++ DLL file yolo_cpp_dll.dll open in MSVS2015 file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do the: Build > Build yolo_cpp_dll You should have installed CUDA 10.0 To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of line: CUDNN; 2. 
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link struct bbox_t { unsigned int x, y, w, h; // (x,y) top left corner, (w, h) width & height of bounded box float prob; // confidence probability that the object was found correctly unsigned int obj_id; // class of object from range 0, classes 1 unsigned int track_id; // tracking id for video (0 untracked, 1 inf tracked object) unsigned int frames_counter;// counter of frames on which the object was detected }; class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); std::shared_ptr mat_to_image_resize(cv::Mat mat) const; endif };",Object Detection,Object Detection 2893,Computer Vision,Computer Vision,Computer Vision,"Yolo v3 and Yolo v2 for Windows and Linux (neural network for object detection) Tensor Cores can be used on Linux and Windows CircleCI 0. Improvements in this repository ( improvements in this repository) 1. How to use ( how to use) 2. How to compile on Linux ( how to compile on linux) 3. How to compile on Windows ( how to compile on windows) 4. How to train (Pascal VOC Data) ( how to train pascal voc data) 5. How to train (to detect your custom objects) ( how to train to detect your custom objects) 6. When should I stop training ( when should i stop training) 7. How to calculate mAP on PascalVOC 2007 ( how to calculate map on pascalvoc 2007) 8. How to improve object detection ( how to improve object detection) 9. How to mark bounded boxes of objects and create annotation files ( how to mark bounded boxes of objects and create annotation files) 10. Using Yolo9000 ( using yolo9000) 11. How to use Yolo as DLL ( how to use yolo as dll) ! Darknet Logo ! map_fps mAP (AP50) YOLOv3 spp (is not indicated) better than YOLOv3 mAP 60.6%, FPS 20: Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): Yolo v2 on Pascal VOC 2007: Yolo v2 on Pascal VOC 2012 (comp4): You Only Look Once: Unified, Real Time Object Detection (versions 2 & 3) A Yolo cross platform Windows and Linux version (for object detection). 
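The comparison above is given in terms of mAP@0.5 (AP50), i.e. a detection counts as a true positive when its IoU with a ground truth box is at least 0.5; the IoU, Precision and Recall definitions themselves are spelled out in the "When should I stop training" section below. The following is a minimal, repository independent sketch of that matching test; the (x1, y1, x2, y2) box format and the function names are assumptions made only for this illustration.

```python
# Minimal, repository-independent sketch of the IoU test behind the AP50 metric
# mentioned above. Boxes are assumed to be (x1, y1, x2, y2) in pixels.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(detection, ground_truth, threshold=0.5):
    """AP50-style test: a detection matches if IoU >= 0.5."""
    return iou(detection, ground_truth) >= threshold

if __name__ == "__main__":
    print(iou((0, 0, 100, 100), (50, 0, 150, 100)))                 # ~0.333
    print(is_true_positive((0, 0, 100, 100), (10, 10, 110, 110)))   # True (IoU ~0.68)
```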
Contributtors: This repository is forked from Linux version: More details: This repository supports: both Windows and Linux both OpenCV 2.x.x and OpenCV 7.5 also create SO library on Linux and DLL library on Windows Requires: Linux GCC> 4.9 or Windows MS Visual Studio 2015 (v140) : (or offline ISO image ) CUDA 10.0 : (on Linux do Post installation Actions ) OpenCV 3.3.0 : or OpenCV 2.4.13 : OpenCV allows to show image or video detection in the window and store result to file that specified in command line out_filename res.avi GPU with CC > 3.0 : Pre trained models for different cfg files can be downloaded from (smaller > faster & lower quality): yolov3 openimages.cfg (247 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 spp.cfg (240 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3.cfg (236 MB COCO Yolo v3 ) requires 4 GB GPU RAM: yolov3 tiny.cfg (34 MB COCO Yolo v3 tiny ) requires 1 GB GPU RAM: yolov2.cfg (194 MB COCO Yolo v2) requires 4 GB GPU RAM: yolo voc.cfg (194 MB VOC Yolo v2) requires 4 GB GPU RAM: yolov2 tiny.cfg (43 MB COCO Yolo v2) requires 1 GB GPU RAM: yolov2 tiny voc.cfg (60 MB VOC Yolo v2) requires 1 GB GPU RAM: yolo9000.cfg (186 MB Yolo9000 model) requires 4 GB GPU RAM: Put it near compiled: darknet.exe You can get cfg files by path: darknet/cfg/ Examples of results: Everything Is AWESOME Others: Improvements in this repository added support for Windows improved binary neural network performance 2x 4x times for Detection on CPU and GPU if you trained your own weights by using this XNOR net model (bit 1 inference) : improved neural network performance 7% by fusing 2 layers into 1: Convolutional + Batch norm improved neural network performance Detection 3x times , Training 2 x times on GPU Volta (Tesla V100, Titan V, ...) using Tensor Cores if CUDNN_HALF defined in the Makefile or darknet.sln improved performance 1.2x times on FullHD, 2x times on 4K, for detection on the video (file/stream) using darknet detector demo ... improved performance 3.5 X times of data augmentation for training (using OpenCV SSE/AVX functions instead of hand written functions) removes bottleneck for training on multi GPU or GPU Volta improved performance of detection and training on Intel CPU with AVX (Yolo v3 85% , Yolo v2 10%) fixed usage of reorg layer optimized memory allocation during network resizing when random 1 optimized initialization GPU for detection we use batch 1 initially instead of re init with batch 1 added correct calculation of mAP, F1, IoU, Precision Recall using command darknet detector map ... added drawing of chart of average loss during training added calculation of anchors for training added example of Detection and Tracking objects: fixed code for use Web cam on OpenCV 3.x run time tips and warnings if you use incorrect cfg file or dataset many other fixes of code... 
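The training sections later in this README describe the Yolo .txt annotation format: one line per object, containing the class id followed by the box center and size, all relative to the image dimensions. As a small standalone illustration of that convention, here is a hypothetical converter from absolute pixel boxes; it is not a script shipped with this repository, and the function name and the corner based input format are assumptions.

```python
# Hypothetical helper illustrating the Yolo .txt label format described in the
# training sections below:
#   <class_id> <x_center> <y_center> <width> <height>
# where all four box values are relative to the image size and (x, y) is the
# box center, not the top-left corner. NOT part of this repository.
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / float(img_w)
    height = (y2 - y1) / float(img_h)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

if __name__ == "__main__":
    # A 200x150 pixel box with its top-left corner at (100, 50) in a 1280x720 image.
    print(to_yolo_line(0, 100, 50, 300, 200, 1280, 720))
    # -> 0 0.156250 0.173611 0.156250 0.208333
```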
And added manual How to train Yolo v3/v2 (to detect your custom objects) ( how to train to detect your custom objects) Also, you might be interested in using a simplified repository where is implemented INT8 quantization (+30% speedup and 1% mAP reduced): How to use: Example of usage in cmd files from build\darknet\x64\ : darknet_yolo_v3.cmd initialization with 236 MB Yolo v3 COCO model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg darknet_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and waiting for entering the name of the image file darknet_demo_voc.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4 darknet_demo_store.cmd initialization with 194 MB VOC model yolo voc.weights & yolo voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi darknet_net_cam_voc.cmd initialization with 194 MB VOC model, play video from network video camera mjpeg stream (also from you phone) darknet_web_cam_voc.cmd initialization with 194 MB VOC model, play video from Web Camera number 0 darknet_coco_9000.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the image: dog.jpg darknet_coco_9000_demo.cmd initialization with 186 MB Yolo9000 COCO model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi How to use on the command line: On Linux use ./darknet instead of darknet.exe , like this: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights Yolo v3 COCO image: darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 Output coordinates of objects: darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights ext_output dog.jpg Yolo v3 COCO video: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights ext_output test.mp4 Yolo v3 COCO WebCam 0: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights c 0 Yolo v3 COCO for net videocam Smart WebCam: darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights Yolo v3 save result to the file res.avi : darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights thresh 0.25 test.mp4 out_filename res.avi Yolo v3 Tiny COCO video: darknet.exe detector demo data/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights test.mp4 Yolo v3 Tiny on GPU 0: darknet.exe detector demo data/coco.data cfg/yolov3 tiny.cfg yolov3 tiny.weights i 0 test.mp4 Alternative method Yolo v3 COCO image: darknet.exe detect cfg/yolov3.cfg yolov3.weights i 0 thresh 0.25 186 MB Yolo9000 image: darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app To process a list of images data/train.txt and save results of detection to result.txt use: darknet.exe detector test cfg/coco.data yolov3.cfg yolov3.weights dont_show ext_output result.txt For using network video camera mjpeg stream with any Android smartphone: 1. Download for Android phone mjpeg stream soft: IP Webcam / Smart WebCam Smart WebCam preferably: IP Webcam: 2. Connect your Android phone to computer by WiFi (through a WiFi router) or USB 3. Start Smart WebCam on your phone 4. 
Replace the address below, on shown in the phone application (Smart WebCam) and launch: Yolo v3 COCO model: darknet.exe detector demo data/coco.data yolov3.cfg yolov3.weights i 0 How to compile on Linux: Just do make in the darknet directory. Before make, you can set such options in the Makefile : link GPU 1 to build with CUDA to accelerate by using GPU (CUDA should be in /usr/local/cuda ) CUDNN 1 to build with cuDNN v5 v7 to accelerate training by using GPU (cuDNN should be in /usr/local/cudnn ) CUDNN_HALF 1 to build for Tensor Cores (on Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x OPENCV 1 to build with OpenCV 3.x/2.4.x allows to detect on video files and video streams from network cameras or web cams DEBUG 1 to bould debug version of Yolo OPENMP 1 to build with OpenMP support to accelerate Yolo by using multi core CPU LIBSO 1 to build a library darknet.so and binary runable file uselib that uses this library. Or you can try to run so LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib test.mp4 How to use this SO library from your own code you can look at C++ example: or use in such a way: LD_LIBRARY_PATH ./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov3.cfg yolov3.weights test.mp4 To run Darknet on Linux use examples from this article, just use ./darknet instead of darknet.exe , i.e. use this command: ./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights How to compile on Windows: 1. If you have MSVS 2015, CUDA 10.0, cuDNN 7.4 and OpenCV 3.x (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet.sln , set x64 and Release and do the: Build > Build darknet. Also add Windows system variable cudnn with path to CUDNN: NOTE: If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see 500 ). 1.1. Find files opencv_world320.dll and opencv_ffmpeg320_64.dll (or opencv_world340.dll and opencv_ffmpeg340_64.dll ) in C:\opencv_3.0\opencv\build\x64\vc14\bin and put it near with darknet.exe 1.2 Check that there are bin and include folders in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1 if aren't, then copy them to this folder from the path where is CUDA installed 1.3. To install CUDNN (speedup neural network), do the following: download and install cuDNN v7.4.1 for CUDA 10.0 : add Windows system variable cudnn with path to CUDNN: 1.4. If you want to build without CUDNN then: open \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and remove this: CUDNN; 2. If you have other version of CUDA (not 10.0) then open build\darknet\darknet.vcxproj by using Notepad, find 2 places with CUDA 10.0 and change it to your CUDA version, then do step 1 3. If you don't have GPU , but have MSVS 2015 and OpenCV 3.0 (with paths: C:\opencv_3.0\opencv\build\include & C:\opencv_3.0\opencv\build\x64\vc14\lib ), then start MSVS, open build\darknet\darknet_no_gpu.sln , set x64 and Release , and do the: Build > Build darknet_no_gpu 4. If you have OpenCV 2.4.13 instead of 3.0 then you should change pathes after \darknet.sln is opened 4.1 (right click on project) > properties > C/C++ > General > Additional Include Directories: C:\opencv_2.4.13\opencv\build\include 4.2 (right click on project) > properties > Linker > General > Additional Library Directories: C:\opencv_2.4.13\opencv\build\x64\vc14\lib 5. 
If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX 2 and later) speedup Detection 3x, Training 2x: \darknet.sln > (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add here: CUDNN_HALF; Note: CUDA must be installed only after that MSVS2015 had been installed. How to compile (custom): Also, you can to create your own darknet.sln & darknet.vcxproj , this example for CUDA 9.1 and OpenCV 3.0 Then add to your created project: (right click on project) > properties > C/C++ > General > Additional Include Directories, put here: C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include (right click on project) > Build dependecies > Build Customizations > set check on CUDA 9.1 or what version you have for example as here: add to project all .c & .cu files and file from \src (right click on project) > properties > Linker > General > Additional Library Directories, put here: C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories) (right click on project) > properties > Linker > Input > Additional dependecies, put here: ..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies) (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions) compile to .exe (X64 & Release) and put .dll s near with .exe: pthreadVC2.dll, pthreadGC2.dll from \3rdparty\dll\x64 cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin For OpenCV 3.2: opencv_world320.dll and opencv_ffmpeg320_64.dll from C:\opencv_3.0\opencv\build\x64\vc14\bin For OpenCV 2.4.13: opencv_core2413.dll , opencv_highgui2413.dll and opencv_ffmpeg2413_64.dll from C:\opencv_2.4.13\opencv\build\x64\vc14\bin How to train (Pascal VOC Data): 1. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 2. Download The Pascal VOC Data and unpack it to directory build\darknet\x64\data\voc will be created dir build\darknet\x64\data\voc\VOCdevkit\ : 2.1 Download file voc_label.py to dir build\darknet\x64\data\voc : 3. Download and install Python for Windows: 4. Run command: python build\darknet\x64\data\voc\voc_label.py (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) 5. Run command: type 2007_train.txt 2007_val.txt 2012_ .txt > train.txt 6. Set batch 64 and subdivisions 8 in the file yolov3 voc.cfg : link 7. Start training by using train_voc.cmd or by using the command line: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 ( Note: To disable Loss Window use flag dont_show . If you are using CPU, try darknet_no_gpu.exe instead of darknet.exe .) If required change pathes in the file build\darknet\x64\data\voc.data More information about training by the link: Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. How to train with multi GPU: 1. Train it first on 1 GPU for like 1000 iterations: darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg darknet53.conv.74 2. 
Then stop and by using partially trained model /backup/yolov3 voc_1000.weights run training with multigpu (up to 4 GPUs): darknet.exe detector train data/voc.data cfg/yolov3 voc.cfg /backup/yolov3 voc_1000.weights gpus 0,1,2,3 Only for small datasets sometimes better to decrease learning rate, for 4 GPUs set learning_rate 0.00025 (i.e. learning_rate 0.001 / GPUs). In this case also increase 4x times burn_in and max_batches in your cfg file. I.e. use burn_in 4000 instead of 1000 . How to train (to detect your custom objects): (to train old Yolo v2 yolov2 voc.cfg , yolov2 tiny voc.cfg , yolo voc.cfg , yolo voc.2.0.cfg , ... click by the link ) Training Yolo v3: 1. Create file yolo obj.cfg with the same content as in yolov3.cfg (or copy yolov3.cfg to yolo obj.cfg) and: change line batch to batch 64 change line subdivisions to subdivisions 8 change line classes 80 to your number of objects in each of 3 yolo layers: change filters 255 to filters (classes + 5)x3 in the 3 convolutional before each yolo layer So if classes 1 then should be filters 18 . If classes 2 then write filters 21 . (Do not write in the cfg file: filters (classes + 5)x3) (Generally filters depends on the classes , coords and number of mask s, i.e. filters (classes + coords + 1) , where mask is indices of anchors. If mask is absence, then filters (classes + coords + 1) num ) So for example, for 2 objects, your file yolo obj.cfg should differ from yolov3.cfg in such lines in each of 3 yolo layers: convolutional filters 21 region classes 2 2. Create file obj.names in the directory build\darknet\x64\data\ , with objects names each in new line 3. Create file obj.data in the directory build\darknet\x64\data\ , containing (where classes number of objects ): classes 2 train data/train.txt valid data/test.txt names data/obj.names backup backup/ 4. Put image files (.jpg) of your objects in the directory build\darknet\x64\data\obj\ 5. You should label each object on images from your dataset. Use this visual GUI software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: It will create .txt file for each .jpg image file in the same directory and with the same name, but with .txt extension, and put to file: object number and object coordinates on this image, for each object in new line: Where: integer object number from 0 to (classes 1) float values relative to width and height of image, it can be equal from (0.0 to 1.0 for example: / or / atention: are center of rectangle (are not top left corner) For example for img1.jpg you will be created img1.txt containing: 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 6. Create file train.txt in directory build\darknet\x64\data\ , with filenames of your images, each filename in new line, with path relative to darknet.exe , for example containing: data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg 7. Download pre trained weights for the convolutional layers (154 MB): and put to the directory build\darknet\x64 8. 
Start training by using the command line: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 To train on Linux use command: ./darknet detector train data/obj.data yolo obj.cfg darknet53.conv.74 (just use ./darknet instead of darknet.exe ) (file yolo obj_last.weights will be saved to the build\darknet\x64\backup\ for each 100 iterations) (file yolo obj_xxxx.weights will be saved to the build\darknet\x64\backup\ for each 1000 iterations) (To disable Loss Window use darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 dont_show , if you train on computer without monitor like a cloud Amazaon EC2) 8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set valid valid.txt or train.txt in obj.data file) and run: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map 9. After training is complete get result yolo obj_final.weights from path build\darknet\x64\backup\ After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy yolo obj_2000.weights from build\darknet\x64\backup\ to build\darknet\x64\ and start training using: darknet.exe detector train data/obj.data yolo obj.cfg yolo obj_2000.weights (in the original repository the weights file is saved only once every 10 000 iterations if(iterations > 1000) ) Also you can get result earlier than all 45000 iterations. Note: If during training you see nan values for avg (loss) field then training goes wrong, but if nan is in some other lines then training goes well. Note: If you changed width or height in your cfg file, then new width and height must be divisible by 32. Note: After training use such command for detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights Note: if error Out of memory occurs then in .cfg file you should increase subdivisions 16 , 32 or 64: link How to train tiny yolo (to detect your custom objects): Do all the same steps as for the full yolo model as described above. With the exception of: Download default weights file for yolov3 tiny: Get pre trained weights yolov3 tiny.conv.15 using command: darknet.exe partial cfg/yolov3 tiny.cfg yolov3 tiny.weights yolov3 tiny.conv.15 15 Make your custom model yolov3 tiny obj.cfg based on cfg/yolov3 tiny_obj.cfg instead of yolov3.cfg Start training: darknet.exe detector train data/obj.data yolov3 tiny obj.cfg yolov3 tiny.conv.15 For training Yolo based on other models ( DenseNet201 Yolo or ResNet50 Yolo ), you can download and get pre trained weights as showed in this file: If you made you custom model that isn't based on other models, then you can train it without pre trained weights, then will be used random initial weights. When should I stop training: Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total. But for a more precise definition when you should stop training, use the following manual: 1. 
During training, you will see varying indicators of error, and you should stop when no longer decreases 0.XXXXXXX avg : > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > > 9002 : 0.211667, 0.060730 avg , 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds 9002 iteration number (number of batch) 0.060730 avg average loss (error) the lower, the better When you see that average loss 0.xxxxxx avg no longer decreases at many iterations then you should stop training. 2. Once training is stopped, you should take some of last .weights files from darknet\build\darknet\x64\backup and choose the best of them: For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. Overfitting is case when you can detect objects on images from training dataset, but can't detect objects on any others images. You should get weights from Early Stopping Point : ! Overfitting To get weights from Early Stopping Point: 2.1. At first, in your file obj.data you must specify the path to the validation dataset valid valid.txt (format of valid.txt as in train.txt ), and if you haven't validation images, just copy data\train.txt to data\valid.txt . 2.2 If training is stopped after 9000 iterations, to validate some of previous weights use this commands: (If you use another GitHub repository, then use darknet.exe detector recall ... instead of darknet.exe detector map ...) darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_7000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_8000.weights darknet.exe detector map data/obj.data yolo obj.cfg backup\yolo obj_9000.weights And comapre last output lines for each weights (7000, 8000, 9000): Choose weights file with the highest mAP (mean average precision) or IoU (intersect over union) For example, bigger mAP gives weights yolo obj_8000.weights then use this weights for detection . Or just train with map flag: darknet.exe detector train data/obj.data yolo obj.cfg darknet53.conv.74 map So you will see mAP chart (red line) in the Loss chart Window. mAP will be calculated for each 4 Epochs using valid valid.txt file that is specified in obj.data file ( 1 Epoch images_in_train_txt / batch iterations) ! loss_chart_map_chart Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights IoU (intersect over union) average instersect over union of objects and detections for a certain threshold 0.24 mAP (mean average precision) mean value of average precisions for each class, where average precision is average value of 11 points on PR curve for each possible threshold (each probability of detection) for the same class (Precision Recall in terms of PascalVOC, where Precision TP/(TP+FP) and Recall TP/(TP+FN) ), page 11: mAP is default metric of precision in the PascalVOC competition, this is the same as AP50 metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but IoU always has the same meaning . ! precision_recall_iou How to calculate mAP on PascalVOC 2007: 1. 
To calculate mAP (mean average precision) on PascalVOC 2007 test: Download PascalVOC dataset, install Python 3.x and get file 2007_test.txt as described here: Then download file to the dir build\darknet\x64\data\ then run voc_label_difficult.py to get the file difficult_2007_test.txt Remove symbol from this line to un comment it: Then there are 2 ways to get mAP: 1. Using Darknet + Python: run the file build/darknet/x64/calc_mAP_voc_py.cmd you will get mAP for yolo voc.cfg model, mAP 75.9% 2. Using this fork of Darknet: run the file build/darknet/x64/calc_mAP.cmd you will get mAP for yolo voc.cfg model, mAP 75.8% (The article specifies the value of mAP 76.8% for YOLOv2 416×416, page 4 table 3: We get values lower perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) if you want to get mAP for tiny yolo voc.cfg model, then un comment line for tiny yolo voc.cfg and comment line for yolo voc.cfg in the .cmd file if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python way to get mAP, then in your cmd file use reval_voc.py and voc_eval.py instead of reval_voc_py3.py and voc_eval_py3.py from this directory: Custom object detection: Example of custom object detection: darknet.exe detector test data/obj.data yolo obj.cfg yolo obj_8000.weights ! Yolo_v2_training ! Yolo_v2_training How to improve object detection: 1. Before training: set flag random 1 in your .cfg file it will increase precision by training Yolo for different resolutions: link increase network resolution in your .cfg file ( height 608 , width 608 or any value multiple of 32) it will increase precision recalculate anchors for your dataset for width and height from cfg file: darknet.exe detector calc_anchors data/obj.data num_of_clusters 9 width 416 height 416 then set the same 9 anchors in each of 3 yolo layers in your cfg file check that each object are mandatory labeled in your dataset no one object in your data set should not be without label. In the most training issues there are wrong labels in your dataset (got labels by using some conversion script, marked with a third party tool, ...). Always check your dataset by using: desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds you should preferably have 2000 different images for each class or more, and you should train 2000 classes iterations or more desirable that your training dataset include images with non labeled objects that you do not want to detect negative samples without bounded box (empty .txt files) use as many images of negative samples as there are images with objects for training with a large number of objects in each image, add the parameter max 200 or higher value in the last yolo layer or region layer in your cfg file (the global maximum number of objects that can be detected by YoloV3 is 0,0615234375 (width height) where are width and height are parameters from net section in cfg file) for training for small objects set layers 1, 11 instead of and set stride 4 instead of If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...) 
If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right turn on road signs, ...), then to disable flip data augmentation add flip 0 here: General rule your training dataset should include objects with the same set of relative sizes that you want to detect: train_network_width * train_obj_width / train_image_width ≈ detection_network_width * detection_obj_width / detection_image_width and train_network_height * train_obj_height / train_image_height ≈ detection_network_height * detection_obj_height / detection_image_height I.e. for each object in the Test dataset there must be at least 1 object in the Training dataset with about the same relative size: object width as a percentage of the Training image ≈ object width as a percentage of the Test image. That is, if only objects that occupied 80 90% of the image were present in the training set, then the trained network will not be able to detect objects that occupy 1 10% of the image. to speed up training (at the cost of some detection accuracy) do Fine Tuning instead of Transfer Learning: set the param stopbackward 1 here: then run this command: ./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81 the file yolov3.conv.81 will be created; then train by using the weights file yolov3.conv.81 instead of darknet53.conv.74 2. After training for detection: Increase the network resolution by setting in your .cfg file ( height 608 and width 608 ) or ( height 832 and width 832 ) or (any value multiple of 32) this increases the precision and makes it possible to detect small objects: link it is not necessary to train the network again, just use the .weights file already trained for 416x416 resolution but to get even greater accuracy you should train with a higher resolution, 608x608 or 832x832; note: if an Out of memory error occurs, then in the .cfg file you should increase subdivisions to 16, 32 or 64: link How to mark bounding boxes of objects and create annotation files: Here you can find a repository with GUI software for marking bounding boxes of objects and generating annotation files for Yolo v2 & v3: With examples of: train.txt , obj.names , obj.data , yolo obj.cfg , air 1 6 .txt , bird 1 4 .txt for 2 classes of objects (air, bird) and train_obj.cmd with an example of how to train this image set with Yolo v2 & v3 Using Yolo9000 Simultaneous detection and classification of 9000 objects: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights data/dog.jpg yolo9000.weights (186 MB Yolo9000 Model) requires 4 GB GPU RAM: yolo9000.cfg cfg file of Yolo9000, which also contains the paths to 9k.tree and coco9k.map 9k.tree WordTree of 9418 categories ; if parent_id 1 then this label has no parent: coco9k.map maps 80 categories from MSCOCO to the WordTree 9k.tree : combine9k.data data file with paths to: 9k.labels , 9k.names , inet9k.map (change the path to your combine9k.train.list ): 9k.labels 9418 labels of objects: 9k.names 9418 names of objects: inet9k.map maps 200 categories from ImageNet to the WordTree 9k.tree : How to use Yolo as DLL 1. To compile Yolo as a C++ DLL file yolo_cpp_dll.dll open in MSVS2015 the file build\darknet\yolo_cpp_dll.sln , set x64 and Release , and do: Build > Build yolo_cpp_dll You should have CUDA 9.1 installed To use cuDNN do: (right click on project) > properties > C/C++ > Preprocessor > Preprocessor Definitions, and add at the beginning of the line: CUDNN; 2.
To use Yolo as DLL file in your C++ console application open in MSVS2015 file build\darknet\yolo_console_dll.sln , set x64 and Release , and do the: Build > Build yolo_console_dll you can run your console application from Windows Explorer build\darknet\x64\yolo_console_dll.exe use this command : yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4 or you can run from MSVS2015 (before this you should copy 2 files yolo voc.cfg and yolo voc.weights to the directory build\darknet\ ) after launching your console application and entering the image file name you will see info for each object: to use simple OpenCV GUI you should uncomment line // define OPENCV in yolo_console_dll.cpp file: link you can see source code of simple example for detection on the video file: link yolo_cpp_dll.dll API: link class Detector { public: Detector(std::string cfg_filename, std::string weight_filename, int gpu_id 0); Detector(); std::vector detect(std::string image_filename, float thresh 0.2, bool use_mean false); std::vector detect(image_t img, float thresh 0.2, bool use_mean false); static image_t load_image(std::string image_filename); static void free_image(image_t m); ifdef OPENCV std::vector detect(cv::Mat mat, float thresh 0.2, bool use_mean false); endif };",Object Detection,Object Detection 2897,Computer Vision,Computer Vision,Computer Vision,"py faster rcnn has been deprecated. Please see Detectron , which includes an implementation of Mask R CNN . Disclaimer The official Faster R CNN code (written in MATLAB) is available here . If your goal is to reproduce the results in our NIPS 2015 paper, please use the official code . This repository contains a Python reimplementation of the MATLAB code. This Python implementation is built on a fork of Fast R CNN . There are slight differences between the two implementations. In particular, this Python port is 10% slower at test time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16) gives similar, but not exactly the same, mAP as the MATLAB version is not compatible with models trained using the MATLAB code due to the minor implementation differences includes approximate joint training that is 1.5x faster than alternating optimization (for VGG16) see these slides for more information Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research) This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship. Please see the official README.md for more details. Faster R CNN was initially described in an arXiv tech report and was subsequently published in NIPS 2015. License Faster R CNN is released under the MIT License (refer to the LICENSE file for details). Citing Faster R CNN If you find Faster R CNN useful in your research, please consider citing: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} } Contents 1. Requirements: software ( requirements software) 2. Requirements: hardware ( requirements hardware) 3. Basic installation ( installation sufficient for the demo) 4. Demo ( demo) 5. Beyond the demo: training and testing ( beyond the demo installation for training and testing models) 6. 
Usage ( usage) Requirements: software NOTE If you are having issues compiling and you are using a recent version of CUDA/cuDNN, please consult this issue for a workaround 1. Requirements for Caffe and pycaffe (see: Caffe installation instructions ) Note: Caffe must be built with support for Python layers! make In your Makefile.config, make sure to have this line uncommented WITH_PYTHON_LAYER : 1 Unrelatedly, it's also recommended that you use CUDNN USE_CUDNN : 1 You can download my Makefile.config for reference. 2. Python packages you might not have: cython , python opencv , easydict 3. Optional MATLAB is required for official PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. Requirements: hardware 1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices 2. For training Fast R CNN with VGG16, you'll need a K40 (11G of memory) 3. For training the end to end version of Faster R CNN with VGG16, 3G of GPU memory is sufficient (using CUDNN) Installation (sufficient for the demo) 1. Clone the Faster R CNN repository Shell Make sure to clone with recursive git clone recursive 2. We'll call the directory that you cloned Faster R CNN into FRCN_ROOT Ignore notes 1 and 2 if you followed step 1 above. Note 1: If you didn't clone Faster R CNN with the recursive flag, then you'll need to manually clone the caffe fast rcnn submodule: Shell git submodule update init recursive Note 2: The caffe fast rcnn submodule needs to be on the faster rcnn branch (or equivalent detached state). This will happen automatically if you followed step 1 instructions . 3. Build the Cython modules Shell cd $FRCN_ROOT/lib make 4. Build Caffe and pycaffe Shell cd $FRCN_ROOT/caffe fast rcnn Now follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make j8 && make pycaffe 5. Download pre computed Faster R CNN detectors Shell cd $FRCN_ROOT ./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models . See data/README.md for details. These models were trained on VOC 2007 trainval. Demo After successfully completing basic installation ( installation sufficient for the demo) , you'll be ready to run the demo. To run the demo Shell cd $FRCN_ROOT ./tools/demo.py The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. Beyond the demo: installation for training and testing models 1. Download the training, validation, test data and VOCdevkit Shell wget wget wget 2. Extract all of these tars into one directory named VOCdevkit Shell tar xvf VOCtrainval_06 Nov 2007.tar tar xvf VOCtest_06 Nov 2007.tar tar xvf VOCdevkit_08 Jun 2007.tar 3. It should have this basic structure Shell $VOCdevkit/ development kit $VOCdevkit/VOCcode/ VOC utility code $VOCdevkit/VOC2007 image sets, annotations, etc. ... and several other directories ... 4. Create symlinks for the PASCAL VOC dataset Shell cd $FRCN_ROOT/data ln s $VOCdevkit VOCdevkit2007 Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 5. Optional follow similar steps to get PASCAL VOC 2010 and 2012 6. Optional If you want to use COCO, please see some notes under data/README.md 7. 
Follow the next sections to download pre trained ImageNet models Download pre trained ImageNet models Pre trained ImageNet models can be downloaded for the three networks described in the paper: ZF and VGG16. Shell cd $FRCN_ROOT ./data/scripts/fetch_imagenet_models.sh VGG16 comes from the Caffe Model Zoo , but is provided here for your convenience. ZF was trained at MSRA. Usage To train and test a Faster R CNN detector using the alternating optimization algorithm from our NIPS 2015 paper, use experiments/scripts/faster_rcnn_alt_opt.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_alt_opt.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 ( alt opt refers to the alternating optimization training algorithm described in the NIPS paper.) To train and test a Faster R CNN detector using the approximate joint training method, use experiments/scripts/faster_rcnn_end2end.sh . Output is written underneath $FRCN_ROOT/output . Shell cd $FRCN_ROOT ./experiments/scripts/faster_rcnn_end2end.sh GPU_ID NET set ... GPU_ID is the GPU you want to train on NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use set ... allows you to specify fast_rcnn.config options, e.g. set EXP_DIR seed_rng1701 RNG_SEED 1701 This method trains the RPN module jointly with the Fast R CNN network, rather than alternating between training the two. It results in faster ( 1.5x speedup) training times and similar detection accuracy. See these slides for more details. Artifacts generated by the scripts in tools are written in this directory. Trained Fast R CNN networks are saved under: output/ / / Test outputs are saved under: output/ / / /",Object Detection,Object Detection 2905,Computer Vision,Computer Vision,Computer Vision,"Faster R CNN and Mask R CNN in PyTorch 1.0 This project aims at providing the necessary building blocks for easily creating detection and segmentation models using PyTorch 1.0. ! alt text (demo/demo_e2e_mask_rcnn_X_101_32x8d_FPN_1x.png from Highlights PyTorch 1.0: RPN, Faster R CNN and Mask R CNN implementations that matches or exceeds Detectron accuracies Very fast : up to 2x faster than Detectron and 30% faster than mmdetection during training. See MODEL_ZOO.md (MODEL_ZOO.md) for more details. Memory efficient: uses roughly 500MB less GPU memory than mmdetection during training Multi GPU training and inference Batched inference: can perform inference using multiple images per batch per GPU CPU support for inference: runs on CPU in inference time. See our webcam demo (demo) for an example Provides pre trained models for almost all reference Mask R CNN and Faster R CNN configurations with 1x schedule. 
Webcam and Jupyter notebook demo We provide a simple webcam demo that illustrates how you can use maskrcnn_benchmark for inference: bash cd demo by default, it runs on the GPU for best results, use min image size 800 python webcam.py min image size 800 can also run it on the CPU python webcam.py min image size 300 MODEL.DEVICE cpu or change the model that you want to use python webcam.py config file ../configs/caffe2/e2e_mask_rcnn_R_101_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu in order to see the probability heatmaps, pass show mask heatmaps python webcam.py min image size 300 show mask heatmaps MODEL.DEVICE cpu for the keypoint demo python webcam.py config file ../configs/caffe2/e2e_keypoint_rcnn_R_50_FPN_1x_caffe2.yaml min image size 300 MODEL.DEVICE cpu A notebook with the demo can be found in demo/Mask_R CNN_demo.ipynb (demo/Mask_R CNN_demo.ipynb). Installation Check INSTALL.md (INSTALL.md) for installation instructions. Model Zoo and Baselines Pre trained models, baselines and comparison with Detectron and mmdetection can be found in MODEL_ZOO.md (MODEL_ZOO.md) Inference in a few lines We provide a helper class to simplify writing inference pipelines using pre trained models. Here is how we would do it. Run this from the demo folder: python from maskrcnn_benchmark.config import cfg from predictor import COCODemo config_file ../configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml update the config options with the config file cfg.merge_from_file(config_file) manual override some options cfg.merge_from_list( MODEL.DEVICE , cpu ) coco_demo COCODemo( cfg, min_image_size 800, confidence_threshold 0.7, ) load image and then run prediction image ... predictions coco_demo.run_on_opencv_image(image) Perform training on COCO dataset For the following examples to work, you need to first install maskrcnn_benchmark . You will also need to download the COCO dataset. We recommend to symlink the path to the coco dataset to datasets/ as follows We use minival and valminusminival sets from Detectron bash symlink the coco dataset cd /github/maskrcnn benchmark mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 or use COCO 2017 version ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2017 datasets/coco/train2017 ln s /path_to_coco_dataset/test2017 datasets/coco/test2017 ln s /path_to_coco_dataset/val2017 datasets/coco/val2017 for pascal voc dataset: ln s /path_to_VOCdevkit_dir datasets/voc P.S. COCO_2017_train COCO_2014_train + valminusminival , COCO_2017_val minival You can also configure your own paths to the datasets. For that, all you need to do is to modify maskrcnn_benchmark/config/paths_catalog.py to point to the location where your dataset is stored. You can also create a new paths_catalog.py file which implements the same two classes, and pass it as a config argument PATHS_CATALOG during training. Single GPU training Most of the configuration files that we provide assume that we are running on 8 GPUs. In order to be able to run it on fewer GPUs, there are a few possibilities: 1. Run the following without modifications bash python /path_to_maskrcnn_benchmark/tools/train_net.py config file /path/to/config/file.yaml This should work out of the box and is very similar to what we should do for multi GPU training. 
But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 8x larger, which might lead to out of memory errors. If you have a lot of memory available, this is the easiest solution. 2. Modify the cfg parameters If you experience out of memory errors, you can reduce the global batch size. But this means that you'll also need to change the learning rate, the number of iterations and the learning rate schedule. Here is an example for Mask R CNN R 50 FPN with the 1x schedule: bash python tools/train_net.py config file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 This follows the scheduling rules from Detectron. Note that we have multiplied the number of iterations by 8x (as well as the learning rate schedules), and we have divided the learning rate by 8x. We also changed the batch size during testing, but that is generally not necessary because testing requires much less memory than training. Multi GPU training We use internally torch.distributed.launch in order to launch multi gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS /path_to_maskrcnn_benchmark/tools/train_net.py config file path/to/config/file.yaml Abstractions For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md (ABSTRACTIONS.md). Adding your own dataset This implementation adds support for COCO style datasets. But adding support for training on a new dataset can be done as follows: python from maskrcnn_benchmark.structures.bounding_box import BoxList class MyDataset(object): def __init__(self, ...): as you would do normally def __getitem__(self, idx): load the image as a PIL Image image ... load the bounding boxes as a list of list of boxes in this case, for illustrative purposes, we use x1, y1, x2, y2 order. boxes 0, 0, 10, 10 , 10, 20, 50, 50 and labels labels torch.tensor( 10, 20 ) create a BoxList from the boxes boxlist BoxList(boxes, image.size, mode xyxy ) add the labels to the boxlist boxlist.add_field( labels , labels) if self.transforms: image, boxlist self.transforms(image, boxlist) return the image, the boxlist and the idx in your dataset return image, boxlist, idx def get_img_info(self, idx): get img_height and img_width. This is used if we want to split the batches according to the aspect ratio of the image, as it can be more efficient than loading the image from disk return { height : img_height, width : img_width} That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask ), or even your own instance type. For a full example of how the COCODataset is implemented, check maskrcnn_benchmark/data/datasets/coco.py (maskrcnn_benchmark/data/datasets/coco.py). 
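To make the returned target above concrete, here is a tiny standalone sketch that builds a BoxList the same way the example dataset does, using the BoxList constructor and add_field shown above (plus get_field , which is assumed here to be the read accessor that pairs with add_field ); the image size and box values are made up for illustration:

```python
# Minimal, illustrative construction of the target returned by __getitem__
# above; assumes maskrcnn_benchmark is installed and uses the BoxList calls
# shown in the example (BoxList(...), add_field) plus get_field.
import torch
from maskrcnn_benchmark.structures.bounding_box import BoxList

image_size = (640, 480)  # (width, height), as returned by PIL's Image.size

# two boxes in x1, y1, x2, y2 order, matching mode="xyxy"
boxes = [[0, 0, 10, 10], [10, 20, 50, 50]]
labels = torch.tensor([10, 20])

boxlist = BoxList(boxes, image_size, mode="xyxy")
boxlist.add_field("labels", labels)

print(boxlist)                      # summary of the BoxList
print(boxlist.get_field("labels"))  # tensor([10, 20])
```

During training the data loader hands the (image, boxlist, idx) triple to the model, so getting the mode and the (width, height) order right is what keeps the boxes aligned with the image.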
Once you have created your dataset, it needs to be added in a couple of places: maskrcnn_benchmark/data/datasets/__init__.py (maskrcnn_benchmark/data/datasets/__init__.py): add it to __all__ maskrcnn_benchmark/config/paths_catalog.py (maskrcnn_benchmark/config/paths_catalog.py): DatasetCatalog.DATASETS and corresponding if clause in DatasetCatalog.get() Testing While the aforementioned example should work for training, we leverage the cocoApi for computing the accuracies during testing. Thus, test datasets should currently follow the cocoApi for now. To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py (maskrcnn_benchmark/data/datasets/evaluation/__init__.py): python if isinstance(dataset, datasets.MyDataset): return coco_evaluation( args) Finetuning from Detectron weights on custom datasets Create a script tools/trim_detectron_model.py like here . You can decide which keys to be removed and which keys to be kept by modifying the script. Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT . For further information, please refer to 15 . Troubleshooting If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md (TROUBLESHOOTING.md). If your issue is not present there, please feel free to open a new issue. Citations Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package. @misc{massa2018mrcnn, author {Massa, Francisco and Girshick, Ross}, title {{maskrcnn benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch}}, year {2018}, howpublished {\url{ note {Accessed: Insert date here } } Projects using maskrcnn benchmark RetinaMask: Learning to predict masks improves state of the art single shot detection for free . Cheng Yang Fu, Mykhailo Shvets, and Alexander C. Berg. Tech report, arXiv,1901.03353. License maskrcnn benchmark is released under the MIT license. See LICENSE (LICENSE) for additional details.",Object Detection,Object Detection 2909,Computer Vision,Computer Vision,Computer Vision,"faster rcnn tf This repository had been edited by Xueqian Zhang, an edition to transformat video into picture. You can use this project to train infrared image set or other customized sets. Note : the image set provided by this project can be downloaded from here . This repository is based on the project of Xinlei Chen (xinleic@cs.cmu.edu), tf faster rcnn A Tensorflow implementation of faster RCNN detection framework by Xinlei Chen (xinleic@cs.cmu.edu). This repository is based on the python Caffe implementation of faster RCNN available here . Note : Several minor modifications are made when reimplementing the framework, which give potential improvements. For details about the modifications and ablative analysis, please refer to the technical report An Implementation of Faster RCNN with Study for Region Sampling . If you are seeking to reproduce the results in the original paper, please use the official code or maybe the semi official code . For details about the faster RCNN architecture please refer to the paper Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks . Detection Performance The current code supports VGG16 , Resnet V1 and Mobilenet V1 models. We mainly tested it on plain VGG16 and Resnet101 (thank you @philokey!) architecture. 
As the baseline, we report numbers using a single model on a single convolution layer, so no multi scale, no multi stage bounding box regression, no skip connection, no extra input is used. The only data augmentation technique is left right flipping during training following the original Faster RCNN. All models are released. With VGG16 ( conv5_3 ): Train on VOC 2007 trainval and test on VOC 2007 test, 70.8 . Train on VOC 2007+2012 trainval and test on VOC 2007 test ( R FCN schedule), 75.7 . Train on COCO 2014 trainval35k and test on minival ( Iterations : 900k/1190k), 30.2 . With Resnet101 (last conv4 ): Train on VOC 2007 trainval and test on VOC 2007 test, 75.7 . Train on VOC 2007+2012 trainval and test on VOC 2007 test (R FCN schedule), 79.8 . Train on COCO 2014 trainval35k and test on minival (900k/1190k), 35.4 . More Results: Train Mobilenet (1.0, 224) on COCO 2014 trainval35k and test on minival (900k/1190k), 21.8 . Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 32.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 36.1 . Approximate baseline setup from FPN (this repository does not contain training code for FPN yet): Train Resnet50 on COCO 2014 trainval35k and test on minival (900k/1190k), 34.2 . Train Resnet101 on COCO 2014 trainval35k and test on minival (900k/1190k), 37.4 . Train Resnet152 on COCO 2014 trainval35k and test on minival (900k/1190k), 38.2 . Note : Due to the randomness in GPU training with Tensorflow especially for VOC, the best numbers are reported (with 2 3 attempts) here. According to my experience, for COCO you can almost always get a very close number (within 0.2%) despite the randomness. The numbers are obtained with the default testing scheme which selects region proposals using non maximal suppression (TEST.MODE nms), the alternative testing scheme (TEST.MODE top) will likely result in slightly better performance (see report , for COCO it boosts 0.X AP). Since we keep the small proposals (\< 16 pixels width/height), our performance is especially good for small objects. We do not set a threshold (instead of 0.05) for a detection to be included in the final result, which increases recall. Weight decay is set to 1e 4. For other minor modifications, please check the report . Notable ones include using crop_and_resize , and excluding ground truth boxes in RoIs during training. For COCO, we find the performance improving with more iterations, and potentially better performance can be achieved with even more iterations. For Resnets, we fix the first block (total 4) when fine tuning the network, and only use crop_and_resize to resize the RoIs (7x7) without max pool (which I find useless especially for COCO). The final feature maps are average pooled for classification and regression. All batch normalization parameters are fixed. Learning rate for biases is not doubled. For Mobilenets, we fix the first five layers when fine tuning the network. All batch normalization parameters are fixed. Weight decay for Mobilenet layers is set to 4e 5. For approximate FPN baseline setup we simply resize the image with 800 pixels, add 32^2 anchors, and take 1000 proposals during testing. Check out here / here / here for the latest models, including longer COCO VGG16 models and Resnet ones. ! (data/imgs/gt.png) ! 
(data/imgs/pred.png) : : : : Displayed Ground Truth on Tensorboard Displayed Predictions on Tensorboard Additional features Additional features not mentioned in the report are added to make research life easier: Support for train and validation . During training, the validation data will also be tested from time to time to monitor the process and check potential overfitting. Ideally training and validation should be separate, where the model is loaded every time to test on validation. However I have implemented it in a joint way to save time and GPU memory. Though in the default setup the testing data is used for validation, no special attempts is made to overfit on testing set. Support for resuming training . I tried to store as much information as possible when snapshoting, with the purpose to resume training from the latest snapshot properly. The meta information includes current image index, permutation of images, and random state of numpy. However, when you resume training the random seed for tensorflow will be reset (not sure how to save the random state of tensorflow now), so it will result in a difference. Note that, the current implementation still cannot force the model to behave deterministically even with the random seeds set. Suggestion/solution is welcome and much appreciated. Support for visualization . The current implementation will summarize ground truth boxes, statistics of losses, activations and variables during training, and dump it to a separate folder for tensorboard visualization. The computing graph is also saved for debugging. Prerequisites A basic Tensorflow installation. The code follows r1.2 format. If you are using r1.0, please check out the r1.0 branch to fix the slim Resnet block issue. If you are using an older version (r0.1 r0.12), please check out the r0.12 branch. While it is not required, for experimenting the original RoI pooling (which requires modification of the C++ code in tensorflow), you can check out my tensorflow fork and look for tf.image.roi_pooling . Python packages you might not have: cython , opencv python , easydict (similar to py faster rcnn ). For easydict make sure you have the right version. I use 1.6. Docker users: Since the recent upgrade, the docker image on docker hub is no longer valid. However, you can still build your own image by using dockerfile located at docker folder (cuda 8 version, as it is required by Tensorflow r1.0.) And make sure following Tensorflow installation to install and use nvidia docker Last, after launching the container, you have to build the Cython modules within the running container. Installation 1. Clone the repository Shell git clone 2. Update your arch in setup script to match your GPU Shell cd tf faster rcnn/lib Change the GPU architecture ( arch) if necessary vim setup.py GPU model Architecture TitanX (Maxwell/Pascal) sm_52 GTX 960M sm_50 GTX 1080 (Ti) sm_61 Grid K520 (AWS g2.2xlarge) sm_30 Tesla K80 (AWS p2.xlarge) sm_37 Note : You are welcome to contribute the settings on your end if you have made the code work properly on other GPUs. Also even if you are only using CPU tensorflow, GPU based code (for NMS) will be used by default, so please set USE_GPU_NMS False to get the correct output. 3. Build the Cython modules Shell make clean make cd .. 4. Install the Python COCO API . The code requires the API to access COCO dataset. Shell cd data git clone cd coco/PythonAPI make cd ../../.. 
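The resuming support described under Additional features above stores the current image index, the image permutation and numpy's random state along with each snapshot. As a rough, generic illustration of that idea (not this repository's actual snapshot format; the file name and dictionary keys below are hypothetical):

```python
# Illustrative only: snapshotting/restoring the kind of meta information the
# README mentions (current image index, image permutation, numpy RNG state).
# The file name and keys are hypothetical, not this repo's actual format.
import pickle
import numpy as np

def save_meta(path, image_index, permutation):
    meta = {
        "image_index": image_index,
        "permutation": permutation,
        "numpy_rng_state": np.random.get_state(),
    }
    with open(path, "wb") as f:
        pickle.dump(meta, f)

def load_meta(path):
    with open(path, "rb") as f:
        meta = pickle.load(f)
    np.random.set_state(meta["numpy_rng_state"])  # restore numpy randomness
    return meta["image_index"], meta["permutation"]

if __name__ == "__main__":
    perm = np.random.permutation(10)
    save_meta("snapshot_meta.pkl", image_index=3, permutation=perm)
    idx, perm_restored = load_meta("snapshot_meta.pkl")
    print(idx, perm_restored)
```

As the README notes, TensorFlow's own random state is not captured this way, which is one reason resumed runs are not bit-for-bit identical.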
Setup data Please follow the instructions of py faster rcnn here to setup VOC and COCO datasets (Part of COCO is done). The steps involve downloading data and optionally creating soft links in the data folder. Since faster RCNN does not rely on pre computed proposals, it is safe to ignore the steps that setup proposals. If you find it useful, the data/cache folder created on my side is also shared here . Demo and Test with pre trained models 1. Download pre trained model Shell Resnet101 for voc pre trained on 07+12 set ./data/scripts/fetch_faster_rcnn_models.sh Note : if you cannot download the models through the link, or you want to try more models, you can check out the following solutions and optionally update the downloading script: Another server here . Google drive here . 2. Create a folder and a soft link to use the pre trained model Shell NET res101 TRAIN_IMDB voc_2007_trainval+voc_2012_trainval mkdir p output/${NET}/${TRAIN_IMDB} cd output/${NET}/${TRAIN_IMDB} ln s ../../../data/voc_2007_trainval+voc_2012_trainval ./default cd ../../.. 3. Demo for testing on custom images Shell at repository root GPU_ID 0 CUDA_VISIBLE_DEVICES ${GPU_ID} ./tools/demo.py Note : Resnet101 testing probably requires several gigabytes of memory, so if you encounter memory capacity issues, please install it with CPU support only. Refer to Issue 25 . 4. Test with pre trained Resnet101 models Shell GPU_ID 0 ./experiments/scripts/test_faster_rcnn.sh $GPU_ID pascal_voc_0712 res101 Note : If you cannot get the reported numbers (79.8 on my side), then probably the NMS function is compiled improperly, refer to Issue 5 . Train your own model 1. Download pre trained models and weights. The current code support VGG16 and Resnet V1 models. Pre trained models are provided by slim, you can get the pre trained models here and set them in the data/imagenet_weights folder. For example for VGG16 model, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf vgg_16_2016_08_28.tar.gz mv vgg_16.ckpt vgg16.ckpt cd ../.. For Resnet101, you can set up like: Shell mkdir p data/imagenet_weights cd data/imagenet_weights wget v tar xzvf resnet_v1_101_2016_08_28.tar.gz mv resnet_v1_101.ckpt res101.ckpt cd ../.. 2. Train (and test, evaluation) Shell ./experiments/scripts/train_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in train_faster_rcnn.sh Examples: ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/train_faster_rcnn.sh 1 coco res101 Note : Please double check you have deleted soft link to the pre trained models before training. If you find NaNs during training, please refer to Issue 86 . Also if you want to have multi gpu support, check out Issue 121 . 3. Visualization with Tensorboard Shell tensorboard logdir tensorboard/vgg16/voc_2007_trainval/ port 7001 & tensorboard logdir tensorboard/vgg16/coco_2014_train+coco_2014_valminusminival/ port 7002 & 4. Test and evaluate Shell ./experiments/scripts/test_faster_rcnn.sh GPU_ID DATASET NET GPU_ID is the GPU you want to test on NET in {vgg16, res50, res101, res152} is the network arch to use DATASET {pascal_voc, pascal_voc_0712, coco} is defined in test_faster_rcnn.sh Examples: ./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16 ./experiments/scripts/test_faster_rcnn.sh 1 coco res101 5. 
You can use tools/reval.sh for re evaluation By default, trained networks are saved under: output/ NET / DATASET /default/ Test outputs are saved under: output/ NET / DATASET /default/ SNAPSHOT / Tensorboard information for train and validation is saved under: tensorboard/ NET / DATASET /default/ tensorboard/ NET / DATASET /default_val/ The default number of training iterations is kept the same to the original faster RCNN for VOC 2007, however I find it is beneficial to train longer (see report for COCO), probably due to the fact that the image batch size is one. For VOC 07+12 we switch to a 80k/110k schedule following R FCN . Also note that due to the nondeterministic nature of the current implementation, the performance can vary a bit, but in general it should be within 1% of the reported numbers for VOC, and 0.2% of the reported numbers for COCO. Suggestions/Contributions are welcome. Citation If you find this implementation or the analysis conducted in our report helpful, please consider citing: @article{chen17implementation, Author {Xinlei Chen and Abhinav Gupta}, Title {An Implementation of Faster RCNN with Study for Region Sampling}, Journal {arXiv preprint arXiv:1702.02138}, Year {2017} } Or for a formal paper, Spatial Memory Network : @article{chen2017spatial, title {Spatial Memory for Context Reasoning in Object Detection}, author {Chen, Xinlei and Gupta, Abhinav}, journal {arXiv preprint arXiv:1704.04224}, year {2017} } For convenience, here is the faster RCNN citation: @inproceedings{renNIPS15fasterrcnn, Author {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun}, Title {Faster {R CNN}: Towards Real Time Object Detection with Region Proposal Networks}, Booktitle {Advances in Neural Information Processing Systems ({NIPS})}, Year {2015} }",Object Detection,Object Detection 2914,Computer Vision,Computer Vision,Computer Vision,"Deformable Convolutional Networks Update 12/01/2018 We updated the deformable convolution operator to be the same as those utilized in the Deformale ConvNets v2 paper. A possible issue when the sampling location is outside of image boundary is solved. The issue may cause deteriated performance on ImageNet classification. Note that the current deformable conv layers in both the official MXNet and the PyTorch codebase still have the issue. So if you want to reproduce the results in Deformable ConvNets v2, please utilize the updated layer provided here. The efficiency at large image batch size is also improved. See more details in DCNv2_op/README.md . The full codebase of Deformable ConvNets v2 would be available later. But it should be easy to reproduce the results with the updated operator. 10/2017 We released the training/testing code and pre trained models of Deformable FPN, which is the foundation of our COCO detection 2017 entry. Slides at COCO 2017 workshop . A third party improvement of Deformable R FCN + Soft NMS Introduction Deformable ConvNets is initially described in an ICCV 2017 oral paper . (Slides at ICCV 2017 Oral ) R FCN is initially described in a NIPS 2016 paper . Disclaimer This is an official implementation for Deformable Convolutional Networks (Deformable ConvNets) based on MXNet. It is worth noticing that: The original implementation is based on our internal Caffe version on Windows. There are slight differences in the final accuracy and running time due to the plenty details in platform switch. The code is tested on official MXNet@(commit 62ecb60) with the extra operators for Deformable ConvNets. 
After MXNet@(commit ce2bca6) the offical MXNet support all operators for Deformable ConvNets. We trained our model based on the ImageNet pre trained ResNet v1 101 using a model converter . The converted model produces slightly lower accuracy (Top 1 Error on ImageNet val: 24.0% v.s. 23.6%). This repository used code from MXNet rcnn example and mx rfcn . License © Microsoft, 2017. Licensed under an MIT license. Citing Deformable ConvNets If you find Deformable ConvNets useful in your research, please consider citing: @article{dai17dcn, Author {Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei}, Title {Deformable Convolutional Networks}, Journal {arXiv preprint arXiv:1703.06211}, Year {2017} } @inproceedings{dai16rfcn, Author {Jifeng Dai, Yi Li, Kaiming He, Jian Sun}, Title {{R FCN}: Object Detection via Region based Fully Convolutional Networks}, Conference {NIPS}, Year {2016} } Main Results training data testing data mAP@0.5 mAP@0.7 time R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 79.6 63.1 0.16s Deformable R FCN, ResNet v1 101 VOC 07+12 trainval VOC 07 test 82.3 67.8 0.19s training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L R FCN, ResNet v1 101 coco trainval coco test dev 32.1 54.3 33.8 12.8 34.9 46.1 Deformable R FCN, ResNet v1 101 coco trainval coco test dev 35.7 56.8 38.3 15.2 38.8 51.5 Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 30.3 52.1 31.4 9.9 32.2 47.4 Deformable Faster R CNN (2fc), ResNet v1 101 coco trainval coco test dev 35.0 55.0 38.3 14.3 37.7 52.0 training data testing data mAP mAP@0.5 mAP@0.75 mAP@S mAP@M mAP@L FPN+OHEM, ResNet v1 101 coco trainval35k coco minival 37.8 60.8 41.0 22.0 41.5 49.8 Deformable FPN + OHEM, ResNet v1 101 coco trainval35k coco minival 41.2 63.5 45.5 24.3 44.9 54.4 FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 40.9 62.5 46.0 27.1 44.1 52.2 Deformable FPN + OHEM + Soft NMS + multi scale testing, ResNet v1 101 coco trainval35k coco minival 44.4 65.5 50.2 30.8 47.3 56.4 training data testing data mIoU time DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 70.3 0.51s Deformable DeepLab, ResNet v1 101 Cityscapes train Cityscapes val 75.2 0.52s DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 70.7 0.08s Deformable DeepLab, ResNet v1 101 VOC 12 train (augmented) VOC 12 val 75.9 0.08s Running time is counted on a single Maxwell Titan X GPU (mini batch size is 1 in inference). Requirements: Software 1. MXNet from the offical repository . We tested our code on MXNet@(commit 62ecb60) . Due to the rapid development of MXNet, it is recommended to checkout this version if you encounter any issues. We may maintain this repository periodically if MXNet adds important feature in future release. 2. Python 2.7. We recommend using Anaconda2 as it already includes many common packages. We do not support Python 3 yet, if you want to use Python 3 you need to modify the code to make it work. 3. Python packages might missing: cython, opencv python > 3.2.0, easydict. If pip is set up on your system, those packages should be able to be fetched and installed by running pip install r requirements.txt 4. For Windows users, Visual Studio 2015 is needed to compile cython module. Requirements: Hardware Any NVIDIA GPUs with at least 4GB memory should be OK. Installation 1. Clone the Deformable ConvNets repository, and we'll call the directory that you cloned Deformable ConvNets as ${DCN_ROOT}. git clone 2. For Windows users, run cmd .\init.bat . 
For Linux user, run sh ./init.sh . The scripts will build cython module automatically and create some folders. 3. Install MXNet: Note: The MXNet's Custom Op cannot execute parallelly using multi gpus after this PR . We strongly suggest the user rollback to version MXNet@(commit 998378a) for training (following Section 3.2 3.5). Quick start 3.1 Install MXNet and all dependencies by pip install r requirements.txt If there is no other error message, MXNet should be installed successfully. Build from source (alternative way) 3.2 Clone MXNet and checkout to MXNet@(commit 998378a) by git clone recursive git checkout 998378a git submodule update if it's the first time to checkout, just use: git submodule update init recursive 3.3 Compile MXNet cd ${MXNET_ROOT} make j $(nproc) USE_OPENCV 1 USE_BLAS openblas USE_CUDA 1 USE_CUDA_PATH /usr/local/cuda USE_CUDNN 1 3.4 Install the MXNet Python binding by Note: If you will actively switch between different versions of MXNet, please follow 3.5 instead of 3.4 cd python sudo python setup.py install 3.5 For advanced users, you may put your Python packge into ./external/mxnet/$(YOUR_MXNET_PACKAGE) , and modify MXNET_VERSION in ./experiments/rfcn/cfgs/ .yaml to $(YOUR_MXNET_PACKAGE) . Thus you can switch among different versions of MXNet quickly. 4. For Deeplab, we use the argumented VOC 2012 dataset. The argumented annotations are provided by SBD dataset. For convenience, we provide the converted PNG annotations and the lists of train/val images, please download them from OneDrive . Demo & Deformable Model We provide trained deformable convnet models, including the deformable R FCN & Faster R CNN models trained on COCO trainval, and the deformable DeepLab model trained on CityScapes train. 1. To use the demo with our pre trained deformable models, please download manually from OneDrive or BaiduYun , and put it under folder model/ . Make sure it looks like this: ./model/rfcn_dcn_coco 0000.params ./model/rfcn_coco 0000.params ./model/fpn_dcn_coco 0000.params ./model/fpn_coco 0000.params ./model/rcnn_dcn_coco 0000.params ./model/rcnn_coco 0000.params ./model/deeplab_dcn_cityscapes 0000.params ./model/deeplab_cityscapes 0000.params ./model/deform_conv 0000.params ./model/deform_psroi 0000.params 2. To run the R FCN demo, run python ./rfcn/demo.py By default it will run Deformable R FCN and gives several prediction results, to run R FCN, use python ./rfcn/demo.py rfcn_only 3. To run the DeepLab demo, run python ./deeplab/demo.py By default it will run Deformable Deeplab and gives several prediction results, to run DeepLab, use python ./deeplab/demo.py deeplab_only 4. To visualize the offset of deformable convolution and deformable psroipooling, run python ./rfcn/deform_conv_demo.py python ./rfcn/deform_psroi_demo.py Preparation for Training & Testing For R FCN/Faster R CNN\: 1. Please download COCO and VOC 2007+2012 datasets, and make sure it looks like this: ./data/coco/ ./data/VOCdevkit/VOC2007/ ./data/VOCdevkit/VOC2012/ 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params For DeepLab\: 1. Please download Cityscapes and VOC 2012 datasets and make sure it looks like this: ./data/cityscapes/ ./data/VOCdevkit/VOC2012/ 2. 
Please download argumented VOC 2012 annotations/image lists, and put the argumented annotations and the argumented train/val lists into: ./data/VOCdevkit/VOC2012/SegmentationClass/ ./data/VOCdevkit/VOC2012/ImageSets/Main/ , Respectively. 2. Please download ImageNet pretrained ResNet v1 101 model manually from OneDrive , and put it under folder ./model . Make sure it looks like this: ./model/pretrained_model/resnet_v1_101 0000.params Usage 1. All of our experiment settings (GPU , dataset, etc.) are kept in yaml config files at folder ./experiments/rfcn/cfgs , ./experiments/faster_rcnn/cfgs and ./experiments/deeplab/cfgs/ . 2. Eight config files have been provided so far, namely, R FCN for COCO/VOC, Deformable R FCN for COCO/VOC, Faster R CNN(2fc) for COCO/VOC, Deformable Faster R CNN(2fc) for COCO/VOC, Deeplab for Cityscapes/VOC and Deformable Deeplab for Cityscapes/VOC, respectively. We use 8 and 4 GPUs to train models on COCO and on VOC for R FCN, respectively. For deeplab, we use 4 GPUs for all experiments. 3. To perform experiments, run the python scripts with the corresponding config file as input. For example, to train and test deformable convnets on COCO with ResNet v1 101, use the following command python experiments\rfcn\rfcn_end2end_train_test.py cfg experiments\rfcn\cfgs\resnet_v1_101_coco_trainval_rfcn_dcn_end2end_ohem.yaml A cache folder would be created automatically to save the model and the log under output/rfcn_dcn_coco/ . 4. Please find more details in config files and in our code. Misc. Code has been tested under: Ubuntu 14.04 with a Maxwell Titan X GPU and Intel Xeon CPU E5 2620 v2 @ 2.10GHz Windows Server 2012 R2 with 8 K40 GPUs and Intel Xeon CPU E5 2650 v2 @ 2.60GHz Windows Server 2012 R2 with 4 Pascal Titan X GPUs and Intel Xeon CPU E5 2650 v4 @ 2.30GHz FAQ Q: It says AttributeError: 'module' object has no attribute 'DeformableConvolution' . A: This is because either you forget to copy the operators to your MXNet folder or you copy to the wrong path or you forget to re compile or you install the wrong MXNet Please print mxnet.__path__ to make sure you use correct MXNet Q: I encounter segment fault at the beginning. A: A compatibility issue has been identified between MXNet and opencv python 3.0+. We suggest that you always import cv2 first before import mxnet in the entry script. Q: I find the training speed becomes slower when training for a long time. A: It has been identified that MXNet on Windows has this problem. So we recommend to run this program on Linux. You could also stop it and resume the training process to regain the training speed if you encounter this problem. Q: Can you share your caffe implementation? A: Due to several reasons (code is based on a old, internal Caffe, port to public Caffe needs extra work, time limit, etc.). We do not plan to release our Caffe code. Since current MXNet convolution implementation is very similar to Caffe (almost the same), it is easy to port to Caffe by yourself, the core CUDA code could be kept unchanged. Anyone who wish to do it is welcome to make a pull request.",Object Detection,Object Detection 1929,Computer Vision,Computer Vision,Computer Vision,"densenet This repository hosts the contributor source files for the densenet model. ModelHub integrates these files into an engine and controlled runtime environment. A unified API allows for out of the box reproducible implementations of published models. For more information, please visit www.modelhub.ai or contact us info@modelhub.ai (mailto:info@modelhub.ai). 
meta id c0048222 b29f 4719 be6b cac790251a19 application_area ImageNet task Classification task_extended ImageNet classification data_type Image/Photo data_source publication title Densely Connected Convolutional Networks source arXiv url year 2016 authors Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger abstract Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed forward fashion. Whereas traditional convolutional networks with L layers have L connections one between each layer and its subsequent layer our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR 10, CIFAR 100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state of the art on most of them, whilst requiring less computation to achieve high performance. Code and pre trained models are available at this URL. google_scholar bibtex @article{DBLP:journals/corr/HuangLW16a, author {Gao Huang and Zhuang Liu and Kilian Q. Weinberger}, title {Densely Connected Convolutional Networks}, journal {CoRR}, volume {abs/1608.06993}, year {2016}, url { archivePrefix {arXiv}, eprint {1608.06993}, timestamp {Mon, 10 Sep 2018 15:49:32 +0200}, biburl { bibsource {dblp computer science bibliography, model description DenseNet increases the depth of convolutional networks by simplifying the connectivity pattern between layers. It exploits the full potential of the network through feature reuse. provenance architecture Convolutional Neural Network (CNN) learning_type Supervised learning format .h5 I/O model I/O can be viewed here (contrib_src/model/config.json) license model license can be viewed here (contrib_src/license/model) run To run this model and view others in the collection, view the instructions on ModelHub . contribute To contribute models, visit the ModelHub docs .",Image Classification,Image Classification 1931,Computer Vision,Computer Vision,Computer Vision,"inception v3 This repository hosts the contributor source files for the inception v3 model. ModelHub integrates these files into an engine and controlled runtime environment. A unified API allows for out of the box reproducible implementations of published models. For more information, please visit www.modelhub.ai or contact us info@modelhub.ai (mailto:info@modelhub.ai). meta id 001bb1c9 bbaf 48ca bf4a 505faca870dd application_area ImageNet task Classification task_extended ImageNet classification data_type Image/Photo data_source publication title Rethinking the Inception Architecture for Computer Vision source Arxiv url year 2015 authors Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna abstract Convolutional networks are at the core of most state of the art computer vision solutions for a wide variety of tasks. 
Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big data scenarios. Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set demonstrate substantial gains over the state of the art: 21.2% top 1 and 5.6% top 5 error for single frame evaluation using a network with a computational cost of 5 billion multiply adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi crop evaluation, we report 3.5% top 5 error on the validation set (3.6% error on the test set) and 17.3% top 1 error on the validation set. google_scholar bibtex @article{DBLP:journals/corr/SzegedyVISW15, author {Christian Szegedy and Vincent Vanhoucke and Sergey Ioffe and Jonathon Shlens and Zbigniew Wojna}, title {Rethinking the Inception Architecture for Computer Vision}, journal {CoRR}, volume {abs/1512.00567}, year {2015}, url { archivePrefix {arXiv}, eprint {1512.00567}, timestamp {Mon, 13 Aug 2018 16:49:07 +0200}, biburl { bibsource {dblp computer science bibliography, model description Inception v3 introduces a few upgrades over the previous inception networks. It reduces representational bottlenecks as well as utilize smart factorization methods making convolutions computationally efficient. provenance architecture Convolutional Neural Network (CNN) learning_type Supervised learning format .h5 I/O model I/O can be viewed here (contrib_src/model/config.json) license model license can be viewed here (contrib_src/license/model) run To run this model and view others in the collection, view the instructions on ModelHub . contribute To contribute models, visit the ModelHub docs .",Image Classification,Image Classification 1932,Computer Vision,Computer Vision,Computer Vision,"network in network This repository hosts the contributor source files for the network in network model. ModelHub integrates these files into an engine and controlled runtime environment. A unified API allows for out of the box reproducible implementations of published models. For more information, please visit www.modelhub.ai or contact us info@modelhub.ai (mailto:info@modelhub.ai). meta id b73d1ee2 c1c2 4a7f 951e 6ec569a8dc98 application_area ImageNet task Classification task_extended ImageNet classification data_type Image/Photo data_source publication title Network In Network source arXiv url year 2014 authors Min Lin, Qiang Chen, Shuicheng Yan abstract We propose a novel deep network structure called 'Network In Network' (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. 
The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking mutiple of the above described structure. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state of the art classification performances with NIN on CIFAR 10 and CIFAR 100, and reasonable performances on SVHN and MNIST datasets. google_scholar bibtex @article{DBLP:journals/corr/LinCY13, author {Min Lin and Qiang Chen and Shuicheng Yan}, title {Network In Network}, journal {CoRR}, volume {abs/1312.4400}, year {2013}, url { archivePrefix {arXiv}, eprint {1312.4400}, timestamp {Mon, 13 Aug 2018 16:47:07 +0200}, biburl { bibsource {dblp computer science bibliography, model description This network consists of multi layer perceptron convolutional layers which use multilayer perceptrons to convolve the input and a global average pooling layer as a replacement for the fully connected layers in conventional CNN. provenance architecture Convolutional Neural Network (CNN) learning_type Supervised learning format .json I/O model I/O can be viewed here (contrib_src/model/config.json) license model license can be viewed here (contrib_src/license/model) run To run this model and view others in the collection, view the instructions on ModelHub . contribute To contribute models, visit the ModelHub docs .",Image Classification,Image Classification 1933,Computer Vision,Computer Vision,Computer Vision,"xception This repository hosts the contributor source files for the xception model. ModelHub integrates these files into an engine and controlled runtime environment. A unified API allows for out of the box reproducible implementations of published models. For more information, please visit www.modelhub.ai or contact us info@modelhub.ai (mailto:info@modelhub.ai). meta id dbf35840 9c8d 408f 8a36 b9fe2f8e5546 application_area ImageNet task Classification task_extended ImageNet classification data_type Image/Photo data_source publication title Xception: Deep Learning with Depthwise Separable Convolutions source Arxiv url year 2016 authors Francois Chollet abstract We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters. 
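The abstract above hinges on the depthwise separable convolution: a depthwise (per channel) convolution followed by a 1x1 pointwise convolution. A minimal, generic Keras sketch of that building block is shown below; it is not the packaged Xception model, and the input shape and filter count are arbitrary choices:

```python
# Illustrative sketch of a depthwise separable convolution: a depthwise
# (per-channel) convolution followed by a 1x1 pointwise convolution.
# Generic example only, not the Xception model shipped with this repository.
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 3))
# depthwise step: one 3x3 filter per input channel
x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)
# pointwise step: 1x1 convolution that mixes channels
x = layers.Conv2D(filters=64, kernel_size=1, padding="same")(x)
# Keras also provides the fused equivalent of the two steps above:
y = layers.SeparableConv2D(filters=64, kernel_size=3, padding="same")(inputs)

model = tf.keras.Model(inputs, [x, y])
model.summary()
```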
google_scholar bibtex @article{DBLP:journals/corr/Chollet16a, author {Francois Chollet}, title {Xception: Deep Learning with Depthwise Separable Convolutions}, journal {CoRR}, volume {abs/1610.02357}, year {2016}, url { archivePrefix {arXiv}, eprint {1610.02357}, timestamp {Mon, 13 Aug 2018 16:46:20 +0200}, biburl { bibsource {dblp computer science bibliography, model description Xception is inspired by Inception and introduces modified depthwise separable convolutions. provenance architecture Convolutional Neural Network (CNN) learning_type Supervised learning format .h5 I/O model I/O can be viewed here (contrib_src/model/config.json) license model license can be viewed here (contrib_src/license/model) run To run this model and view others in the collection, view the instructions on ModelHub . contribute To contribute models, visit the ModelHub docs .",Image Classification,Image Classification 1936,Computer Vision,Computer Vision,Computer Vision,GAN_lesion_filling Lesion filling with (CC)GANs and Patrial Convolutions Attribuitions to and,Image Classification,Image Classification 1941,Computer Vision,Computer Vision,Computer Vision,"SENet.pytorch An implementation of SENet, proposed in Squeeze and Excitation Networks by Jie Hu, Li Shen and Gang Sun, who are the winners of ILSVRC 2017 classification competition. Now SE ResNet (18, 34, 50, 101, 152/20, 32) and SE Inception v3 are implemented. python cifar.py runs SE ResNet20 with Cifar10 dataset. python imagenet.py IMAGENET_ROOT runs SE ResNet50 with ImageNet(2012) dataset. + You need to prepare dataset by yourself + First download files and then follow the instruction . + The number of workers and some hyper parameters are fixed so check and change them if you need. + This script uses all GPUs available. To specify GPUs, use CUDA_VISIBLE_DEVICES variable. (e.g. CUDA_VISIBLE_DEVICES 1,2 to use GPU 1 and 2) For SE Inception v3, the input size is required to be 299x299 as the original Inception . Pre requirements Python> 3.6 PyTorch> 1.0 torchvision For training To run cifar.py or imagenet.py , you need pip install git+ hub You can use some SE ResNet ( se_resnet{20, 56, 50, 101} ) via torch.hub . python import torch.hub hub_model torch.hub.load( 'moskomule/senet.pytorch', 'se_resnet20', num_classes 10) Also, a pretrained SE ResNet50 model is available. python import torch.hub hub_model torch.hub.load( 'moskomule/senet.pytorch', 'se_resnet50', pretrained True,) Result SE ResNet20/Cifar10 python cifar.py baseline ResNet20 SE ResNet20 (reduction 4 or 8) : : : max. test accuracy 92% 93% SE ResNet50/ImageNet The initial learning rate and mini batch size are different from the original version because of my computational resource . ResNet SE ResNet : : : max. test accuracy(top1) 76.15 %( ) 77.06% ( ) + ( ): ResNet 50 in torchvision + ( ): When using imagenet.py with the distributed setting on 8 GPUs. The weight is available . python senet se_resnet50(num_classes 1000) senet.load_state_dict(torch.load( weight.pkl )) References paper authors' Caffe implementation",Image Classification,Image Classification 1955,Computer Vision,Computer Vision,Computer Vision,"MNIST reconstruction using Convnet, Neuralnet and CapsuleNets Deep Convolutional GAN The below GIF displays the sample of images generated from epoch 1 to 50 at every 5 epochs. Conv layers enable GANs to generate better images much faster than neural net. Each epoch takes around 60 seconds ! 
Images_generated_using_conv_net (/images/gan_cnn/digits/cnn_epoch_1_50.gif?raw true Images Generated using Conv Layers in GAN architecture ) Graph of Loss over 50 epochs ! Graph1 (/images/gan_cnn/conv_gan_loss.png?raw true Graph of the loss over 50 epochs ) Deep Neural GAN The below GIF displays the sample of images generated from epoch 1 to 200 at every 20 epochs. Neural net enables GANs to generate decent images but after much longer training epochs. Each epoch takes around 15 seconds. ! Images_generated_using_conv_net (/images/gan_neuralnet/digits/gan_nn_epoch_1_to_200.gif?raw true Images Generated using NeuralNet Layers in GAN architecture ) Capsule Nets The below GIF displays the sample of images generated from epoch 1 to 9 at every epoch. At the decoder end a 28x28 image is reconstructed by passing the latent vector along with its true class variable through two fully connected layers Each epoch takes around 55 mins seconds. ! Images_generated_using_caps_net (/images/capsulenet/Selected/epochs.gif?raw true Images Generated using CapsNet ) Graph of Loss over 9 epochs ! Graph3 (/images/capsulenet/capsnet_graph.jpg?raw true Graph of the loss and accuracy over 9 epochs ) Libraries Tensorflow Keras openCV PIL numpy Refrences GANs, Overview of GANs, Capsule Nets,",Image Classification,Image Classification 1957,Computer Vision,Computer Vision,Computer Vision,"Disclaimer Under Development chestai This repo is dedicated to prepare an automatic chest disease classification using deep learning. Data NIH Chest X ray Dataset of 14 Common Thorax Disease Categories: (1, Atelectasis; 2, Cardiomegaly; 3, Effusion; 4, Infiltration; 5, Mass; 6, Nodule; 7, Pneumonia; 8, Pneumothorax; 9, Consolidation; 10, Edema; 11, Emphysema; 12, Fibrosis; 13, Pleural_Thickening; 14 Hernia) References 1 Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald Summers, ChestX ray8: Hospital scale Chest X ray Database and Benchmarks on Weakly Supervised Classification and Localization of Common Thorax Diseases, IEEE CVPR, pp. 3462 3471, 2017 2 Hoo chang Shin, Kirk Roberts, Le Lu, Dina Demner Fushman, Jianhua Yao, Ronald M. Summers, Learning to Read Chest X Rays: Recurrent Neural Cascade Model for Automated Image Annotation, IEEE CVPR, pp. 2497 2506, 2016 3 : Huang et. al., Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger Densely Connected Convolutional Networks Updates Coming Soon",Image Classification,Image Classification 1970,Computer Vision,Computer Vision,Computer Vision,"Kaggle CIFAR 10 Code for CIFAR 10 competition. Summary Description Model Very Deep Convolutional Networks with 3x3 kernel 1 Data Augmentation cropping, horizontal reflection 2 and scaling. see lib/data_augmentation.lua Preprocessing Global Contrast Normalization (GCN) and ZCA whitening. see lib/preprocessing.lua Training Time 20 hours on GTX760. Prediction Time 2.5 hours on GTX760. Result 0.93320 (single model). 
0.94150 (average 6 models) Neural Network Configurations Layer type Parameters input size: 24x24, channel: 3 convolution kernel: 3x3, channel: 64, padding: 1 relu convolution kernel: 3x3, channel: 64, padding: 1 relu max pooling kernel: 2x2, stride: 2 dropout rate: 0.25 convolution kernel: 3x3, channel: 128, padding: 1 relu convolution kernel: 3x3, channel: 128, padding: 1 relu max pooling kernel: 2x2, stride: 2 dropout rate: 0.25 convolution kernel: 3x3, channel: 256, padding: 1 relu convolution kernel: 3x3, channel: 256, padding: 1 relu convolution kernel: 3x3, channel: 256, padding: 1 relu convolution kernel: 3x3, channel: 256, padding: 1 relu max pooling kernel: 2x2, stride: 2 dropout rate: 0.25 linear channel: 1024 relu dropout rate: 0.5 linear channel: 1024 relu dropout rate: 0.5 linear channel: 10 softmax Developer Environment Ubuntu 14.04 15GB RAM (This codebase can run on g2.2xlarge!) CUDA (GTX760 or more higher GPU) Torch7 latest cuda convnet2.torch Installation (This document is outdated. See: Getting started with Torch ) Install CUDA (on Ubuntu 14.04): apt get install nvidia 331 apt get install nvidia cuda toolkit Install Torch7 (see Torch (easy) install ): curl s bash Install(or upgrade) dependency packages: luarocks install torch luarocks install nn luarocks install cutorch luarocks install cunn luarocks install Checking CUDA environment th cuda_test.lua Please check your Torch7/CUDA environment when this code fails. Convert dataset Place the data files into a subfolder ./data. ls ./data test train trainLabels.csv th convert_data.lua Local testing th validate.lua dataset: train test 1 40000 40001 50000 Generating the submission.txt th train.lua th predict.lua MISC Model Averaging Training with different seed parameter for each nodes. (same model, same data, different initial weights, different training order) th train.lua seed 11 th train.lua seed 12 ... th train.lua seed 16 Mount the models directory for each nodes. for example, ec2/node1 , ec2/node2 , .., ec2/node6 . Edit the path of model file in predict_averaging.lua . Run the prediction command. th predict_averaging.lua Network In Network ./nin_model.lua is an implementation of Network In Network 3 . This model gives score of 0.92400. My NIN implementation is 2 layer NIN. Its differ from mavenlin's implementation . I tried to implement the mavenlin's 3 layer NIN. However, I did not get good result. My implementation of 3 layer NIN is here . Bug global_contrast_normalization in ./lib/preprocessing.lua is incorrect implementation (This function is just z score). but I was using this implementation in the competition. Figure data augmentation + preprocessing ! data augmentation preprocessing References 1 Karen Simonyan, Andrew Zisserman, Very Deep Convolutional Networks for Large Scale Image Recognition , link 2 Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks , link 3 Min Lin, Qiang Chen, Shuicheng Yan, Network In Network , link 4 R. Collobert, K. Kavukcuoglu, C. Farabet, Torch7: A Matlab like Environment for Machine Learning",Image Classification,Image Classification 1973,Computer Vision,Computer Vision,Computer Vision,"Computer Vision Ferrari Detective Overview This is a completed computer vision project. The objective of this project was to train a convolutional neural network to detect a Ferrari Testarosssa out of a repository of vehichles. Execution This method of using a pre trained model is called transfer learning. 
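The rest of this Execution section describes the setup in prose; the sketch below is only a rough Keras illustration of that description. The DenseNet121 backbone, input size, and compile settings are assumptions (the original project code is not included here); only the 128-unit dense layer and the single sigmoid output come from the text that follows.

```python
# Hypothetical sketch of the transfer-learning head described below.
import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained DenseNet backbone (assumed variant), used as a frozen feature extractor.
base = tf.keras.applications.DenseNet121(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),   # fully connected hidden layer
    layers.Dense(1, activation="sigmoid"),  # binary output: Testarossa or not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```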
Transfer learning gives everyone access to robust models, making machine learning and artificial intelligence widely accessible without access to expensive training computers. From work performed in the deep learning space, research has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. This is the reason I chose the DenseNet pre trained model. After embedding a pre trained model, the next step was to create a fully connected, or hidden, layer. The pre trained model's output nodes act as an input layer to these fully connected layers. Within these layers is where my network generates the percentages for deciding whether a car is a Ferrari Testarossa or not. Since the output I am looking for only has 2 options, I created a binary classification model rather than a categorical model. (The closer the prediction percentage is to 0%, the more the model believes the car is a Ferrari Testarossa.) ‘Dense’ is the function to add a fully connected layer. The number of nodes will always be between the number of input nodes and the number of output nodes, but choosing the optimal number of nodes can be achieved only through experimentation. It took many iterations to realize that, to be effective, my model required one layer of 128 output nodes followed by another layer with 1 output node. Finally I had to decide whether to use a softmax or a sigmoid activation function. The activation function of a node defines the output of that node, given a set of inputs. I chose sigmoid because it can be preferred over softmax if there aren’t multiple classes and each input doesn’t have to belong to exactly one class. After training my model, my last epoch had 99.3% accuracy. Challenges Incorrect Labels: The first data set I used contained about 8100 cars in training and validation, which I thought was a great lead. Unfortunately the labels were incorrect, so I was forced to find another dataset. Though unfortunate, this was a useful challenge because it taught me how to properly label my data for the training to be effective. The way I did so was by creating two directories titled TEST and TRAIN, each with two more directories within, titled 'Ferarri Testarossa' and 'Not a Ferrari Testarossa'. Low Accuracy: Initially my accuracy was 60%. I was able to dramatically improve it to about 99.3% by changing the number of output nodes within my dense fully connected layer. Since changes such as these are not consistent and vary across different models, I suspect it worked because for binary classification it may help if your last layers are closer to one node, since the final activation layer will only have one node. High Accuracy, but Low Validation Accuracy: Another issue was that after changing the number of output nodes, my accuracy improved, but my model was still unable to properly identify Ferrari Testarossas. I identified that my use of regularizers, an overfitting prevention technique, was too powerful. I subsequently removed the regularizers and was able to gain even higher accuracy, and my model was finally able to properly predict Testarossas, without a need for preprocessing the data.",Image Classification,Image Classification 1974,Computer Vision,Computer Vision,Computer Vision,"Computer Vision Ferrari Detective Overview This is a completed computer vision project.
The objective of this project was to train a convolutional neural network to detect a Ferrari Testarossa out of a repository of vehicles. Execution This method of using a pre trained model is called transfer learning. Transfer learning gives everyone access to robust models, making machine learning and artificial intelligence widely accessible without access to expensive training computers. From work performed in the deep learning space, research has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. This is the reason I chose the DenseNet pre trained model. After embedding a pre trained model, the next step was to create a fully connected, or hidden, layer. The pre trained model's output nodes act as an input layer to these fully connected layers. Within these layers is where my network generates the percentages for deciding whether a car is a Ferrari Testarossa or not. Since the output I am looking for only has 2 options, I created a binary classification model rather than a categorical model. (The closer the prediction percentage is to 0%, the more the model believes the car is a Ferrari Testarossa.) ‘Dense’ is the function to add a fully connected layer. The number of nodes will always be between the number of input nodes and the number of output nodes, but choosing the optimal number of nodes can be achieved only through experimentation. It took many iterations to realize that, to be effective, my model required one layer of 128 output nodes followed by another layer with 1 output node. Finally I had to decide whether to use a softmax or a sigmoid activation function. The activation function of a node defines the output of that node, given a set of inputs. I chose sigmoid because it can be preferred over softmax if there aren’t multiple classes and each input doesn’t have to belong to exactly one class. After training my model, my last epoch had 99.3% accuracy. Challenges Incorrect Labels: The first data set I used contained about 8100 cars in training and validation, which I thought was a great lead. Unfortunately the labels were incorrect, so I was forced to find another dataset. Though unfortunate, this was a useful challenge because it taught me how to properly label my data for the training to be effective. The way I did so was by creating two directories titled TEST and TRAIN, each with two more directories within, titled 'Ferarri Testarossa' and 'Not a Ferrari Testarossa'. Low Accuracy: Initially my accuracy was 60%. I was able to dramatically improve it to about 99.3% by changing the number of output nodes within my dense fully connected layer. Since changes such as these are not consistent and vary across different models, I suspect it worked because for binary classification it may help if your last layers are closer to one node, since the final activation layer will only have one node. High Accuracy, but Low Validation Accuracy: Another issue was that after changing the number of output nodes, my accuracy improved, but my model was still unable to properly identify Ferrari Testarossas. I identified that my use of regularizers, an overfitting prevention technique, was too powerful.
I subsequently removed the regularizers, and was able to gain even higher accuracy and my model was now finally able to now properly predict Testarossas, without a need for preprocessing the data.",Image Classification,Image Classification 1980,Computer Vision,Computer Vision,Computer Vision,"CapsNet Tensorflow Contributions welcome (CONTRIBUTING.md) License Gitter A Tensorflow implementation of CapsNet based on Geoffrey Hinton's paper Dynamic Routing Between Capsules ! capsVSneuron (imgs/capsuleVSneuron.png) > Notes: > 1. The current version supports MNIST and Fashion MNIST datasets. The current test accuracy for MNIST is 99.64% , and Fashion MNIST 90.60% , see details in the Results section > 2. See dist_version (dist_version) for multi GPU support > 3. Here(知乎) is an article explaining my understanding of the paper. It may be helpful in understanding the code. > Important: > > If you need to apply CapsNet model to your own datasets or build up a new model with the basic block of CapsNet, please follow my new project CapsLayer , which is an advanced library for capsule theory, aiming to integrate capsule relevant technologies, provide relevant analysis tools, develop related application examples, and promote the development of capsule theory. For example, you can use capsule layer block in your code easily with the API capsLayer.layers.fully_connected and capsLayer.layers.conv2d Requirements Python NumPy Tensorflow > 1.3 tqdm (for displaying training progress info) scipy (for saving images) Usage Step 1. Download this repository with git or click the download ZIP button. $ git clone $ cd CapsNet Tensorflow Step 2. Download MNIST or Fashion MNIST dataset. In this step, you have two choices: a) Automatic downloading with download_data.py script $ python download_data.py (for mnist dataset) $ python download_data.py dataset fashion mnist save_to data/fashion mnist (for fashion mnist dataset) b) Manual downloading with wget or other tools, move and extract dataset into data/mnist or data/fashion mnist directory, for example: $ mkdir p data/mnist $ wget c P data/mnist $ wget c P data/mnist $ wget c P data/mnist $ wget c P data/mnist $ gunzip data/mnist/ .gz Step 3. Start the training(Using the MNIST dataset by default): $ python main.py $ or training for fashion mnist dataset $ python main.py dataset fashion mnist $ If you need to monitor the training process, open tensorboard with this command $ tensorboard logdir logdir $ or use tail command on linux system $ tail f results/val_acc.csv Step 4. Calculate test accuracy $ python main.py is_training False $ for fashion mnist dataset $ python main.py dataset fashion mnist is_training False > Note: The default parameters of batch size is 128, and epoch 50. You may need to modify the config.py file or use command line parameters to suit your case, e.g. set batch size to 64 and do once test summary every 200 steps: python main.py test_sum_freq 200 batch_size 48 Results The pictures here are plotted by tensorboard and my tool plot_acc.R training loss ! total_loss (results/total_loss.png) ! margin_loss (results/margin_loss.png) ! reconstruction_loss (results/reconstruction_loss.png) Here are the models I trained and my talk and something else: Baidu Netdisk (password:ahjs) The best val error(using reconstruction) Routing iteration 1 3 4 : : : : : : val error 0.36 0.36 0.41 Paper 0.29 0.25 ! test_acc (results/routing_trials.png) > My simple comments for capsule > 1. A new version neural unit(vector in vector out, not scalar in scalar out) > 2. 
The routing algorithm is similar to attention mechanism > 3. Anyway, a great potential work, a lot to be built upon My weChat: ! my_wechat (/imgs/my_wechat_QR.png) Reference XifengGuo/CapsNet Keras : referred for some code optimizations",Image Classification,Image Classification 2013,Computer Vision,Computer Vision,Computer Vision,"Stochastic Delta Rule implementation using DenseNet in TensorFlow THIS REPOSITORY IS NO LONGER IN USE. THE NEW SDR REPOSITORY CAN BE FOUND HERE . NOTE This is repository is based off of Illarion Khlestov's DenseNet implementation . Check out his blog post about implementing DenseNet in TensorFlow here . Check out @lifeiteng's results from implementing SDR with WaveNet . UPDATE : Due to a bug found by @basveeling which has now been corrected, the testing errors are being recalculated. Here are the preliminary results, which I will continue to update as the results come out. indicates results that have not yet been redone. Model type Depth C10 C100 : : : : DenseNet( k 12) 40 ( ) ( ) DenseNet( k 12) 100 ( ) ( ) DenseNet BC( k 12) 100 ( ) ( ) This repository holds the code for the paper 'Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning' (submitted to NIPS; on arXiv ) Noah Frazier Logue , Stephen Jose Hanson Stochastic Delta Rule (SDR) is a weight update mechanism that assigns to each weight a standard deviation that changes as a function of the gradients every training iteration. At the beginning of each training iteration, the weights are re initialized using a normal distribution bound by their standard deviations. Over the course of the training iterations and epochs, the standard deviations converge towards zero as the network becomes more sure of what the values of each of the weights should be. For a more detailed description of the method and its properties, have a look at the paper link here . Two types of Densely Connected Convolutional Networks (DenseNets) are available: DenseNet without bottleneck layers DenseNet BC with bottleneck layers Each model can be tested on such datasets: CIFAR 10 CIFAR 10+ (with data augmentation) CIFAR 100 CIFAR 100+ (with data augmentation) SVHN A number of layers, blocks, growth rate, image normalization and other training params may be changed trough shell or inside the source code. Usage Example run: python run_dense_net.py depth 40 train test dataset C10 sdr This run uses SDR instead of dropout. To use dropout, run something like python run_dense_net.py depth 40 train test dataset C10 keep_prob 0.8 where keep_prob is the probability (in this case 80%) that a neuron is kept during dropout. NOTE: the sdr argument will override the keep_prob argument. For example: python run_dense_net.py depth 40 train test dataset C10 keep_prob 0.8 sdr will use SDR and not dropout. List all available options: python run_dense_net.py help There are also many other implementations they may be useful. Citation: @article{Huang2016Densely, author {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.}, title {Densely Connected Convolutional Networks}, journal {arXiv preprint arXiv:1608.06993}, year {2016} } KNOWN ISSUES model will not save due to graph definiton being larger than 2GB If you see anything wrong, feel free to open an issue! Results from SDR paper This table shows the results on CIFAR shown in the paper. Parameters are all the same as what are used in the paper, except for a batch size of 100 and an epoch size of 100. SDR's beta value was 0.1 and zeta was 0.01. 
The augmented datasets were not tested on because dropout was not used on these datasets in the original paper, however they may be added in the future (as will the SVHN results and results with higher layer counts). Model type Depth C10 C100 : : : : DenseNet( k 12) 40 2.256(5.160) 09.36(22.60) DenseNet( k 12) 100 1.360 (3.820) 05.16 (11.06) DenseNet BC( k 12) 100 2.520(6.340) 11.12(25.08) Epochs to error rate The below tables show the number of training epochs required to reach a training error of 15, 10, and 5, respectively. For example, the dropout version of DenseNet 40 on CIFAR 10 took 8 epochs to reach a training error of 15, 16 epochs to reach a training error of 10, and 94 epochs to reach a training error of 5. In contrast, the SDR version of DenseNet 40 on CIFAR 10 took 5 epochs to reach a training error of 15, 5 epochs to reach a training error of 10, and 15 epochs to reach a training error of 5. Best results for each value, across both dropout and SDR, are bolded. Dropout Model type Depth C10 C100 : : : : DenseNet( k 12) 40 8 \ 16 \ 94 95 \ \ DenseNet( k 12) 100 8 \ 13 \ 25 28 \ 60 \ DenseNet BC( k 12) 100 10 \ 25 \ \ \ SDR Model type Depth C10 C100 : : : : DenseNet( k 12) 40 5 \ 8 \ 15 27 \ 48 \ DenseNet( k 12) 100 6 \ 9 \ 15 17 \ 21 \ 52 DenseNet BC( k 12) 100 5 \ 8 \ 17 31 \ 87 \ Comparison to original DenseNet implementation with dropout Test results on various datasets. Image normalization per channels was used. Results reported in paper provided in parenthesis. For Cifar+ datasets image normalization was performed before augmentation. This may cause a little bit lower results than reported in paper. Model type Depth C10 C10+ C100 C100+ : : : : : : DenseNet( k 12) 40 6.67(7.00) 5.44(5.24) 27.44(27.55) 25.62(24.42) DenseNet BC( k 12) 100 5.54(5.92) 4.87(4.51) 24.88(24.15) 22.85(22.27) Difference compared to the original implementation The existing model should use identical hyperparameters to the original code. Dependencies Model was tested with Python 3.4.3+ and Python 3.5.2 with and without CUDA. Model should work as expected with TensorFlow > 0.10 FOR DROPOUT ONLY. SDR was added using a development environment with TensorFlow 1.7 so it may require 1.0+. Repo supported with requirements files so the easiest way to install all just run: in case of CPU usage pip install r requirements/cpu.txt . in case of GPU usage pip install r requirements/gpu.txt .",Image Classification,Image Classification 2014,Computer Vision,Computer Vision,Computer Vision,"Stochastic Delta Rule This repository holds the code for the paper 'Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning' (submitted to ICML; on arXiv ) Noah Frazier Logue , Stephen José Hanson Stochastic Delta Rule (SDR) is a weight update mechanism that assigns to each weight a standard deviation that changes as a function of the gradients every training iteration. At the beginning of each training iteration, the weights are re initialized using a normal distribution bound by their standard deviations. Over the course of the training iterations and epochs, the standard deviations converge towards zero as the network becomes more sure of what the values of each of the weights should be. For a more detailed description of the method and its properties, have a look at the paper . Results Here is a TensorBoard instance that shows the results from the paper regarding titration of training epochs and the comparison to dropout (on DN100/CIFAR 100). 
We show that SDR can reach (and surpass) dropout's level of accuracy in 35 epochs as opposed to dropout's 100 epochs. Note: Results in this repository are more current than what are in the paper due to how often they are updated and how often the arXiv post can be replaced. Dropout Model type Depth C10 C100 : : : : DenseNet( k 12) 40 6.88 27.88 DenseNet( k 12) 100 24.41 DenseNet BC( k 12) 250 23.91 SDR Model type Depth C10 C100 : : : : DenseNet( k 12) 40 5.95 24.58 DenseNet( k 12) 100 21.36 DenseNet BC( k 12) 250 19.79 Two types of Densely Connected Convolutional Networks (DenseNets) are available: DenseNet without bottleneck layers DenseNet BC with bottleneck layers Each model can be tested on such datasets: CIFAR 10 CIFAR 100 ImageNet (results coming soon) A number of layers, blocks, growth rate, image normalization and other training params may be changed trough shell or inside the source code. Usage Example run: python train.py layers 40 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN40_C100_alpha_0.25_beta_0.05_zeta_0.7 tensorboard sdr dataset C100 lr 0.25 beta 0.52 zeta 0.7 This run would train a 40 layer DenseNet model on CIFAR 100 and log the progress to TensorBoard. To use dropout, run something like python train.py layers 40 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN40_C100_do_0.2 tensorboard dataset C100 droprate 0.2 where droprate is the probability (in this case 20%) that a neuron is dropped during dropout. NOTE: the sdr argument will override the droprate argument. For example: python train.py layers 40 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN40_C100_alpha_0.25_beta_0.02_zeta_0.7 tensorboard sdr dataset C100 lr 0.25 beta 0.02 zeta 0.7 droprate 0.2 will use SDR and not dropout. List all available options: python train.py help TensorBoard logs and steps to reproduce results Note: Emphasis below has been placed on test results, but training/encoding optimized TensorBoard logs will be supplied where available. These will be updated as more results are generated. DenseNet 40 on CIFAR 10 TensorBoard logs: Testing Training Command to replicate test results: python train.py layers 40 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN40_C10_alpha_0.25_beta_0.1_zeta_0.999 tensorboard sdr dataset C10 lr 0.25 beta 0.1 zeta 0.999 DenseNet 40 on CIFAR 100 TensorBoard logs: Testing Training Command to replicate test results: python train.py layers 40 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN40_C100_alpha_0.3_beta_0.2_zeta_0.9999 tensorboard sdr dataset C100 lr 0.3 beta 0.2 zeta 0.9999 DenseNet 100 on CIFAR 100 TensorBoard logs: Testing Training Command to replicate test results: python train.py layers 100 no bottleneck growth 12 reduce 1.0 b 100 epochs 100 name DN100_C100_alpha_0.25_beta_0.1_zeta_0.7 tensorboard sdr dataset C100 lr 0.25 beta 0.1 zeta 0.7 DenseNet 250 BC on CIFAR 100 TensorBoard logs: Testing Training Command to replicate test results: python train.py layers 250 growth 12 reduce 1.0 b 100 epochs 100 name DN250_C100_alpha_0.25_beta_0.03_zeta_0.5 tensorboard sdr dataset C100 lr 0.25 beta 0.03 zeta 0.5 ImageNet results will be generated and posted as soon as our institution finishes setting up our account with AWS. The code used is based heavily on Andreas Veit's DenseNet implementation and PyTorch's Vision repository . 
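As a rough sketch of the stochastic delta rule update described at the top of this README — a simplified illustration, not the repository's actual implementation, with the update form paraphrased from that description and the lr/beta/zeta values echoing the example training flags above:

```python
import torch

# One linear layer's parameters: a mean and a per-weight standard deviation.
mu = torch.randn(64, 32) * 0.05          # weight means
sigma = torch.full_like(mu, 0.01)        # per-weight standard deviations
lr, beta, zeta = 0.25, 0.1, 0.999        # assumed values, mirroring the flags above

def sdr_step(x, target):
    # 1) Re-sample the weights from N(mu, sigma) at the start of the iteration.
    w = (mu + sigma * torch.randn_like(mu)).requires_grad_(True)
    # 2) Forward/backward pass with the sampled weights (toy squared loss here).
    loss = ((x @ w.t() - target) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        # 3) Update the mean with the gradient, as in ordinary SGD.
        mu.sub_(lr * w.grad)
        # 4) Grow sigma with the gradient magnitude, then decay it towards zero
        #    so the network gradually commits to the mean weights.
        sigma.add_(beta * w.grad.abs()).mul_(zeta)
    return loss.item()

x, target = torch.randn(8, 32), torch.randn(8, 64)
for _ in range(5):
    print(sdr_step(x, target))
```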
Dependencies PyTorch NumPy Optional tensorboardX Cite If you use DenseNets in your work, please cite the original paper as: @article{Huang2016Densely, author {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.}, title {Densely Connected Convolutional Networks}, journal {arXiv preprint arXiv:1608.06993}, year {2016} }",Image Classification,Image Classification 2044,Computer Vision,Computer Vision,Computer Vision,"Udacity Robotics NanoDegree Program Deep Learning Project Follow Me In this project we will train a deep neural network, especially Fully Convolutional Neural Network (FCN) to identify and track a target in simulation. So called “follow me” applications like this are key to many fields of robotics and the very same techniques you apply here could be extended to scenarios like advanced cruise control in autonomous vehicles or human robot collaboration in industry. image_0 : ./docs/misc/followme.jpg ! alt text image_0 Setup Instructions Clone the repository $ git clone Download the data Save the following three files into the data folder of the cloned repository. Training Data Validation Data Sample Evaluation Data We used above data training and validation for train weight for FCN. Download the QuadSim binary To interface your neural net with the QuadSim simulator, you must use a version QuadSim that has been custom tailored for this project. The previous version that you might have used for the Controls lab will not work. The simulator binary can be downloaded here Software used for Project Training: Windows 8.1 64bit Python 3.x Tensorflow 1.2.1 NumPy 1.11 SciPy 0.17.0 eventlet Flask h5py PIL python socketio scikit image transforms3d PyQt4/Pyqt5 Hardware used for Project Training: Notebook ASUS N56V Intel(R) Core(TM) i7 3630QM RAM 8 GB NVIDIA GeForce 650M (2 GB with 384 core) Implement the Segmentation Network 1. Download the training dataset from above and extract to the project data directory. 2. Implement your solution in model_training.ipynb 3. Train the network locally 4. Continue to experiment with the training data and network until you attain the score you desire. 5. Once you are comfortable with performance on the training dataset, see how it performs in live simulation! Collecting Training Data A simple training dataset has been provided in this project's repository. This dataset will allow you to verify that your segmentation network is semi functional. However, if your interested in improving your score,you may want to collect additional training data. To do it, please see the following steps. The data directory is organized as follows: data/runs contains the results of prediction runs data/train/images contains images for the training set data/train/masks contains masked (labeled) images for the training set data/validation/images contains images for the validation set data/validation/masks contains masked (labeled) images for the validation set data/weights contains trained TensorFlow models data/raw_sim_data/train/run1 data/raw_sim_data/validation/run1 Training Set 1. Run QuadSim 2. Click the DL Training button 3. Set patrol points, path points, and spawn points. TODO add link to data collection doc 3. With the simulator running, press r to begin recording. 4. In the file selection menu navigate to the data/raw_sim_data/train/run1 directory 5. optional to speed up data collection, press 9 (1 9 will slow down collection speed) 6. When you have finished collecting data, hit r to stop recording. 7. To reset the simulator, hit 8. 
To collect multiple runs create directories data/raw_sim_data/train/run2 , data/raw_sim_data/train/run3 and repeat the above steps. Validation Set To collect the validation set, repeat both sets of steps above, except using the directory data/raw_sim_data/validation instead rather than data/raw_sim_data/train . Image Preprocessing Before the network is trained, the images first need to be undergo a preprocessing step. The preprocessing step transforms the depth masks from the sim, into binary masks suitable for training a neural network. It also converts the images from .png to .jpeg to create a reduced sized dataset, suitable for uploading to AWS. To run preprocessing: $ python preprocess_ims.py Note : If your data is stored as suggested in the steps above, this script should run without error. Important Note 1: Running preprocess_ims.py does not delete files in the processed_data folder. This means if you leave images in processed data and collect a new dataset, some of the data in processed_data will be overwritten some will be left as is. It is recommended to delete the train and validation folders inside processed_data(or the entire folder) before running preprocess_ims.py with a new set of collected data. Important Note 2: The notebook, and supporting code assume your data for training/validation is in data/train, and data/validation. After you run preprocess_ims.py you will have new train , and possibly validation folders in the processed_ims . Rename or move data/train , and data/validation , then move data/processed_ims/train , into data/ , and data/processed_ims/validation also into data/ Important Note 3: Merging multiple train or validation may be difficult, it is recommended that data choices be determined by what you include in raw_sim_data/train/run1 with possibly many different runs in the directory. You can create a temporary folder in data/ and store raw run data you don't currently want to use, but that may be useful for later. Choose which run_x folders to include in raw_sim_data/train , and raw_sim_data/validation , then run preprocess_ims.py from within the 'code/' directory to generate your new training and validation sets. Training, Predicting and Scoring With your training and validation data having been generated or downloaded from the above section of this repository, you are free to begin working with the neural net. Note : Training CNNs is a very compute intensive process. If your system does not have a recent Nvidia graphics card, with cuDNN and CUDA installed , you may need to perform the training step in the cloud. Instructions for using AWS to train your network in the cloud may be found here Training your Model Prerequisites Training data is in data directory Validation data is in the data directory The folders data/train/images/ , data/train/masks/ , data/validation/images/ , and data/validation/masks/ should exist and contain the appropriate data To train complete the network definition in the model_training.ipynb notebook and then run the training cell with appropriate hyperparameters selected. After the training run has completed, your model will be stored in the data/weights directory as an HDF5 file, and a configuration_weights file. As long as they are both in the same location, things should work. Important Note the validation directory is used to store data that will be used during training to produce the plots of the loss, and help determine when the network is overfitting your data. 
The sample_evalution_data directory contains data specifically designed to test the network's performance on the FollowME task. In sample_evaluation_data there are three directories, each generated using a different sampling method. The structure of these directories is exactly the same as the validation and train datasets provided to you. For instance patrol_with_targ contains an images and a masks subdirectory. If you would like to run the evaluation code on your validation data, a copy of it should be moved into sample_evaluation_data , and then the appropriate arguments changed in the function calls in the model_training.ipynb notebook. The notebook has examples of how to evaluate your model once you finish training. Think about the sourcing methods, and how the information provided in the evaluation sections relates to the final score. Then try things out that seem like they may work. Scoring To score the network on the Follow Me task, two types of error are measured. First, the intersection over union for the pixelwise classifications is computed for the target channel. In addition to this, we determine whether the network detected the target person or not. If more than 3 pixels have a probability greater than 0.5 of being the target person, then this counts as the network guessing that the target is in the image. We determine whether the target is actually in the image by whether there are more than 3 pixels containing the target in the label mask. Using the above, the numbers of detection true positives, false positives, and false negatives are counted. How the Final score is Calculated The final score is the pixelwise average_IoU multiplied by the weight (n_true_positive/(n_true_positive+n_false_positive+n_false_negative)), computed on data similar to that provided in sample_evaulation_data Ideas for Improving your Score Collect more data from the sim. Look at the predictions, think about what the network is getting wrong, then collect data to counteract this. Or improve your network architecture and hyperparameters. Obtaining a Leaderboard Score Share your scores in Slack, and keep a tally in a pinned message. Scores should be computed on the sample_evaluation_data. This is for fun; your grade will be determined on unreleased data. If you use the sample_evaluation_data to train the network, it will result in inflated scores, and you will not be able to determine how your network will actually perform when evaluated to determine your grade. Experimentation: Testing in Simulation 1. Copy your saved model to the weights directory data/weights . 2. Launch the simulator, select Spawn People , and then click the Follow Me button. 3. Run the realtime follower script $ python follower.py my_amazing_model.h5 Note: If you'd like to see an overlay of the detected region on each camera frame from the drone, simply pass the pred_viz parameter to follower.py Udacity Robotics NanoDegree Program Write Up Report Follow Me Project Write Up by Dedi Networks Convolutional Neural Network (CNN) A convolutional neural network has the architecture shown in the image below: A Convolutional Neural Network may have several layers, and each layer might capture a different level in the hierarchy of an object. The first layers form the lowest level of the hierarchy, where the CNN may classify small parts of the image into simple shapes such as horizontal and vertical lines and simple blobs of color. The last layers tend to be the highest level in the hierarchy and may classify more complex ideas such as shapes, and eventually full objects such as cars.
All of these convolutional layers together are often called the feature learning part of the network. The highest level of the hierarchy, the last convolutional layer, is then connected to a classification layer, which consists of fully connected layers and a softmax. From this layer, the input is classified as a particular object. CNNs are usually used to classify objects inside an image. Fully Convolutional Neural Network (FCN) CNNs are very useful for tackling tasks such as image classification, where we just want to determine 'what' object is in an image. But when we want to know 'where' a certain object is in the image, a CNN will not work, since fully connected layers remove any sense of spatial information. A Fully Convolutional Network (FCN) is used for this task. An FCN is a CNN in which the classification layer is replaced with a 1x1 convolution layer with a large receptive field, followed by upscaling layers called the decoder. The purpose here is to capture the global context of the scene, so that we can determine both what the objects in the image are and where they are. The output of this network contains not only object classifications but also a segmentation of the scene. The structure of an FCN is divided into two parts: an encoder part, which extracts features from the image, and a decoder part, which upscales the output of the encoder so that the output has the original size of the image. These two parts are connected with a 1x1 convolution layer. A 1x1 convolution simply maps an input pixel, with all its channels, to an output pixel, without looking at anything around itself. It is often used to reduce the number of depth channels, since it is often very slow to multiply volumes with extremely large depths. When we convert the last fully connected (FC) layer of the CNN to a 1x1 convolutional layer, we choose the new conv layer to be big enough that it enables us to have this localization effect scaled up to the original input image size, and then activate pixels to indicate objects and their approximate locations in the scene, as shown in the figure above. Replacement of fully connected layers with convolutional layers presents an added advantage: during inference (testing your model), you can feed images of any size into your trained network. In the GoogLeNet architecture, 1x1 convolution is used for two purposes: To make the network deeper by adding an inception module, as in the Network in Network paper To reduce the dimensions inside the inception module Here is the screenshot from the paper, which illustrates the above points: It can be seen from the image on the right that 1x1 convolutions (in yellow) are used especially before the 3x3 and 5x5 convolutions to reduce the dimensions. It should be noted that a two step convolution operation can always be combined into one, but in this case, and in most other deep learning networks, convolutions are followed by a non linear activation, so the convolutions are no longer linear operators and cannot be combined. Image without Skip Connections: Every time we do a convolution (down sampling), we face one problem with this approach: we lose some information; we keep the smaller picture (the local context) and lose the bigger picture (the global context). For example, we may use max pooling to reduce the size of the input and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.
To solve this problem we also get some activation from previous layers and sum/interpolate them together. This process is called skip from the creators of this algorithm. Those up sampling operations used on skip are also learn able. Below we show the effects of this skip process, notice how the resolution of the segmentation improves after some skips Data Collection In this learning project, I didn't record train, validation and sample_evaluation_data data from quadcopter simulator. I used train and validation data from link above to get weight from the network model that I have design. Data Set 1 Folder Content /data/train 4,131 images + 4,131 masks /data/validation 1,184 images + 1,184 masks /data/sample_evalution_data/following_images 542 images + 542 masks /data/sample_evalution_data/patrol_non_targ 270 images + 270 masks /data/sample_evalution_data/patrol_with_targ 322 images + 322 masks FCN Layers In this project, there are seven layers to build a fully convolutional networks (FCN). Three layers for encoder, one layers as one by one convolutional matrix and another three layers as decoder block. See image below: The first layer is encoder block layer which have input from image. This layer have filter width 32 and 2 strides. The second layer is encoder block layer which have input from first layer. This layer have filter width 64 and 2 strides. The third layer is encoder block layer which have input from second layer. This layer have filter with width 128 and build from 2 strides. The fourth layer is 1x1 convolution layer using convolution 2D batch normalize. This layer build with filter width 256, and kernel 1x1 with 1 strides. The fifth layer is decoder block layer which have input from 1x1 convolution layer and skip connection from second layer. The sixth layer is decoder block layer which have input from fifth layer and skip connection from the first layer. The dimension of this layer is same with first layer. The last layer is decoder block layer which have input from sixth layer and skip connection from input image. The last layer is an output layer of FCN. This layer have the dimension as same as input image. Explanations of how to build the code for FCN Design above would be explain in Build the Model section below Build The Model Separable convolution layer: The Encoder for FCN require separable convolution layers. The 1x1 convolution layer in the FCN, however, is a regular convolution. Implementations for both are provided below for your use. Each includes batch normalization with the ReLU activation function applied to the layers. python def separable_conv2d_batchnorm(input_layer, filters, strides 1): output_layer SeparableConv2DKeras(filters filters,kernel_size 3, strides strides, padding 'same', activation 'relu')(input_layer) output_layer layers.BatchNormalization()(output_layer) return output_layer def conv2d_batchnorm(input_layer, filters, kernel_size 3, strides 1): output_layer layers.Conv2D(filters filters, kernel_size kernel_size, strides strides, padding 'same', activation 'relu')(input_layer) output_layer layers.BatchNormalization()(output_layer) return output_layer Bilinear Upsampling The following helper function implements the bilinear upsampling layer. Upsampling by a factor of 2 is generally recommended, but you can try out different factors as well. Upsampling is used in the decoder block of the FCN. 
python def bilinear_upsample(input_layer): output_layer BilinearUpSampling2D((2,2))(input_layer) return output_layer TODO Code Encoder Block Create an encoder block that includes a separable convolution layer using the separable_conv2d_batchnorm() function. The filters parameter defines the size or depth of the output layer. python def encoder_block(input_layer, filters, strides): TODO Create a separable convolution layer using the separable_conv2d_batchnorm() function. output_layer separable_conv2d_batchnorm(input_layer, filters, strides) return output_layer Decoder Block The decoder block is comprised of three parts: A bilinear upsampling layer using the upsample_bilinear() function. The current recommended factor for upsampling is set to 2. A layer concatenation step. This step is similar to skip connections. You will concatenate the upsampled small_ip_layer and the large_ip_layer. Some (one or two) additional separable convolution layers to extract some more spatial information from prior layers. python def decoder_block(small_ip_layer, large_ip_layer, filters): TODO Upsample the small input layer using the bilinear_upsample() function. upsample_small_ip_layer bilinear_upsample(small_ip_layer) TODO Concatenate the upsampled and large input layers using layers.concatenate output_layer layers.concatenate( upsample_small_ip_layer, large_ip_layer ) TODO Add some number of separable convolution layers output_layer separable_conv2d_batchnorm( output_layer, filters, strides 1) output_layer separable_conv2d_batchnorm( output_layer, filters, strides 1) return output_layer The FCN Model Now that you have the encoder and decoder blocks ready, go ahead and build your FCN architecture! There are three steps: Add encoder blocks to build the encoder layers. This is similar to how you added regular convolutional layers in your CNN lab. Add a 1x1 Convolution layer using the conv2d_batchnorm() function. Remember that 1x1 Convolutions require a kernel and stride of 1. Add decoder blocks for the decoder layers. python def fcn_model(inputs, num_classes): TODO Add Encoder Blocks. Remember that with each encoder layer, the depth of your model (the number of filters) increases. layer01 encoder_block(inputs , filters 32 , strides 2) layer02 encoder_block(layer01, filters 64 , strides 2) layer03 encoder_block(layer02, filters 128, strides 2) TODO Add 1x1 Convolution layer using conv2d_batchnorm(). layer04 conv2d_batchnorm(layer03, filters 256, kernel_size 1, strides 1) TODO: Add the same number of Decoder Blocks as the number of Encoder Blocks layer05 decoder_block(layer04, layer02, filters 128 ) layer06 decoder_block(layer05, layer01, filters 64 ) layer07 decoder_block(layer06, inputs , filters 32 ) The function returns the output layer of your model. layer07 is the final layer obtained from the last decoder_block() outputs layers.Conv2D(num_classes, 1, activation 'softmax', padding 'same')(layer07) print( Outputs shape: ,outputs.shape, \tOutput Size in Pixel ) return outputs Training Training is my bigest problems. I have facing two problem in AWS account, the first one is AWS reject my request increasing EC2 instance p2.xlarge and the second ones is AWS facing problem when send my promotion code for initial balance by mail. Thanks for support dashboard in AWS, my complain had been approved at 18 June for initial credit. And at 20 June AWS approved for increasing limit in p2.xlarge when I reopen the case. 
Sadly, by the time all my requests had been approved, I had moved to my village, which has a slower internet connection, so I used my laptop to train the model. The specifications of my laptop are given in the explanation above. To increase the training speed on my laptop, I installed the following software and libraries: Latest NVIDIA Driver 398.11 CUDA v9.0 CuDNN v7.1 Hyperparameters Batch Size Number of training samples/images that get propagated through the network in a single pass. In this training we used a batch_size of 32. python learning_rate = 0.001 batch_size = 32 The learning rate I selected is 0.001. I selected this lower value because the first one I tried was 0.05, which failed with a final score of 33%. Epochs Number of Epochs Number of times the entire training dataset gets propagated through the network. In this training we chose a total of 20 epochs. Steps per Epoch Number of batches of training images that go through the network in 1 epoch. One recommended value to try would be based on the total number of images in the training dataset divided by the batch_size. The total number of images in the training data set is 4131; divided by 32, the result is about 129. We selected 200 steps per epoch. python num_epochs = 20 steps_per_epoch = 200 validation_steps = 50 workers = 2 This training was very hard because I needed almost 24 hours for the model training to finish. Below is my plot of training loss and validation loss over 20 epochs. Each epoch required 4114 seconds, so 20 epochs needed about 82280 seconds, or 22.8 hours. Detailed graphics can be seen in model_training.html Prediction Now that you have your model trained and saved, you can make predictions on your validation dataset. These predictions can be compared to the mask images, which are the ground truth labels, to evaluate how well your model is doing under different conditions. There are three different predictions available from the helper code provided: patrol_with_targ: Test how well the network can detect the hero from a distance. patrol_non_targ: Test how often the network makes a mistake and identifies the wrong person as the target. following_images: Test how well the network can identify the target while following them. Patrol with Target Patrol without Target Patrol with Target while following them Evaluation Evaluate our model! The following cells include several different scores to help you evaluate your model under the different conditions discussed during the Prediction step. Scores for while the quad is following behind the target.
number of validation samples intersection over the union evaluated on 542 average intersection over union for background is 0.9944914007764788 average intersection over union for other people is 0.3256942366738677 average intersection over union for the hero is 0.9125996469040777 number true positives: 539, number false positives: 0, number false negatives: 0 Scores for images while the quad is on patrol and the target is not visible number of validation samples intersection over the union evaluated on 270 average intersection over union for background is 0.981193497537517 average intersection over union for other people is 0.6976223997700709 average intersection over union for the hero is 0.0 number true positives: 0, number false positives: 52, number false negatives: 0 This score measures how well the neural network can detect the target from far away number of validation samples intersection over the union evaluated on 322 average intersection over union for background is 0.995441234028656 average intersection over union for other people is 0.40741392661423764 average intersection over union for the hero is 0.19374283779449622 number true positives: 118, number false positives: 0, number false negatives: 183 Sum all the true positives, etc. from the three datasets to get a weight for the score 0.7365470852017937 The IoU for the dataset that never includes the hero is excluded from grading 0.553171242349 And the final grade score is 0.40743666617 Test the model that has been created in the quadcopter simulator The selected weights file is model_weights_new, which achieved a final score of 40.74%. To run the model with these weights: bash >python follower.py model_weights_new Simulation Video Future Enhancement For future enhancement, there are several things that need to be improved to increase the final model score and the accuracy in the simulator: 1. Increase the Training Data. In this project I did not record any training data manually; I only used the training data provided by Udacity. To get more training data, I should supplement the data provided by Udacity with data that I collect manually. 2. Decrease the Training Time. I trained the model on my laptop, which has a standard graphics card, so training took about 22.8 hours to finish. Because my EC2 instance limit increase request has been approved by AWS, I will train my model using AWS services. 3. Change the Hyperparameters. In this project, I kept the number of epochs and steps_per_epoch low, just enough to reach the required passing score. I need to increase the number of epochs to increase my model's final score. References: Network in Networks Paper : arxiv.org/pdf/1312.4400v3.pdf https://leonardoaraujosantos.gitbooks.io/artificial inteligence/content/image_segmentation.html",Image Classification,Image Classification 2055,Computer Vision,Computer Vision,Computer Vision,"CleverHans (latest release: v3.0.1) Build Status Documentation Status This repository contains the source code for CleverHans, a Python library to benchmark machine learning systems' vulnerability to adversarial examples . You can learn more about such vulnerabilities on the accompanying blog . The CleverHans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open. Major updates coming to CleverHans CleverHans will soon support 3 frameworks: JAX, PyTorch, and TF2.
The package itself will focus on its initial principle: reference implementation of attacks against machine learning models to help with benchmarking models against adversarial examples. This repository will also contain two folders: tutorials/ for scripts demonstrating the features of CleverHans and defenses/ for scripts that contain authoritative implementations of defenses in one of the 3 supported frameworks. The structure of the future repository will look like this: cleverhans/ jax/ attacks/ ... tf2/ attacks/ ... torch/ attacks/ ... defenses/ jax/ ... tf2/ ... torch/ ... tutorials/ jax/ ... tf2/ ... torch/ ... In the meantime, all of these folders can be found in the corresponding future/ subdirectory (e.g., cleverhans/future/jax/attacks or defenses/future/jax/ ). A public milestone has been created to track the changes that are to be implemented before the library version is incremented to v4. Setting up CleverHans Dependencies This library uses TensorFlow to accelerate graph computations performed by many machine learning models. Therefore, installing TensorFlow is a pre requisite. You can find instructions here . For better performance, it is also recommended to install TensorFlow with GPU support (detailed instructions on how to do this are available in the TensorFlow installation documentation). Installing TensorFlow will take care of all other dependencies like numpy and scipy . Installation Once dependencies have been taken care of, you can install CleverHans using pip or by cloning this Github repository. pip installation If you are installing CleverHans using pip , run the following command after installing TensorFlow: pip install cleverhans This will install the latest version uploaded to PyPI . If you'd instead like to install the bleeding edge version, use: pip install git+ Installation for development If you want to make an editable installation of CleverHans so that you can develop the library and contribute changes back, first fork the repository on GitHub and then clone your fork into a directory of your choice: git clone You can then install the local package in editable mode in order to add it to your PYTHONPATH : cd cleverhans pip install e . Currently supported setups Although CleverHans is likely to work on many other machine configurations, we currently test it with Python 3.5 and TensorFlow {1.8, 1.12} on Ubuntu 14.04.5 LTS (Trusty Tahr). Support for Python 2.7 is deprecated. CleverHans 3.0.1 supports Python 2.7 and the master branch is likely to continue to work in Python 2.7 for some time, but we no longer run the tests in Python 2.7 and we do not plan to fix bugs affecting only Python 2.7 after 2019 07 04. Support for TensorFlow prior to 1.12 is deprecated. Backwards compatibility wrappers for these versions may be removed after 2019 07 07, and we will not fix bugs for those versions after that date. Support for TensorFlow 1.7 and earlier is already deprecated: we do not fix bugs for those versions and any remaining wrapper code for those versions may be removed without further notice. Getting support If you have a request for support, please ask a question on StackOverflow rather than opening an issue in the GitHub tracker. The GitHub issue tracker should only be used to report bugs or make feature requests. Contributing Contributions are welcomed! To speed the code review process, we ask that: New efforts and features be coordinated on the mailing list for CleverHans development: cleverhans dev@googlegroups.com .
When making code contributions to CleverHans, you follow the PEP8 with two spaces coding style (the same as the one used by TensorFlow) in your pull requests. In most cases this can be done by running autopep8 i indent size 2 on the files you have edited. You can check your code by running nosetests cleverhans/devtools/tests/test_format.py or check an individual file by running pylint from inside the cleverhans repository root directory. When making your first pull request, you sign the Google CLA. We do not accept pull requests that add git submodules because of the problems that arise when maintaining git submodules. Bug fixes can be initiated through Github pull requests. Scripts: scripts directory The scripts directory contains command line utilities. In many cases you can use these to run CleverHans functionality on your saved models without needing to write any of your own Python code. You may want to set your .bashrc / .bash_profile file to add the CleverHans scripts directory to your PATH environment variable so that these scripts will be conveniently executable from any directory. Tutorials: cleverhans_tutorials directory To help you get started with the functionalities provided by this library, the cleverhans_tutorials/ folder comes with the following tutorials: MNIST with FGSM ( code (cleverhans_tutorials/mnist_tutorial_tf.py)): this tutorial covers how to train a MNIST model using TensorFlow, craft adversarial examples using the fast gradient sign method , and make the model more robust to adversarial examples using adversarial training. MNIST with FGSM using Keras ( code (cleverhans_tutorials/mnist_tutorial_keras_tf.py)): this tutorial covers how to define a MNIST model with Keras and train it using TensorFlow, craft adversarial examples using the fast gradient sign method , and make the model more robust to adversarial examples using adversarial training. MNIST with JSMA ( code (cleverhans_tutorials/mnist_tutorial_jsma.py)): this second tutorial covers how to define a MNIST model with Keras and train it using TensorFlow and craft adversarial examples using the Jacobian based saliency map approach . MNIST using a black box attack ( code (cleverhans_tutorials/mnist_blackbox.py)): this tutorial implements the black box attack described in this paper . The adversary trains a substitute model: a copy that imitates the black box model by observing the labels that the black box model assigns to inputs chosen carefully by the adversary. The adversary then uses the substitute model’s gradients to find adversarial examples that are misclassified by the black box model as well. NOTE: the tutorials are maintained carefully, in the sense that we use continuous integration to make sure they continue working. They are not considered part of the API and they can change at any time without warning. You should not write 3rd party code that imports the tutorials and expect that the interface will not break. Only the main library is subject to our six month interface deprecation warning rule. NOTE: please write to cleverhans dev@googlegroups.com before writing a new tutorial. Because each new tutorial involves a large amount of duplicated code relative to the existing tutorials, and because every line of code requires ongoing testing and maintenance indefinitely, we generally prefer not to add new tutorials. Each tutorial should showcase an extremely different way of using the library. Just calling a different attack, model, or dataset is not enough to justify maintaining a parallel tutorial.
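For orientation, here is a rough, self contained sketch of how one of the attacks covered by the tutorials above is typically invoked from Python against a small Keras classifier. It is illustrative only and is not one of the maintained tutorials; the import paths and argument names follow the v3.0.1 style and may differ in other releases, and the tiny untrained model, the random input batch, and the eps value are placeholder assumptions chosen for the example.
python
# Illustrative sketch only (not a maintained tutorial): craft FGSM adversarial
# examples against a small, untrained Keras classifier.
import numpy as np
import tensorflow as tf
import keras
from keras import layers, models
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

sess = tf.Session()
keras.backend.set_session(sess)  # make Keras and CleverHans share one session

# Tiny stand-in model; in practice, reuse the MNIST model from the tutorials.
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28, 1)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
sess.run(tf.global_variables_initializer())

fgsm = FastGradientMethod(KerasModelWrapper(model), sess=sess)
x = np.random.rand(8, 28, 28, 1).astype("float32")  # placeholder batch in [0, 1]
adv_x = fgsm.generate_np(x, eps=0.3, clip_min=0.0, clip_max=1.0)
print(adv_x.shape)  # (8, 28, 28, 1)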
Examples : examples directory The examples/ folder contains additional scripts to showcase different uses of the CleverHans library or get you started competing in different adversarial example contests. We do not offer nearly as much ongoing maintenance or support for this directory as the rest of the library, and if code in here gets broken we may just delete it without warning. List of attacks You can find a full list attacks along with their function signatures at cleverhans.readthedocs.io Reporting benchmarks When reporting benchmarks, please: Use a versioned release of CleverHans. You can find a list of released versions here . Either use the latest version, or, if comparing to an earlier publication, use the same version as the earlier publication. Report which attack method was used. Report any configuration variables used to determine the behavior of the attack. For example, you might report We benchmarked the robustness of our method to adversarial attack using v3.0.1 of CleverHans. On a test set modified by the FastGradientMethod with a max norm eps of 0.3, we obtained a test set accuracy of 71.3%. Citing this work If you use CleverHans for academic research, you are highly encouraged (though not required) to cite the following paper : @article{papernot2018cleverhans, title {Technical Report on the CleverHans v2.1.0 Adversarial Examples Library}, author {Nicolas Papernot and Fartash Faghri and Nicholas Carlini and Ian Goodfellow and Reuben Feinman and Alexey Kurakin and Cihang Xie and Yash Sharma and Tom Brown and Aurko Roy and Alexander Matyasko and Vahid Behzadan and Karen Hambardzumyan and Zhishuai Zhang and Yi Lin Juang and Zhi Li and Ryan Sheatsley and Abhibhav Garg and Jonathan Uesato and Willi Gierke and Yinpeng Dong and David Berthelot and Paul Hendricks and Jonas Rauber and Rujun Long}, journal {arXiv preprint arXiv:1610.00768}, year {2018} } About the name The name CleverHans is a reference to a presentation by Bob Sturm titled “Clever Hans, Clever Algorithms: Are Your Machine Learnings Learning What You Think? and the corresponding publication, A Simple Method to Determine if a Music Information Retrieval System is a 'Horse'. Clever Hans was a horse that appeared to have learned to answer arithmetic questions, but had in fact only learned to read social cues that enabled him to give the correct answer. In controlled settings where he could not see people's faces or receive other feedback, he was unable to answer the same questions. The story of Clever Hans is a metaphor for machine learning systems that may achieve very high accuracy on a test set drawn from the same distribution as the training data, but that do not actually understand the underlying task and perform poorly on other inputs. Authors This library is managed and maintained by Ian Goodfellow (Google Brain) and Nicolas Papernot (Google Brain). 
The following authors contributed 100 lines or more (ordered according to the GitHub contributors page): Ian Goodfellow (Google Brain) Nicolas Papernot (Google Brain) Nicholas Carlini (Google Brain) Fartash Faghri (University of Toronto) Tzu Wei Sung (National Taiwan University) Alexey Kurakin (Google Brain) Reuben Feinman (New York University) Phani Krishna (Video Analytics Lab) David Berthelot (Google Brain) Tom Brown (Google Brain) Cihang Xie (Johns Hopkins) Yash Sharma (The Cooper Union) Aashish Kumar (HARMAN X) Aurko Roy (Google Brain) Alexander Matyasko (Nanyang Technological University) Anshuman Suri (Microsoft) Yen Chen Lin (MIT) Vahid Behzadan (Kansas State) Jonathan Uesato (DeepMind) Haojie Yuan (University of Science & Technology of China) Zhishuai Zhang (Johns Hopkins) Karen Hambardzumyan (YerevaNN) Jianbo Chen (UC Berkeley) Catherine Olsson (Google Brain) Aidan Gomez (University of Oxford) Zhi Li (University of Toronto) Yi Lin Juang (NTUEE) Pratyush Sahay (formerly HARMAN X) Abhibhav Garg (IIT Delhi) Aditi Raghunathan (Stanford University) Yang Song (Stanford University) Riccardo Volpi (Italian Institute of Technology) Angus Galloway (University of Guelph) Yinpeng Dong (Tsinghua University) Willi Gierke (Hasso Plattner Institute) Bruno López Jonas Rauber (IMPRS) Paul Hendricks (NVIDIA) Ryan Sheatsley (Pennsylvania State University) Rujun Long (0101.AI) Bogdan Kulynych (EPFL) Erfan Noury (UMBC) Robert Wagner (Case Western Reserve University) Copyright Copyright 2019 Google Inc., OpenAI and Pennsylvania State University.",Image Classification,Image Classification 2087,Computer Vision,Computer Vision,Computer Vision,"neural_nets This repository contains ipython notebooks which walk through the steps of building artificial neural network models. The comments will be more detailed than most examples found on the web to faciliate learning. A list of notebooks with brief descriptions is provided below. Content will be updated regularly. Thanks are due to Facebook for creating and making pytorch public, Udacity and Facebook for teaching everyone how to use it, and Google for the free Google Colab gpu time. I love that deep learning is now more accessible than ever. income_model_binary_nn.ipynb This notebook builds a neural net version of the model from the income_model_binary.ipynb notebook in the gradient_boosting repository. It is a very basic model and is meant only to be instructive. Some references are made to the boosted tree model previously built. transfer_learning_image.ipynb This notebook exploits transfer learning to build an image classifier. The well known CIFAR 10 data set and Inception V3 are used.",Image Classification,Image Classification 2109,Computer Vision,Computer Vision,Computer Vision,"Maxout Networks (Ian Goodfellow, Yoshua Bengio 2013) Maxout Networks TensorFlow implementation presented in How to run the MNIST experiment? bash make sure Tensorflow is installed. git clone git@github.com:philipperemy/tensorflow maxout.git maxout && cd maxout python mnist_maxout_example.py MAXOUT Can pick up from one of those values: LINEAR, RELU, MAXOUT. How to integrate it in your code It's two lines of code. Sorry I can't make it shorter. python from maxout import max_out y tf.matmul(x, W1) + b1 t max_out(y, num_units 50) Some Results on MNIST dataset Those results are not meant to reproduce the results of the paper. It's more about showing on how to use the maxout non linearity in the Tensorflow graphs. Loss As expected, Maxout strictly outperforms Sigmoid and ReLU. 
Having one hidden layer + non linearity helps to have a smaller loss. Accuracy Model Accuracy (100 epochs) : : MLP Hidden MaxOut 0.9730 MLP Hidden ReLU 0.9704 MLP Hidden Sigmoid 0.9353 MLP Linear 0.9214",Image Classification,Image Classification 2115,Computer Vision,Computer Vision,Computer Vision,MATIC MATIC stands for Multiband Astronomical TIme series Classifier Fully convolutional architecture,Image Classification,Image Classification 2119,Computer Vision,Computer Vision,Computer Vision,"This code is outdated, please find a new version here: iNNvestigate PatternNet, PatternLRP and more Introduction PatternNet and PatternLRP are methods that help to interpret the decisions of non linear neural networks. They are in line with the methods DeConvNet, GuidedBackprop and LRP: ! An overview of the different explanation methods. and improve on them: ! Different explanation methods on ImageNet. For more details we refer to the paper: PatternNet and PatternLRP Improving the interpretability of neural networks Pieter Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus Robert Müller, Sven Dähne If you use this code please cite the following paper: TODO: Add link to SW paper. Installation To install the code, please clone the repository and run the setup script: bash git clone cd nn patterns python setup.py install Usage and Examples Explaining All the presented methods have in common that they try to explain the output of a specific neuron with respect to the input to the neural network. Typically one explains the neuron with the largest activation in the output layer. Now given the output layer 'output_layer' of a Lasagne network, one can create an explainer by: python import nn_patterns output_layer create_a_lasagne_network() pattern load_pattern() explainer nn_patterns.create_explainer( patternnet , output_layer, patterns patterns) and explain the influence of the neural network's input on the output neuron by: python explanation explainer.explain(input) The following explanation methods are available: function : gradient: The gradient of the output neuron with respect to the input. signal : deconvnet: DeConvNet guided: Guided BackProp patternnet: PatternNet interaction : patternlrp: PatternLRP lrp.z: LRP The pattern parameter is only necessary for PatternNet and PatternLRP. The available options to select the target neuron are: max_output (default): the neuron with the largest activation. integer i: always the neuron at position i. None: take the activations of the last layer as they are. This results in a superposition of explanations. Computing patterns The methods PatternNet and PatternLRP are based on so called patterns that are network and data specific and need to be computed. Given a training set X and a desired batch_size this can be done in the following way: python import nn_patterns.pattern computer nn_patterns.pattern.CombinedPatternComputer(output_layer) patterns computer.compute_patterns(X, batch_size batch_size) Examples In the directory examples one can find different examples as Python scripts and as Jupyter notebooks: step_by_step_cifar10 (): explains how to compute patterns for a given neural network and how to use them with PatternNet and PatternLRP. step_by_step_imagenet : explains how to apply pre computed patterns for the VGG16 network on ImageNet. all_methods : shows how to use the different methods with VGG16 on ImageNet, i.e.
to reproduce the explanation grid above.",Image Classification,Image Classification 2120,Computer Vision,Computer Vision,Computer Vision,squeeze_excitation_keras implementation of squeeze excitation modules in keras SE module can be used after any 2D convolutional layer,Image Classification,Image Classification 2126,Computer Vision,Computer Vision,Computer Vision,"Emotion Recognition Nobel laureate Herbert Simon wrote, In order to have anything like a complete theory of human rationality, we have to understand what role emotion plays in it. Industries are rapidly changing with the rapid growth of Artificial Intelligence. When it comes to understanding human decisions, there are various factors that we have to take into consideration, and one of them is emotion. Emotions can act as a bias in many of our day to day decisions. In this project, I attempt to use the 'Chicago Faces Database' to identify one of 4 emotions: Happy, Fear, Anger, and Neutral. The 'Chicago Faces Database' is unbalanced. The approximate distribution of models is shown below: Neutral 50% Happy 25% Fear 12.5% Anger 12.5% As a solution to this distribution, I trained two models: the 'NeutralModel' and the 'EmotionModel'. The NeutralModel predicts whether the emotion shown is Neutral or not. If the NeutralModel predicts negative, the image is sent to the EmotionModel, which predicts the specific emotion. An attempt at dividing the EmotionModel into a 'HappyModel' and a 'FearOrAngerModel' will be tested to check if it performs better. The current model acts as a 'binary > categorical' sequence. Dividing the EmotionModel would be like having a DecisionTree. The model is trained using TensorFlow GPU. The validation set is 20% of the original data. The NeutralModel performed with a peak validation accuracy of 79%. The EmotionModel performed with a peak validation accuracy of 84%. The demo below shows the sample performance of the past model. ! (src/demo.gif) Getting Started These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system. Prerequisites Make sure you have the following installed (or the latest version): Python 3.6 Numpy 1.15.4 Pandas 0.23.4 OpenCV2 3.4.4 Tensorflow 1.12.0 Installing Simply fork the notebook into your local directory. Deployment Assuming you have all the necessary modules installed, open your command prompt, move to the local repository, and run the command: python GrabScreen.py A window will open mirroring a portion of your screen. Simply move the image over that portion of the screen and the predicted emotion is shown on the upper left. For best performance, let the face occupy the entire window. Authors Prince Mallari Acknowledgments Prudhvi Raj Dachapally University of Chicago Center for Decision Research Ian J. Goodfellow & David Warde Farley Mehdi Mirza Aaron Courville Yoshua Bengio Harrison Kinsley",Image Classification,Image Classification 2127,Computer Vision,Computer Vision,Computer Vision,"Emotion Recognition This is an attempt to detect emotion based on facial features. The model determines whether the person is Happy, Angry, Sad, Disgusted, Afraid, Surprised or Neutral. The model has three convolutional layers connected to three fully connected layers. The model is trained on a dataset of 4172 images. Currently, the model is deployed via screen grab, which detects a portion of the screen and takes it as input. !
(demo.gif) Getting Started These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system. Prerequisites Make sure you have the following installed (or the latest version): Python 3.6 Numpy 1.15.4 Pandas 0.23.4 OpenCV2 3.4.4 Torch 1.0.0 TorchVision 0.2.1 Installing Simply fork notebook into your local directory. Deployment Assuming you have all necessary modules installed. Through your command prompt, move to the local repository and run the command: python GrabScreen.py A window would open mirroring a portion of your screen. Simply move the image over that portion of the screen and the predicted emotion is shown on the upper left. For best performance, let the face occupy the entire window. Authors Prince Mallari Acknowledgments Prudhvi Raj Dachapally Ian J. Goodfellow & David Warde Farley Mehdi Mirza Aaron Courville Yoshua Bengio Harrison Kinsley",Image Classification,Image Classification 2132,Computer Vision,Computer Vision,Computer Vision,"SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression Overview This is the code repository for the KDD 2018 Applied Data Science paper: SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression . Visit our research group homepage Polo Club of Data Science at Georgia Tech for more related research! The code included here reproduces our techniques (e.g. SLQ) presented in the paper, and also our experiment results reported, such as using various JPEG compression qualities to remove adversarial perturbation introduced by Carlini Wagner L2, DeepFool, I FSGM, and FSGM. SHIELD overview YouTube video (readme/shield youtube thumbnail.jpg) Research Abstract The rapidly growing body of research in adversarial machine learning has demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarially generated images. This underscores the urgent need for practical defense that can be readily deployed to combat attacks in real time. Observing that many attack strategies aim to perturb image pixels in ways that are visually imperceptible, we place JPEG compression at the core of our proposed SHIELD defense framework, utilizing its capability to effectively compress away such pixel manipulation. To immunize a DNN model from artifacts introduced by compression, SHIELD vaccinates a model by re training it with compressed images, where different compression levels are applied to generate multiple vaccinated models that are ultimately used together in an ensemble defense. On top of that, SHIELD adds an additional layer of protection by employing randomization at test time that compresses different regions of an image using random compression levels, making it harder for an adversary to estimate the transformation performed. This novel combination of vaccination, ensembling, and randomization makes SHIELD a fortified, multi pronged defense. We conducted extensive, large scale experiments using the ImageNet dataset, and show that our approaches eliminate up to 94% of black box attacks and 98% of gray box attacks delivered by the recent, strongest techniques, such as Carlini Wagner's L2 and DeepFool. Our approaches are fast and work without requiring knowledge about the model. 
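As a toy illustration of the JPEG based preprocessing idea described in the abstract above (and emphatically not the repository's own SLQ implementation, which applies different, randomly chosen qualities to different image regions), one could re encode an input at a randomly drawn global JPEG quality before classification; Pillow and the quality levels used here are assumptions made only for this sketch.
python
# Toy global variant of the randomized JPEG idea; NOT this repository's SLQ code,
# which compresses different image regions at different random qualities.
import io
import random
from PIL import Image

def random_jpeg_recompress(image, qualities=(20, 40, 60, 80)):
    """Re-encode a PIL image as JPEG at a randomly chosen quality level."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=random.choice(qualities))
    buffer.seek(0)
    return Image.open(buffer)

# Usage sketch: defended = random_jpeg_recompress(Image.open("some_input.png"))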
Installation and Setup Clone Repository To clone this repository using git , simply run the following command: bash git clone Install Dependencies This repository uses attacks from the CleverHans library, and the models are adapted from tf slim . We also use Sacred to keep track of the experiments. All dependencies for this repository can be found in requirements.txt . To install these dependencies, run the following command from the jpeg defense directory: bash pip install r requirements.txt Setup ImageNet Dataset The code expects the ImageNet validation dataset to be available in TFRecord format in the data/validation directory. To provision the data, we have provided a script ( setup/get_imagenet.py ) that downloads, processes, and saves the entire ImageNet dataset in the required format. This script can be run from the setup directory in the following manner: bash python get_imagenet.py local_scratch_dir /path/to/jpeg defense/data Downloading the entire dataset from the ImageNet website using this script may be very slow. Optionally, we recommend downloading the ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar using Academic Torrents , and placing these files into the data/raw_data directory. Then, you can run the following command to skip downloading the dataset and proceed with converting the data into TFRecord format: bash python get_imagenet.py \ local_scratch_dir /path/to/jpeg defense/data \ provision_only True Download Pre trained Model Weights This repository currently supports the ResNet50 v2 and Inception v4 models from tf slim . Running the following command from the jpeg defense directory will download the pre trained .ckpt files for these models into the data/checkpoints folder using the provided setup/get_model_checkpoints.sh script: bash bash setup/get_model_checkpoints.sh data/checkpoints Example Usage The main.py script in the shield package can be used to perform all the experiments using the perform attack defend evaluate flags. attack Attacks the specified model with the specified method and its parameters (see shield/opts.py ). bash python main.py with \ perform attack \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} defend Defends the specified attacked images with the specified defense and its parameters (see shield/opts.py ). The defense uses the attack parameters only to determine which images are loaded for preprocessing, as these parameters are not used by the preprocessing itself. bash python main.py with \ perform defend \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} \ defense jpeg \ defense_options {'quality': 80} evaluate Evaluates the specified model with the specified attacked/defended version of the images. bash python main.py with \ perform evaluate \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} Video Demo YouTube video demo (readme/shield demo youtube thumbnail.jpg) Paper PDF on arXiv Paper PDF on arXiv Citation SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression. Nilaksh Das, Madhuri Shanbhogue, Shang Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E. Kounavis, Duen Horng Chau. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2018 . London, UK. Aug 19 23, 2018. 
BibTeX @article{das2018shield, title {SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression}, author {Das, Nilaksh and Shanbhogue, Madhuri and Chen, Shang Tse and Hohman, Fred and Li, Siwei and Chen, Li and Kounavis, Michael E and Chau, Duen Horng}, booktitle {Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, year {2018}, organization {ACM} } Researchers Name Affiliation Nilaksh Das Georgia Tech Madhuri Shanbhogue Georgia Tech Shang Tse Chen Georgia Tech Fred Hohman Georgia Tech Siwei Li Georgia Tech Li Chen Intel Corporation Michael E. Kounavis Intel Corporation Polo Chau Georgia Tech",Image Classification,Image Classification 2146,Computer Vision,Computer Vision,Computer Vision,"Grey scale Image Classification using KERAS Disclaimer This is a research project submitted for credit for a course that we just completed. The results seen here are subjective and should not be considered final or accurate. 1. Introduction In this study, we try to understand the limits of our system when running Deep Learning training. The step of training a model is the most time consuming step of the model building process. With constraints put on the hardware, what can we do on the programming side to help us train models better? What if you had a limited amount of time? To try our hand at augmentation, we will be using the Flickr27 dataset. 2. Objective Our objectives will include: 1. Can we run at least 3 training sessions in a working day (8 hours)? (simple model, little data) 2. Can we achieve a target accuracy of at least 90%? (complex model, more data) 3. What is the amount of time the model takes for prediction? (simple model) These objectives help us limit the amount of data we can process and the complexity of the model we can run. Trading off one for the other might help us understand which would provide better value in the long run. 3. Learning so far ... Let us view some of the points that we have to consider when working with Deep Learning models. 1. The number of trainable parameters. Each layer in the model would add more capabilities to the model and possibly help in detecting more features, but at the same time would increase the model complexity and therefore take more time to run. 2. The number of images that the model uses for training and validation. The more (and more varied) data we have, the better the model would be able to generalize. However, running large datasets will make the model run much longer. 3. The number of epochs we need to reach an acceptable accuracy. The more time, the more accurate. Sometimes to the point of memorizing. A model which reaches its target accuracy in 10 epochs would suggest that the model is very complex for the problem. This is in no way an exhaustive list, but these points do constitute some of the most important ones that we have to keep in mind. Training on a CPU with the parameters provided above proved to be next to impossible. Using the inbuilt GPU improved the training time by a factor of 10 (minimum). Regarding the algorithm that we intend on using, we will be testing CNNs (Convolutional Neural Networks). Our intention is to test smaller custom architectures and then move to larger ones. While our code uses a modified version of the InceptionNet v3 architecture, we experimented with others as well and settled on the one with the best performance. 4.
Software requirements In terms of software requirements, we will be using the following: Python with Keras and TensorFlow, GPU drivers and associated installations. Please refer to the link to check if and how to install the GPU. Most of our programming will happen in Jupyter notebooks rather than python programs, as we need to view output at a line by line execution level. Also, Jupyter will help us format the code in a format that is presentable. Highly suggested (but not mandatory) is installing Anaconda. This will help you create separate environments in which you can execute your projects. Should any of the libraries that we use be upgraded or changed, the failure would be contained within the environment and would not affect all the other developments that you have. 5. More about the dataset Flickr27 is a collection of 27 classes (such as Apple, Google, McDonald's) with each class containing about 35 images. The rules of Flickr27 state that each picture will contain only one of the 27 logos which have been categorized. The dataset is already broken up into train (30 images) and test (5 images) sets. When training any model, we need to have a train and a validation set, so we broke the train set into two sets: a train set (24 images) and a validation set (6 images). It is best that you put all the image files into sub folders whose names represent the classes to which they belong. The test set should not be used until you have acceptable training and validation accuracy. Each class, which had 24 original images, was augmented to 1920 images, and the validation set, which contained 6 images, used similar rules and was augmented to 480 images. This means that we will have 5760 images for training and 1440 images for validation. This will be the start of our test, and periodically we shall reduce the number of images (augmentations) to help us understand the impact of less data. 6. System configuration Laptop with: Windows 10 Intel Core i5 (7th generation) 2.5 GHz NVIDIA GeForce 940MX with 2 GB dedicated VRAM 8 GB DDR4 Memory 1 TB HDD This is close to the minimum requirements necessary to run a small scale image classification project. Standard requirements would include: Intel Core i7 (7th generation) NVIDIA GTX Series 970 16 GB DDR4 Memory and 1 TB HDD I would recommend that you look at Siraj's video that was posted in June 2018. Best Laptop for Machine Learning . And yes, I would highly recommend other videos on Machine Learning posted by Siraj. 7. Libraries used The following is the list of Python libraries that have been used for this project. All of these libraries can be installed with basic 'pip' commands. 1. numpy 2. pandas 3. skimage 4. openCV (cv2) 5. keras 6. matplotlib pyplot 7. pathlib 8. h5py 9. os 10. scikitlearn 11. scipy NOTE: It is highly recommended that you install these libraries within your environment before you run the code files mentioned in section 8. Some of these may already be available with your current python distribution. 8. Basic code files Here is a list of the code files that were used and their functions: CreateModel.ipynb: Your first step is to create a model. There are two ways of creating models. You could import a model programmed in Keras directly (read this link for information on available models) or you could create your own model. In this case, we will be creating our own model using InceptionV3 as the base. The reason for doing so is that most models work with RGB images only and not with Grey scale.
There are a few variables that you will have to change: Number of channels the image has: 1 represents a Grey scale image, 3 represents an RGB (or HSV) image Number of classes: This is important as this will represent your final output layer. In this example, the value is set to 3 TrainModel.ipynb: The next step is to train your model. This step could be the most time consuming process. Remember that this will depend on the system and the configuration that is available. In this example, we ran 100 epochs, each of which took approximately 200 seconds. Notice that in the 67th epoch, we have a training accuracy of 99.97% and a validation accuracy of 98.33%. This does represent an over fitting problem, but only very slightly. TestModel.ipynb: Finally, we use the trained model (with weights) and predict classes for the images that we have in our validation set. The results are not as good as we expected. There were 13 correct predictions out of the 15 available, which translates to 86.6% accuracy. This might also indicate that the model has started to memorize rather than generalize. 9. Conclusion With 99.97% training, 98.33% validation, and 86.66% test accuracy, this algorithm does show it is possible to create a highly accurate model with less data. Point of note here: The development of this model was for a very specific use case and may not work on all instances of the brand logo. We have found reasonable success during our tests on a very specific and controlled source of new data to test the predictions on. We cannot guarantee that we will get the same levels of accuracy on all instances of the logo in new scenarios. 10. What changes would we make? 1. Find a way to compare images and get a score of the similarity between them. This way we remove duplicates from our train and test sets, thus reducing the training time. This will also give us more space to perhaps even classify a fourth logo. 2. Change the algorithm to use RGB images instead of Grey scale images, as we lose features that are important when converting the images from RGB to Grey scale. 3. Find a method of checking what is being detected in the image that is used for prediction. This will help us understand the reasons why the classification goes wrong. 4. Detect multiple logos in an image. For this algorithm to have any sort of meaningful revenue generation, our next steps should include methods of detecting (and classifying) multiple classes in one image and providing accuracy percentages for each of the detected classes along with bounding boxes. Citations, Credits, Sources, and References Flickr27 Y. Kalantidis, LG. Pueyo, M. Trevisiol, R. van Zwol, Y. Avrithis. Scalable Triangulation based Logo Recognition. In Proceedings of ACM International Conference on Multimedia Retrieval (ICMR 2011), Trento, Italy, April 2011.
Design Guide for CNN: George Seif April 2018 Inception Net Design: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna December 2015 Scene Classification with Inception 7: Christian Szegedy, Vincent Vanhoucke, Julian Ibarz Understanding how image quality affects Deep neural networks: Samuel Dodge, Lina Karam April 2016 Benchmarks for popular CNN models: Justin Johnson Tutorials on CNN: Stanford Education Why do deep convolutional networks generalize so poorly to small image transformations?: Aharon Azulay, Yair Weiss May 2018 How to Resize, Pad Image to Square Shape and Keep Its Aspect Ratio With Python: square with padding/ Jiedong Hao November 2017 Rotate images (correctly) with OpenCV and Python: and python/ Adrian Rosebrock January 2017 Understanding regularization for image classification and machine learning: regularization for image classification and machine learning/ Adrian Rosebrock September 2016 About the author(s) My name is Prem, and I am currently working as a freelance consultant specializing in SAP ABAP and Android. I have a total of 12 years of experience and have just completed a course in Machine Learning. My primary focus is Image Classification using Keras and Tensorflow. The learning I garner is generally task oriented. My colleagues on this project are Satyajit Nair and Vivek V. Krishnan .",Image Classification,Image Classification 2151,Computer Vision,Computer Vision,Computer Vision,"objectRecognition In this project, I deployed a convolutional neural network (CNN) for object recognition. Please refer: I built this model using Keras, a high level neural network application programming interface (API) that supports both Theano and Tensorflow backends. You can use either backend; however, I will be using Theano. Steps: Import datasets from Keras Use one hot vectors for categorical labels Add layers to a Keras model Load pre trained weights Make predictions using a trained Keras model The dataset we will be using is the CIFAR 10 dataset, which consists of 60,000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. Dataset :",Image Classification,Image Classification 2154,Computer Vision,Computer Vision,Computer Vision,"ResNeXt.pytorch Reproduces ResNet V3 (Aggregated Residual Transformations for Deep Neural Networks) with pytorch. x Trains on Cifar10 and Cifar100 x Upload Cifar Training Curves x Upload Cifar Trained Models x Pytorch 4.0 Train Imagenet Download bash git clone cd resnext.pytorch git checkout R4.0 R3.0 for backwards compatibility. Usage To train on Cifar 10 using 2 gpu: bash python train.py /DATASETS/cifar.python cifar10 s ./snapshots log ./logs ngpu 2 learning_rate 0.05 b 128 It should reach a 3.65% error on Cifar 10, and 17.77% on Cifar 100. After the training phase, you can check the saved model. Thanks to @AppleHolic we now have a test script: To test on Cifar 10 using 2 gpu: bash python test.py /DATASETS/cifar.python cifar10 ngpu 2 load ./snapshots/model.pytorch test_bs 128 Configurations From the original paper : cardinality base_width parameters Error cifar10 error cifar100 default : : : : : : : : : : : : 8 64 34.4M 3.65 17.77 x 16 64 68.1M 3.58 17.31 Update: widen_factor has been disentangled from base_width because it was confusing. Now widen_factor is set to a constant 4, and base_width is the same as in the original paper.
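To make the cardinality / base_width terminology above concrete, here is a rough PyTorch sketch (not the block defined in this repository) of an aggregated residual block in which the cardinality independent transformation paths are implemented with a single grouped 3x3 convolution; the channel sizes used here are illustrative assumptions only.
python
# Illustrative sketch of a ResNeXt-style bottleneck block; NOT the module used
# in this repository. The "cardinality" paths are realized via a grouped conv.
import torch
import torch.nn as nn

class AggregatedBlockSketch(nn.Module):
    def __init__(self, channels=256, cardinality=8, base_width=64):
        super().__init__()
        width = cardinality * base_width  # inner width shared by all paths
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=1, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # residual connection around the paths

# y = AggregatedBlockSketch()(torch.randn(2, 256, 32, 32))  # -> (2, 256, 32, 32)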
Trained models and curves Link to trained models corresponding to the following curves: Update: several commits have been pushed after training the models in Mega, so it is recommended to revert to e10c37d8cf7a958048bc0f58cd86c3e8ac4e707d ! CIFAR 10 ! CIFAR 100 Other frameworks torch (@facebookresearch) . (Original) Cifar and Imagenet caffe (@terrychenism) . Imagenet MXNet (@dmlc) . Imagenet Cite @article{xie2016aggregated, title {Aggregated residual transformations for deep neural networks}, author {Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming}, journal {arXiv preprint arXiv:1611.05431}, year {2016} }",Image Classification,Image Classification 2155,Computer Vision,Computer Vision,Computer Vision,"Tensorflow DenseNet with ImageNet Pretrained Models This is a TensorFlow implementation of DenseNet by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten with ImageNet pretrained models. The weights are converted from DenseNet Keras Models . The code is largely borrowed from TensorFlow Slim Models . Pre trained Models The top 1/5 accuracy rates by using single center crop (crop size: 224x224, image size: 256xN) Network Top 1 Top 5 Checkpoints : : : : : : : : DenseNet 121 (k 32) 74.91 92.19 model DenseNet 169 (k 32) 76.09 93.14 model DenseNet 161 (k 48) 77.64 93.79 model Usage Follow the instructions of TensorFlow Slim Models . Step by step Example of training on the flowers dataset. Downloading and converting the flowers dataset $ DATA_DIR /tmp/data/flowers $ python download_and_convert_data.py \ dataset_name flowers \ dataset_dir ${DATA_DIR} Training a model from scratch. $ DATASET_DIR /tmp/data/flowers $ TRAIN_DIR /tmp/train_logs $ python train_image_classifier.py \ train_dir ${TRAIN_DIR} \ dataset_name flowers \ dataset_split_name train \ dataset_dir ${DATASET_DIR} \ model_name densenet121 Fine tuning a model from an existing checkpoint $ DATASET_DIR /tmp/data/flowers $ TRAIN_DIR /tmp/train_logs $ CHECKPOINT_PATH /tmp/my_checkpoints/tf densenet121.ckpt $ python train_image_classifier.py \ train_dir ${TRAIN_DIR} \ dataset_name flowers \ dataset_split_name train \ dataset_dir ${DATASET_DIR} \ model_name densenet121 \ checkpoint_path ${CHECKPOINT_PATH} \ checkpoint_exclude_scopes global_step,densenet121/logits \ trainable_scopes densenet121/logits",Image Classification,Image Classification 2157,Computer Vision,Computer Vision,Computer Vision,"Show and Tell: A Neural Image Caption Generator A TensorFlow implementation of the image to text model described in the paper: Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. IEEE transactions on pattern analysis and machine intelligence (2016). Full text available at: Contact Author: Chris Shallue Pull requests and issues: @cshallue Contents Model Overview ( model overview) Introduction ( introduction) Architecture ( architecture) Getting Started ( getting started) A Note on Hardware and Training Time ( a note on hardware and training time) Install Required Packages ( install required packages) Prepare the Training Data ( prepare the training data) Download the Inception v3 Checkpoint ( download the inception v3 checkpoint) Training a Model ( training a model) Initial Training ( initial training) Fine Tune the Inception v3 Model ( fine tune the inception v3 model) Generating Captions ( generating captions) Model Overview Introduction The Show and Tell model is a deep neural network that learns how to describe the content of images.
For example: ! Example captions (g3doc/example_captions.jpg) Architecture The Show and Tell model is an example of an encoder decoder neural network. It works by first encoding an image into a fixed length vector representation, and then decoding the representation into a natural language description. The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state of the art for object recognition and detection. Our particular choice of network is the Inception v3 image recognition model pretrained on the ILSVRC 2012 CLS image classification dataset. The decoder is a long short term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding. Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed length vector representation that is learned during training. The following diagram illustrates the model architecture. ! Show and Tell Architecture (g3doc/show_and_tell_architecture.png) In this diagram, {S_0, S_1, ..., S_{N-1}} are the words of the caption and {W_e S_0, W_e S_1, ..., W_e S_{N-1}} are their corresponding word embedding vectors. The outputs {p_1, p_2, ..., p_N} of the LSTM are probability distributions generated by the model for the next word in the sentence. The terms {log p_1(S_1), log p_2(S_2), ..., log p_N(S_N)} are the log likelihoods of the correct word at each step; the negated sum of these terms is the minimization objective of the model. During the first phase of training the parameters of the Inception v3 model are kept fixed: it is simply a static image encoder function. A single trainable layer is added on top of the Inception v3 model to transform the image embedding into the word embedding vector space. The model is trained with respect to the parameters of the word embeddings, the parameters of the layer on top of Inception v3 and the parameters of the LSTM. In the second phase of training, all parameters including the parameters of Inception v3 are trained to jointly fine tune the image encoder and the LSTM. Given a trained model and an image we use beam search to generate captions for that image. Captions are generated word by word, where at each step t we use the set of sentences already generated with length t-1 to generate a new set of sentences with length t. We keep only the top k candidates at each step, where the hyperparameter k is called the beam size. We have found the best performance with k = 3. Getting Started A Note on Hardware and Training Time The time required to train the Show and Tell model depends on your specific hardware and computational capacity. In this guide we assume you will be running training on a single machine with a GPU. In our experience on an NVIDIA Tesla K20m GPU the initial training phase takes 1 2 weeks. The second training phase may take several additional weeks to achieve peak performance (but you can stop this phase early and still get reasonable results). It is possible to achieve a speed up by implementing distributed training across a cluster of machines with GPUs, but that is not covered in this guide. Whilst it is possible to run this code on a CPU, beware that this may be approximately 10 times slower.
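As a tiny numeric illustration of the minimization objective described in the Architecture section above (not code from this repository), suppose the decoder assigned the hypothetical probabilities below to the correct word at each of four steps; the caption loss is simply the negated sum of their logs.
python
# Hypothetical per-step probabilities p_t(S_t) of the correct words; these values
# are made up for illustration and do not come from any trained model.
import numpy as np

correct_word_probs = np.array([0.40, 0.25, 0.10, 0.30])
loss = -np.sum(np.log(correct_word_probs))  # negated sum of log likelihoods
print(loss)  # approximately 5.81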
Install Required Packages First ensure that you have installed the following required packages: Bazel ( instructions ) TensorFlow 1.0 or greater ( instructions ) NumPy ( instructions ) Natural Language Toolkit (NLTK) : First install NLTK ( instructions ) Then install the NLTK data package punkt ( instructions ) Unzip Prepare the Training Data To train the model you will need to provide training data in native TFRecord format. The TFRecord format consists of a set of sharded files containing serialized tf.SequenceExample protocol buffers. Each tf.SequenceExample proto contains an image (JPEG format), a caption and metadata such as the image id. Each caption is a list of words. During preprocessing, a dictionary is created that assigns each word in the vocabulary to an integer valued id. Each caption is encoded as a list of integer word ids in the tf.SequenceExample protos. We have provided a script to download and preprocess the MSCOCO image captioning data set into this format. Downloading and preprocessing the data may take several hours depending on your network and computer speed. Please be patient. Before running the script, ensure that your hard disk has at least 150GB of available space for storing the downloaded and processed data. shell Location to save the MSCOCO data. MSCOCO_DIR ${HOME}/im2txt/data/mscoco Build the preprocessing script. cd research/im2txt bazel build //im2txt:download_and_preprocess_mscoco Run the preprocessing script. bazel bin/im2txt/download_and_preprocess_mscoco ${MSCOCO_DIR} The final line of the output should read: 2016 09 01 16:47:47.296630: Finished processing all 20267 image caption pairs in data set 'test'. When the script finishes you will find 256 training, 4 validation and 8 testing files in DATA_DIR . The files will match the patterns train ????? of 00256 , val ????? of 00004 and test ????? of 00008 , respectively. Download the Inception v3 Checkpoint The Show and Tell model requires a pretrained Inception v3 checkpoint file to initialize the parameters of its image encoder submodel. This checkpoint file is provided by the TensorFlow Slim image classification library which provides a suite of pre trained image classification models. You can read more about the models provided by the library here . Run the following commands to download the Inception v3 checkpoint. shell Location to save the Inception v3 checkpoint. INCEPTION_DIR ${HOME}/im2txt/data mkdir p ${INCEPTION_DIR} wget tar xvf inception_v3_2016_08_28.tar.gz C ${INCEPTION_DIR} rm inception_v3_2016_08_28.tar.gz Note that the Inception v3 checkpoint will only be used for initializing the parameters of the Show and Tell model. Once the Show and Tell model starts training it will save its own checkpoint files containing the values of all its parameters (including copies of the Inception v3 parameters). If training is stopped and restarted, the parameter values will be restored from the latest Show and Tell checkpoint and the Inception v3 checkpoint will be ignored. In other words, the Inception v3 checkpoint is only used in the 0 th global step (initialization) of training the Show and Tell model. Training a Model Initial Training Run the training script. shell Directory containing preprocessed MSCOCO data. MSCOCO_DIR ${HOME}/im2txt/data/mscoco Inception v3 checkpoint file. INCEPTION_CHECKPOINT ${HOME}/im2txt/data/inception_v3.ckpt Directory to save the model. MODEL_DIR ${HOME}/im2txt/model Build the model. cd research/im2txt bazel build c opt //im2txt/... Run the training script. 
bazel bin/im2txt/train \ input_file_pattern ${MSCOCO_DIR}/train ????? of 00256 \ inception_checkpoint_file ${INCEPTION_CHECKPOINT} \ train_dir ${MODEL_DIR}/train \ train_inception false \ number_of_steps 1000000 Run the evaluation script in a separate process. This will log evaluation metrics to TensorBoard which allows training progress to be monitored in real time. Note that you may run out of memory if you run the evaluation script on the same GPU as the training script. You can run the command export CUDA_VISIBLE_DEVICES to force the evaluation script to run on CPU. If evaluation runs too slowly on CPU, you can decrease the value of num_eval_examples . shell MSCOCO_DIR ${HOME}/im2txt/data/mscoco MODEL_DIR ${HOME}/im2txt/model Ignore GPU devices (only necessary if your GPU is currently memory constrained, for example, by running the training script). export CUDA_VISIBLE_DEVICES Run the evaluation script. This will run in a loop, periodically loading the latest model checkpoint file and computing evaluation metrics. bazel bin/im2txt/evaluate \ input_file_pattern ${MSCOCO_DIR}/val ????? of 00004 \ checkpoint_dir ${MODEL_DIR}/train \ eval_dir ${MODEL_DIR}/eval Run a TensorBoard server in a separate process for real time monitoring of training progress and evaluation metrics. shell MODEL_DIR ${HOME}/im2txt/model Run a TensorBoard server. tensorboard logdir ${MODEL_DIR} Fine Tune the Inception v3 Model Your model will already be able to generate reasonable captions after the first phase of training. Try it out! (See Generating Captions ( generating captions)). You can further improve the performance of the model by running a second training phase to jointly fine tune the parameters of the Inception v3 image submodel and the LSTM. shell Restart the training script with train_inception true. bazel bin/im2txt/train \ input_file_pattern ${MSCOCO_DIR}/train ????? of 00256 \ train_dir ${MODEL_DIR}/train \ train_inception true \ number_of_steps 3000000 Additional 2M steps (assuming 1M in initial training). Note that training will proceed much slower now, and the model will continue to improve by a small amount for a long time. We have found that it will improve slowly for an additional 2 2.5 million steps before it begins to overfit. This may take several weeks on a single GPU. If you don't care about absolutely optimal performance then feel free to halt training sooner by stopping the training script or passing a smaller value to the flag number_of_steps . Your model will still work reasonably well. Generating Captions Your trained Show and Tell model can generate captions for any JPEG image! The following command line will generate captions for an image from the test set. shell Path to checkpoint file or a directory containing checkpoint files. Passing a directory will only work if there is also a file named 'checkpoint' which lists the available checkpoints in the directory. It will not work if you point to a directory with just a copy of a model checkpoint: in that case, you will need to pass the checkpoint path explicitly. CHECKPOINT_PATH ${HOME}/im2txt/model/train Vocabulary file generated by the preprocessing script. VOCAB_FILE ${HOME}/im2txt/data/mscoco/word_counts.txt JPEG image file to caption. IMAGE_FILE ${HOME}/im2txt/data/mscoco/raw data/val2014/COCO_val2014_000000224477.jpg Build the inference binary. cd research/im2txt bazel build c opt //im2txt:run_inference Ignore GPU devices (only necessary if your GPU is currently memory constrained, for example, by running the training script). 
export CUDA_VISIBLE_DEVICES Run inference to generate captions. bazel bin/im2txt/run_inference \ checkpoint_path ${CHECKPOINT_PATH} \ vocab_file ${VOCAB_FILE} \ input_files ${IMAGE_FILE} Example output: Captions for image COCO_val2014_000000224477.jpg: 0) a man riding a wave on top of a surfboard . (p 0.040413) 1) a person riding a surf board on a wave (p 0.017452) 2) a man riding a wave on a surfboard in the ocean . (p 0.005743) Note: you may get different results. Some variation between different models is expected. Here is the image: ! Surfer (g3doc/COCO_val2014_000000224477.jpg)",Image Classification,Image Classification 2160,Computer Vision,Computer Vision,Computer Vision,"DenseNet on MURA Dataset using PyTorch A PyTorch implementation of the 169 layer DenseNet model on the MURA dataset, inspired by the paper arXiv:1712.06957v3 by Pranav Rajpurkar et al. MURA is a large dataset of musculoskeletal radiographs, where each study is manually labeled by radiologists as either normal or abnormal. know more Important Points: The implemented model is a 169 layer DenseNet with a single node output layer initialized with weights from a model pretrained on the ImageNet dataset. Before feeding the images to the network, each image is normalized to have the same mean and standard deviation as the images in the ImageNet training set, scaled to 224 x 224 and augmented with random lateral inversions and rotations. The model uses a modified Binary Cross Entropy Loss function, as mentioned in the paper. The Learning Rate decays by a factor of 10 every time the validation loss plateaus after an epoch. The optimization algorithm is Adam with default parameters β1 0.9 and β2 0.999. According to the MURA dataset paper: > The model takes as input one or more views for a study of an upper extremity. On each view, our 169 layer convolutional neural network predicts the probability of abnormality. We compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model implemented in model.py (model.py) takes as input 'all' the views for a study of an upper extremity. On each view the model predicts the probability of abnormality. The model computes the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. Instructions Install dependencies: PyTorch TorchVision Numpy Pandas Train the model with python main.py Citation @ARTICLE{2017arXiv171206957R, author {{Rajpurkar}, P. and {Irvin}, J. and {Bagul}, A. and {Ding}, D. and {Duan}, T. and {Mehta}, H. and {Yang}, B. and {Zhu}, K. and {Laird}, D. and {Ball}, R.L. and {Langlotz}, C. and {Shpanskaya}, K. and {Lungren}, M.P. and {Ng}, A.}, title {MURA Dataset: Towards Radiologist Level Abnormality Detection in Musculoskeletal Radiographs} , journal {ArXiv e prints}, archivePrefix arXiv , eprint {1712.06957}, primaryClass physics.med ph , keywords {Physics Medical Physics, Computer Science Artificial Intelligence}, year 2017, month dec, adsurl { adsnote {Provided by the SAO/NASA Astrophysics Data System} }",Image Classification,Image Classification 2183,Computer Vision,Computer Vision,Computer Vision,"FALCON: FAst and Lightweight CONvolution This repository provides implementations of FALCON convolution / Mobile convolution and their corresponding CNN model. FALCON, a faster and lighter convolution, is capable of compressing and accelerating standard convolution.
FALCON compresses and accelerates MobileNet with standard convolution for 7.4× and 2.36×, respectively. Overview Code structure ./src : source code for FALCON ./src/model : python scripts for model definition ./src/main : python scripts for training/testing models defined in ./src/model ./src/utils : utils for execution of training/testing codes in ./src/main ./scripts : shell scripts for execution of training/testing codes in ./main Naming convention FALCON : FAst and Lightweight CONvolution the new convolution architecture we proposed MobileConv : Convolution architecture from paper 'MobileNet' (refer to Rank : Rank of convolution. Copy the conv layer for n times, run independently and add output together at the end of the layer. This hyper parameter helps balace compression rate/ accelerate rate and accuracy. Data description CIFAR 10 datasets CIFAR 100 datasets Note that: The datasets depends on torchvision . You don't have to download anything. When execute the source code, the datasets will be automaticly download if it is not detected. Output After training, the trained model will be saved in src/train_test/trained_model/ . You can test the model only if there is a trained model in src/train_test/trained_model/. Install Environment Unbuntu CUDA 9.0 Python 3.6 torch torchvision Dependence Install pip3 install torch torchvision How to use Clone the repository git clone cd FALCON DEMO To train the model, run script: cd scr/train_test python main.py train conv StandardConv python main.py train conv FALCON The trained model will be saved in src/train_test/trained_model/ To test the model, run script: cd scr/train_test python main.py conv StandardConv python main.py conv FALCON The testing accuracy and inference time will be printed on the screen. You can test the model only if there is a trained model in train_test/trained_model/. To check the trained model size, run script: cd scr/train_test/trained_model ls l Pre trained model is saved in FALCON/src/train_test/trained_model/ Standard model: (It is about 115M. You have to train it first, since trained model is too large to upload.) conv StandardConv,model MobileNet,data cifar100,rank 1,alpha 1.pkl FALCON model: (It is trained and saved in folder.) conv FALCON,model MobileNet,data cifar100,rank 1,alpha 1.pkl Scripts There are four demo scripts: scripts/train.sh , scripts/train_rank.sh , scripts/test.sh , scripts/test_rank.sh You can change arguments in .sh files to train/test different model. train.sh : Execute training of model ( conv FALCON m VGG16 data cifar10) Output: trained model will be saved Training procedure and result will be print on the screen. trian_rank.sh : Execute training of model ( conv RankFALCON m VGG16 data cifar10 k 2 al 0.5) Output: trained model will be saved Training procedure and result will be print on the screen. test.sh : Execute test of trained model ( conv FALCON m MobileNet data cifar100) Accuracy/ inference time/ compression rate/ computation reduction rate will be print on the screen. test_rank.sh : Execute test of trained model ( conv FALCON m MobileNet data cifar100 k 2 al 0.9) Accuracy/ inference time/ compression rate/ computation reduction rate will be print on the screen. Contact us Chun Quan (quanchun@snu.ac.kr) U Kang (ukang@snu.ac.kr) Data Mining Lab. 
at Seoul National University.",Image Classification,Image Classification 2184,Computer Vision,Computer Vision,Computer Vision,"FALCON: FAst and Lightweight CONvolution This package provides implementations of FALCON convolution/ Mobile convolution and their corresponding CNN model. Overview Code structure ./src : source code for FALCON ./src/model : python scripts for model definition ./src/main : python scripts for training/testing models defined in ./src/model ./src/utils : utils for execution of training/testing codes in ./src/main ./scripts : shell scripts for execution of training/testing codes in ./main Naming convention FALCON : FAst and Lightweight CONvolution the new convolution architecture we proposed MobileConv : Convolution architecture from paper 'MobileNet' (refer to Rank : Rank of convolution. Copy the conv layer for n times, run independently and add output together at the end of the layer. This hyper parameter helps balace compression rate/ accelerate rate and accuracy. Data description CIFAR 10 datasets CIFAR 100 datasets Note that: The datasets depends on torchvision . You don't have to download anything. When execute the source code, the datasets will be automaticly download if it is not detected. Output After training, the trained model will be saved in train_test/trained_model/ . You can test the model only if there is a trained model in train_test/trained_model/. Install Environment Unbuntu CUDA 9.0 Python 3.6 torch torchvision Dependence Install pip3 install torch torchvision How to use Clone the repository git clone cd FALCON DEMO To train the model, run script: cd scr/train_test python main.py train conv StandardConv python main.py train conv FALCON The trained model will be saved in src/train_test/trained_model/ To test the model, run script: cd scr/train_test python main.py conv StandardConv python main.py conv FALCON The testing accuracy and inference time will be printed on the screen. To check the trained model size, run script: cd scr/train_test/trained_model ls l Pre trained model is saved in FALCON/src/train_test/trained_model/ Standard model: conv StandardConv,model MobileNet,data cifar100,rank 1,alpha 1.pkl FALCON model: conv FALCON,model MobileNet,data cifar100,rank 1,alpha 1.pkl Scripts There are four demo scripts: scripts/train.sh , scripts/train_rank.sh , scripts/test.sh , scripts/test_rank.sh You can change arguments in .sh files to train/test different model. train.sh : Execute training of model ( conv FALCON m VGG16 data cifar10) Output: trained model will be saved Training procedure and result will be print on the screen. trian_rank.sh : Execute training of model ( conv RankFALCON m VGG16 data cifar10 k 2 al 0.5) Output: trained model will be saved Training procedure and result will be print on the screen. test.sh : Execute test of trained model ( conv FALCON m MobileNet data cifar100) Accuracy/ inference time/ compression rate/ computation reduction rate will be print on the screen. test_rank.sh : Execute test of trained model ( conv FALCON m MobileNet data cifar100 k 2 al 0.5) Accuracy/ inference time/ compression rate/ computation reduction rate will be print on the screen. Contact us Chun Quan (quanchun@snu.ac.kr) U Kang (ukang@snu.ac.kr) Data Mining Lab. at Seoul National University.",Image Classification,Image Classification 2187,Computer Vision,Computer Vision,Computer Vision,"Detection of pests Asian University Machine Learning Camp Jeju 2018 Table of Contents 1. Overview ( overview) Datasets Used ( dataset) 2. 
Workflow ( workflow) Introduction ( introduction) Pre requsites ( prerequsite) Complete Workflow ( complete) Steps to Follow ( steps) 3. References ( ref) 4. Acknowledgement ( ack) Overview/Motivation Agriculture is a most important and ancient occupation in India. As economy of India is based on agricultural production, utmost care of food production is necessary. Pests like virus, fungus and bacteria causes infection to plants with loss in quality and quantity production. There is large amount of loss of farmer in production. Hence proper care of plants is necessary for same. Image processing provides more efficient ways to detect diseases caused by fungus, bacteria or virus on plants. Mere observations by eyes to detect diseases are not accurate. Excess use also damages plants nutrient quality. It results in huge loss of production to farmer. Hence use of image processing techniques to detect and classify diseases in agricultural applications is helpful. Datasets Used For this project, the datasets are images: you can download images in 2 ways 1. Fatkun batch Download (chrome Extension) 2. FFmpeg Command line tool (extracts images from videos) Workflow The workflow is divided into 4 main parts. Introduction ImageNet is a common academic data set in machine learning for training an image recognition system. Code in this directory demonstrates how to use TensorFlow to train and evaluate a type of convolutional neural network (CNN) on this data set. This network achieves 21.2% top 1 and 5.6% top 5 error for single frame evaluation with a computational cost of 5 billion multiply adds per inference and with using less than 25 million parameters. Below is a visualization of the model architecture. ! image Pre Requsite Python 3.5+ Tensorflow Docker Toolbox Android Studio Complete Workflow ! image Why to use pre trained model when we can build new one?? ! image How does the pre trained network works? ! image Steps to Follow 1. Install python 3.5 or above, tensorflow & Docker toolbox. 2. Place all your Datasets in a folder & The classification script uses the folder names as label names, and the images inside each folder should be pictures that correspond to that label. Collect as many pictures of each label as you can and try it out! 3. Clone the repository & navigate to the directory. shell git clone cd Detection of pests 4. For training you can download the sample flower Datasets & extract them to tf_files/training_dataset folder. batch └───training_dataset ├───Fruit Piercing Moth ├───Gall flies ├───Leaf feeding caterpillars └───Scrab Beetle 5. Run the retrain.py in docker with the following command. As it trains, you can start TensorBoard in background. you'll see a series of step outputs, each one showing training accuracy, validation accuracy, and the cross entropy: The training accuracy shows the percentage of the images used in the current training batch that were labeled with the correct class. Validation accuracy: The validation accuracy is the precision (percentage of correctly labelled images) on a randomly selected group of images from a different set. Cross entropy is a loss function that gives a glimpse into how well the learning process is progressing. (Lower numbers are better.) shell setting up the architecture IMAGE_SIZE 224 ARCHITECTURE mobilenet_0.50_${IMAGE_SIZE} The default training is 4000 iterations. 
python m scripts.retrain bottleneck_dir tf_files/bottlenecks model_dir tf_files/models/ ${ARCHITECTURE} summaries_dir tf_files/training_summaries/ ${ARCHITECTURE} output_graph tf_files/retrained_graph.pb output_labels tf_files/retrained_labels.txt architecture ${ARCHITECTURE} image_dir tf_files/training_dataset TensorBoard tensorboard logdir tf_files/training_summaries & The training completed with 92% accuracy.. ! image ! image 6. when the training is completed the output is retrained_graph.pb & retrained_labels.txt you can test the model using the graph. shell Testing python m scripts.label_image graph tf_files/retrained_graph.pb image tf_files/testing_dataset/image_name.jpg ! image 7. If you want to run the model to predict on android device you have to opimize the model some of the methods for compressing the model are: Freeze the Graph : Converts Variables to Constants Graph Transform Tool : Removes unnessary parts Quantize Weights : Quantizes weights & calculations quantize Calculations : Converts 32 bit Floating point calculations to 8 bit Integer by maintaining decent Accuracy. Meamory Mapping : uses Meamory mapping to load parameters instead of standard File I/O. 8. You can perform the optimization to the retrained_graph.pb file so that the model can be compressible by running the following commands which can compress the model upto 70%. shell optimized graph python m tensorflow.python.tools.optimize_for_inference input tf_files/retrained_graph.pb output tf_files/optimized_graph.pb input_names input output_names final_result rounded graph python m scripts.quantize_graph input tf_files/optimized_graph.pb output tf_files/graph.pb output_node_names final_result mode weights_rounded 9. Paste the graph.pb & labels.txt in android/tfmobile/assets & open android studio choose existing project tfmobile. let the Gradle build finish & run your app. Now you can use your andoid mobile to classify things my using the App. ! image References 1. 2. 3. 4. Acknowledgement I would sincerely like to thank Jeju National University , Jeju Development Center & all the sponcers who have supported us for our work. And i would specially like to thank professor Yungcheol Byun for guiding us & helping us in every situation. ! image",Image Classification,Image Classification 2199,Computer Vision,Computer Vision,Computer Vision,"DLFramework Build Status Build Status Framework Manifest Format yaml name: MXNet name of the framework version: 0.1 framework version container: containers used to perform model prediction multiple platforms can be specified amd64: if unspecified, then the default container for the framework is used gpu: raiproject/carml mxnet:amd64 cpu cpu: raiproject/carml mxnet:amd64 gpu ppc64le: cpu: raiproject/carml mxnet:ppc64le gpu gpu: raiproject/carml mxnet:ppc64le gpu Model Manifest Format yaml name: InceptionNet name of your model framework: the framework to use name: MXNet framework for the model version: ^0.1 framework version constraint version: 1.0 version information in semantic version format container: containers used to perform model prediction multiple platforms can be specified amd64: if unspecified, then the default container for the framework is used gpu: raiproject/carml mxnet:amd64 cpu cpu: raiproject/carml mxnet:amd64 gpu ppc64le: cpu: raiproject/carml mxnet:ppc64le gpu gpu: raiproject/carml mxnet:ppc64le gpu description: > An image classification convolutional network. Inception achieves 21.2% top 1 and 5.6% top 5 error on the ILSVRC 2012 validation dataset. 
It consists of fewer than 25M parameters. references: references to papers / websites / etc.. describing the model license of the model license: MIT inputs to the model inputs: first input type for the model type: image description of the first input description: the input image parameters: type parameters dimensions: 1, 3, 224, 224 output: the type of the output type: feature a description of the output parameter description: the output label parameters: type parameters features_url: before_preprocess: > code... preprocess: > code... after_preprocess: > code... before_postprocess: > code... postprocess: > code... after_postprocess: > code... model: specifies model graph and weights resources base_url: graph_path: Inception BN symbol.json weights_path: Inception BN 0126.params is_archive: false if set, then the base_url is a url to an archive the graph_path and weights_path then denote the file names of the graph and weights within the archive attributes: extra network attributes kind: CNN the kind of neural network (CNN, RNN, ...) training_dataset: ImageNet dataset used to for training manifest_author: abduld hidden: false hide the model from the frontend",Image Classification,Image Classification 2222,Computer Vision,Computer Vision,Computer Vision,"Large Scale Fine Grained Categorization and Domain Specific Transfer Learning Tensorflow code and models for the paper: Large Scale Fine Grained Categorization and Domain Specific Transfer Learning \ Yin Cui , Yang Song , Chen Sun , Andrew Howard, Serge Belongie \ CVPR 2018 This repository contains code and pre trained models used in the paper and 2 demos to demonstrate: 1) the importance of pre training data on transfer learning; 2) how to calculate domain similarity between source domain and target domain. Notice that we used a mini validation set (./inat_minival.txt) contains 9,697 images that are randomly selected from the original iNaturalist 2017 validation set. The rest of valdiation images were combined with the original training set to train our model in the paper. There are 665,473 training images in total. Dependencies: + Python (3.5) + Tensorflow (1.11) + pyemd + scikit learn + scikit image Preparation: + Clone the repo with recursive: bash git clone recursive + Install dependencies. Please refer to TensorFlow, pyemd, scikit learn and scikit image official websites for installation guide. + Download data and feature and unzip them into the same directory as the cloned repo. You should have two folders './data' and './feature' in the repo's directory. Datasets (optional): In the paper, we used data from 9 publicly available datasets: + ImageNet (ILSVRC2012) + iNaturalist 2017 + Aircraft + CUB 200 2011 + Oxford Flower 102 + Food 101 + NABirds + Stanford Cars + Stanford Dogs We provide a download link that includes the entire CUB 200 2011 dataset and data splits for the rest of 8 datasets. The provided link contains sufficient data for this repo. If you would like to use other 8 datasets, please download them from the official websites and put them in the corresponding subfolders under './data'. Pre trained Models (optional): The models were trained using TensorFlow Slim . We implemented Squeeze and Excitation Networks (SENet) under './slim'. 
The pre trained models can be downloaded from the following links: Network Pre trained Data Input Size Download Link Inception V3 ImageNet 299 link Inception V3 iNat2017 299 link Inception V3 iNat2017 448 link Inception V3 iNat2017 299 > 560 FT 1 link Inception V3 ImageNet + iNat2017 299 link Inception V3 SE ImageNet + iNat2017 299 link Inception V4 iNat2017 448 link Inception V4 iNat2017 448 > 560 FT 2 link Inception ResNet V2 ImageNet + iNat2017 299 link Inception ResNet V2 SE ImageNet + iNat2017 299 link ResNet V2 50 ImageNet + iNat2017 299 link ResNet V2 101 ImageNet + iNat2017 299 link ResNet V2 152 ImageNet + iNat2017 299 link 1 This model was trained with 299 input size on train + 90% val and then fine tuned with 560 input size on 90% val. 2 This model was trained with 448 input size on train + 90% val and then fine tuned with 560 input size on 90% val. TensorFlow Hub also provides a pre trained Inception V3 299 on iNat2017 original training set here . Featrue Extraction (optional): Run the following Python script to extract feature: python feature_extraction.py To run this script, you need to download the checkpoint of Inception V3 299 trained on iNat2017 . The dataset and pre trained model can be modified in the script. We provide a download link that includes features used in the domos of this repo. Demos 1. Linear logistic regression on extracted features: This demo shows the importance of pre training data on transfer learning. Based on features extracted from an Inception V3 pre trained on iNat2017, we are able to achieve 89.9% classification accuracy on CUB 200 2011 with the simple logistic regression, outperforming most state of the art methods. LinearClassifierDemo.ipynb 2. Calculating domain similarity by Earth Mover's Distance (EMD): This demo gives an example to calculate the domain similarity proposed in the paper. Results correspond to part of the Fig. 5 in the original paper. DomainSimilarityDemo.ipynb Training and Evaluation + Convert dataset into '.tfrecord': python convert_dataset.py dataset_name cub_200 num_shards 10 + Train (fine tune) the model on 1 GPU: CUDA_VISIBLE_DEVICES 0 ./train.sh + Evaluate the model on another GPU simultaneously: CUDA_VISIBLE_DEVICES 1 ./eval.sh + Run Tensorboard for visualization: tensorboard logdir ./checkpoints/cub_200/ port 6006 Citation If you find our work helpful in your research, please cite it as: latex @inproceedings{Cui2018iNatTransfer, title {Large Scale Fine Grained Categorization and Domain Specific Transfer Learning}, author {Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie}, booktitle {CVPR}, year {2018} }",Image Classification,Image Classification 2225,Computer Vision,Computer Vision,Computer Vision,"SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression Overview This is the code repository for the KDD 2018 Applied Data Science paper: SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression . Visit our research group homepage Polo Club of Data Science at Georgia Tech for more related research! The code included here reproduces our techniques (e.g. SLQ) presented in the paper, and also our experiment results reported, such as using various JPEG compression qualities to remove adversarial perturbation introduced by Carlini Wagner L2, DeepFool, I FSGM, and FSGM. 
SHIELD overview YouTube video (readme/shield youtube thumbnail.jpg) Research Abstract The rapidly growing body of research in adversarial machine learning has demonstrated that deep neural networks (DNNs) are highly vulnerable to adversarially generated images. This underscores the urgent need for practical defense that can be readily deployed to combat attacks in real time. Observing that many attack strategies aim to perturb image pixels in ways that are visually imperceptible, we place JPEG compression at the core of our proposed SHIELD defense framework, utilizing its capability to effectively compress away such pixel manipulation. To immunize a DNN model from artifacts introduced by compression, SHIELD vaccinates a model by re training it with compressed images, where different compression levels are applied to generate multiple vaccinated models that are ultimately used together in an ensemble defense. On top of that, SHIELD adds an additional layer of protection by employing randomization at test time that compresses different regions of an image using random compression levels, making it harder for an adversary to estimate the transformation performed. This novel combination of vaccination, ensembling, and randomization makes SHIELD a fortified, multi pronged defense. We conducted extensive, large scale experiments using the ImageNet dataset, and show that our approaches eliminate up to 94% of black box attacks and 98% of gray box attacks delivered by the recent, strongest techniques, such as Carlini Wagner's L2 and DeepFool. Our approaches are fast and work without requiring knowledge about the model. Installation and Setup Clone Repository To clone this repository using git , simply run the following command: bash git clone Install Dependencies This repository uses attacks from the CleverHans library, and the models are adapted from tf slim . We also use Sacred to keep track of the experiments. All dependencies for this repository can be found in requirements.txt . To install these dependencies, run the following command from the jpeg defense directory: bash pip install r requirements.txt Setup ImageNet Dataset The code expects the ImageNet validation dataset to be available in TFRecord format in the data/validation directory. To provision the data, we have provided a script ( setup/get_imagenet.py ) that downloads, processes, and saves the entire ImageNet dataset in the required format. This script can be run from the setup directory in the following manner: bash python get_imagenet.py local_scratch_dir /path/to/jpeg defense/data Downloading the entire dataset from the ImageNet website using this script may be very slow. Optionally, we recommend downloading the ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar using Academic Torrents , and placing these files into the data/raw_data directory. Then, you can run the following command to skip downloading the dataset and proceed with converting the data into TFRecord format: bash python get_imagenet.py \ local_scratch_dir /path/to/jpeg defense/data \ provision_only True Download Pre trained Model Weights This repository currently supports the ResNet50 v2 and Inception v4 models from tf slim . 
Running the following command from the jpeg defense directory will download the pre trained .ckpt files for these models into the data/checkpoints folder using the provided setup/get_model_checkpoints.sh script: bash bash setup/get_model_checkpoints.sh data/checkpoints Example Usage The main.py script in the shield package can be used to perform all the experiments using the perform attack defend evaluate flags. attack Attacks the specified model with the specified method and its parameters (see shield/opts.py ). bash python main.py with \ perform attack \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} defend Defends the specified attacked images with the specified defense and its parameters (see shield/opts.py ). The defense uses the attack parameters only to determine which images are loaded for preprocessing, as these parameters are not used by the preprocessing itself. bash python main.py with \ perform defend \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} \ defense jpeg \ defense_options {'quality': 80} evaluate Evaluates the specified model with the specified attacked/defended version of the images. bash python main.py with \ perform evaluate \ model resnet_50_v2 \ attack fgsm \ attack_options {'eps': 16} Video Demo YouTube video demo (readme/shield demo youtube thumbnail.jpg) Paper PDF on arXiv Paper PDF on arXiv Citation SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression. Nilaksh Das, Madhuri Shanbhogue, Shang Tse Chen, Fred Hohman, Siwei Li, Li Chen, Michael E. Kounavis, Duen Horng Chau. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2018 . London, UK. Aug 19 23, 2018. BibTeX @article{das2018shield, title {SHIELD: Fast, Practical Defense and Vaccination for Deep Learning using JPEG Compression}, author {Das, Nilaksh and Shanbhogue, Madhuri and Chen, Shang Tse and Hohman, Fred and Li, Siwei and Chen, Li and Kounavis, Michael E and Chau, Duen Horng}, booktitle {Proceedings of the 24nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, year {2018}, organization {ACM} } Researchers Name Affiliation Nilaksh Das Georgia Tech Madhuri Shanbhogue Georgia Tech Shang Tse Chen Georgia Tech Fred Hohman Georgia Tech Siwei Li Georgia Tech Li Chen Intel Corporation Michael E. Kounavis Intel Corporation Polo Chau Georgia Tech",Image Classification,Image Classification 2235,Computer Vision,Computer Vision,Computer Vision,"Submission for Ifood challenge at FGVC workshop CVPR'18 Our entry stood fourth and we presented our work at the workshop poster, we had utilized a 42 model ensemble (Resnets, Resnexts ) Our top 3 best accuracy on a single model was 89.6 % while with using pretrained feature of penultimate layer of ResNext and then applying xgboost with 0.01 learning rate and 200 rounds , the top 3 accuracy is 86.7 % Acknowledgement densenet pytorch Wide Residual Networks (BMVC 2016) by Sergey Zagoruyko and Nikos Komodakis. BC leanring (CVPR 2018) by Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada Team Rishabh Gupta, Atsuhiro Noguchi and Kuniaki Saito (The University of Tokyo) More details to follow soon...",Image Classification,Image Classification 2241,Computer Vision,Computer Vision,Computer Vision,"Edit: IMPORTANT These are a few code snippets from a project I had to do. A description is in the original readme below. 
This was the first project that went a little bit overboard, I did not fix requirements at the beginning of it and demands were rather dynamically raised, hence the chaotic structure in a few files. This was the trigger that made me review SW architecture . Originally I did write a GUI, but it turned out that this was meant to run a Linux server, so I canceled that. I deleted the main scripts and the results of course. 1. Introduction 1.1. Preface This project implements a Gaussian hyperparameter tuning. It is suited for the ESPnet network or similar structures, and is meant to automatically tune hyperparameters. Background knowledge on Gaussian hyperparameter tuning and the ESPnet can be obtained from the references and the appendix in the end of this file. This readme mainly aims at introducing on how this work is strucured, how to use it and how to properly troubleshoot/modify the file in case it should be adapted to new sets. A more detailed explanation of the code is given in the doc/ directory, where another README.md exists. 1.2. Content 1. Introduction 1.1. Preface 1.2. Content 1.3. The Directory Structure 2. How to use the Interface 2.1. The First Run 2.2. Starting from the script run_terminal.py 2.3. Starting from the run.sh 3. How to Modify Properly 3.1. Adding or Deleting Options 3.2. Hanging in New Benchmark 4. Troubleshooting 5. Desicions Made 6. Sources 6.1. Related Papers 6.2. Useful Links 1.3. The Directory Strucure The directories are the following: /src This is the main code, where the terminal and the optimizing code sit in. /utils Auxiliary functions such as parsers are stored in here. Furthermore, the current program status is stored in here. /log This is where the logging takes place, i.e. each test run setting and the parameters are stored in here. /eval Here is an auxiliary python notebook to evaluate the results. Of course it can be modified by the user. /doc Here is a documentation of the code as well as a few refences and reads. Furthermore, there might be a folder with the virtual environment. 2. How to use the Interface 2.1. The First Run When running the program, at first it loads all the standard settings provided by the Tedlium data set. Additional settings are also provided during this step, such as how many GPUs will be used and path settings. The settings are both in the src/initialize.ini utils/config.py. , where the settings in the config.py are only loaded if the initialize.ini does not exist. Before you run the program please make sure at least the following parameters: path_to_run backend n_gpu , where the path_to_run will be the path starting in the root folder and into one of the folders of ESPnet that contains a run.sh file, e.g. /home/robert/espnet/an4/asr1/. Additionally, in the utils/config.py there are bounds given for the trainable parameters. Please adjust the bounds to your needs if necessary. When running the first also, the script stops after training, because the Gaussian Process Regressor from sklearn needs at least two samples to run. Let the test stop and give it the next point by writing the desired parameters into the initialize.ini. After this, the script should be able to run through without any problem. 2.2. Starting from the script run_terminal.py The interface has two runnable modes, an interactive mode and a normal mode. The interactive mode starts with the parameter interactive , when thereafter the parameters can be handed over to the tool. The interactive mode provides a manual . 
In normal mode, the parameters are just handed over. It is recommended to use the run.sh script, since this gives an overview of all possible parameters. All parameters can be shown by the help option, emphasis shall be put on n_interations the number of iterations network the network type, e.g. tedlium, an4 or freudnet For adding or removing options see section 3.1. 2.3. Starting from the run.sh The run.sh is the main starting script. The parameters are all taken from the tedlium benchmark of ESPnet. In order to run the test simply run this file. Please note: It first executes a virtual environment. In case you deleted that or did change it you have to change that line. 3. How to Modify Properly 3.1 Adding or Deleting Options All possible main options are in the utils/configs.py script, where they come in the form of a dictionary. The main key of that dictionary is the parameter, followed by an initial value (that is used if no src/initialize.ini exists), its type and possible bounds. Since the Gaussian Process Regressor can only work with numerical values , only those are made trainable. A list of non trainable numerical values exists in the first few lines of the run_terminal.py script, when the parameters are loaded. Hence, adding or deleting parameters goes by simply adding or deleting entries in the dictionary. Please make sure that the parameter name must match the key of the dictionary. Trainable parameters must furthermore have upper and lower bounds , please follow the structure given by the other entries. 3.2 Hanging in New Benchmark When inserting a new benchmark, a few adjustments have to be made. First of all, please make sure that the results can properly be parsed. The current parser, which parses the result.txt of the ESPnet results, is the function parse_result in utils/configs.py . Also, the folder where the results are in have to be found. Here, again the structure of ESPnet is assumed: The path where the run.sh is in should have a folder called 'exp/', in the the result folders are saved. Then, the folder with the result is being found in that with the function get_path_to_result in utils/helpers.py . Currently, this function assumes a 'test' in the folder, because ESPnet throws out two folders, distinct in the beginning: one says test, the says train. Obviously, we want the test folder in the ESPnet setting. 4. Troubleshooting When the result.txt cannot be found in the log folder This happens when the espnet could not execute succesfully. In this please try to check the input to espnet. Examples would be to set the upper and lower bounds correctly, as a non defined may have been handed over. Do not forget to rewrite the current value in the initialize.ini, if it is wrong at this moment. Test is being Aborted After Training This can have three possible (here discussed) roots: You started the first test with a new benchmark. In that case the test will stop after one training, and an src/initialize.ini file will be written. Open this file, and give the tool a new point by at least altering one of the trainable parameters. Then, the tests should be able to run through thereafter One of the parameters is in a forbidden domain. E.g. the number of layers of a neural network can not be negative, or a learning rate cannot be larger than 1. Please adjust the boundaries accordingly in the utils/configs.py and put the out of range parameter in the src/initialize.ini in a good domain again. 
Something went wrong with the neural network or benchmark to train, and the file with the results could not be found. Please make sure a correct execution of those trainings. 5. Choices Made on Optimization Kernel The kernel utilized by the tool is choosable, but here a Matern kernel is used. This Kernel has been proposed by Snoek et al., 1 and proven to be working with several setups including convolutional neural networks. Please note that the parameters of the kernel do not have to be tuned, as the GaussianProcessRegressor object does this by itself with the optional argument n_restarts_optimizer. Acquisition Function There are multiple acquisition functions possible, however, every serious source that was available the time this project finished used the expected improvement (EI). A discussion of possible acquisition functions can be found in the links Discussion of Acquisition Functions . Finding the Maximum In order to maximize the EI the minimize function from the scipy library is being used. We switch the sign in the next_point function and then minimize. A for loop with random sampled points is used, because the minimize function uses gradient descent methods. That way local optima can be circumvented. The starting points are sampled with respect to the upper and lower bounds of the parameters, hence do not fall out of allowed domains. The number of sample points can be adjusted with n_minimize_restarts in hp_optimizer, which can be higher, because the cost of time for the gradient descent is marginal compared to the network trainings. 6. Sources 6.1. Related Papers 1 Practical Bayesian Optimization of Machine Learning Algorithms, Snoek et al., arXiv:1206.2944, 2012 2 Sequential Model Based Optimization for General Algorithm Configuration, Hutter et al., LION, DOI: 10.1007/978 3 642.3_40, 2011 3 A Tutorial on Bayesian Optimization, Frazier, arXiv:1807.02811, 2018 6.2. Useful Links TUM Videos: Blog Entry with example code: Derivation of EI: Discussion of Acquisition Functions: Possibly intersting: Using Deep Neural Networks as Alternative:",Image Classification,Image Classification 2266,Computer Vision,Computer Vision,Computer Vision,"PyTorch Pretrained Dual Path Networks (DPN) This repository includes a PyTorch implementation of DualPathNetworks that works with cypw's pretrained weights. The code is based upon cypw's original MXNet implementation with oyam's PyTorch implementation as a reference. If anyone would like to host a direct link of PyTorch pth files I am happy to do the conversion and upload somewhere. I do not have the resources to host myself. All testing of these models and all validation was done with torch (0.2.0.post1) and mxnet (0.11.0) pip packages installed. Usage Download and untar trained weights files from into a './pretrained' folder where this code is located. The pretrained weights can then be used in two ways: 1. They can be converted to PyTorch pth files by using the convert_from_mxnet.py script from the command line and then used as a normal PyTorch checkpoint. 2. They can be used via the model creation functions with pretrained True if executing in an environment with MXNet available and weights in the './pretrained' folder. Conversion Script python convert_from_mxnet.py ./pretrained/ model dpn107 Pretrained python validate.py /imagenet/validation/ pretrained model dpn92 multi gpu img size 320 Ensure you are executing the above with the appropriate MXNet model weights untarred into the './pretrained' folder. 
TODO Add conversion support for 5k models and test (need 5K Imagenet) Add/test training code from PyTorch imagenet ref impl if any interest Results The following tables contain the validation results (from included validation code) on ImageNet 1K. The DPN models are using the converted weights from the pretrained MXNet models. Also included are results from Torchvision ResNet, DenseNet as well as an InceptionV4 and InceptionResnetV2 port (by Cadene, for reference. All DPN runs at image size above 224x224 are using the mean max pooling scheme described by cypw. Note that results are sensitive to image crop, scaling interpolation, and even the image library used. All image operations for these models are performed with PIL. Bicubic interpolation is used for all but the ResNet models where bilinear produced better results. Results for InceptionV4 and InceptionResnetV2 where better at 100% crop, all other networks being evaluated at their native training resolution use 87.5% crop. Models with a ' ' are using weights that were trained on ImageNet 5k and fine tuned on ImageNet 1k. The MXNet weights files for these have an ' extra' suffix in their name. Results @224x224 Model Prec@1 (Err) Prec@5 (Err) Params Crop DenseNet121 74.752 (25.248) 92.152 (7.848) 7.98 87.5% ResNet50 76.130 (23.870) 92.862 (7.138) 25.56 87.5% DenseNet169 75.912 (24.088) 93.024 (6.976) 14.15 87.5% DualPathNet68 76.346 (23.654) 93.008 (6.992) 12.61 87.5% ResNet101 77.374 (22.626) 93.546 (6.454) 44.55 87.5% DenseNet201 77.290 (22.710) 93.478 (6.522) 20.01 87.5% DenseNet161 77.348 (22.652) 93.646 (6.354) 28.68 87.5% DualPathNet68b 77.528 (22.472) 93.846 (6.154) 12.61 87.5% ResNet152 78.312 (21.688) 94.046 (5.954) 60.19 87.5% DualPathNet92 79.128 (20.872) 94.448 (5.552) 37.67 87.5% DualPathNet98 79.666 (20.334) 94.646 (5.354) 61.57 87.5% DualPathNet131 79.806 (20.194) 94.706 (5.294) 79.25 87.5% DualPathNet92 80.034 (19.966) 94.868 (5.132) 37.67 87.5% DualPathNet107 80.172 (19.828) 94.938 (5.062) 86.92 87.5% Results @299x299 Model Prec@1 (Err) Prec@5 (Err) Params Crop InceptionV3 77.436 (22.564) 93.476 (6.524) 27.16 87.5% DualPathNet68 78.006 (21.994) 94.158 (5.842) 12.61 100% DualPathNet68b 78.582 (21.418) 94.470 (5.530) 12.61 100% InceptionV4 80.138 (19.862) 95.010 (4.99) 42.68 100% DualPathNet92 80.408 (19.592) 95.190 (4.810) 37.67 100% DualPathNet92 80.480 (19.520) 95.192 (4.808) 37.67 100% InceptionResnetV2 80.492 (19.508) 95.270 (4.730) 55.85 100% DualPathNet98 81.062 (18.938) 95.404 (4.596) 61.57 100% DualPathNet131 81.208 (18.792) 95.630 (4.370) 79.25 100% DualPathNet107 81.432 (18.568) 95.706 (4.294) 86.92 100% Results @320x320 Model Prec@1 (Err) Prec@5 (Err) Params Crop DualPathNet68 78.450 (21.550) 94.358 (5.642) 12.61 100% DualPathNet68b 78.764 (21.236) 94.726 (5.274) 12.61 100% DualPathNet92 80.824 (19.176) 95.570 (4.430) 37.67 100% DualPathNet92 80.960 (19.040) 95.500 (4.500) 37.67 100% DualPathNet98 81.276 (18.724) 95.666 (4.334) 61.57 100% DualPathNet131 81.458 (18.542) 95.786 (4.214) 79.25 100% DualPathNet107 81.800 (18.200) 95.910 (4.090) 86.92 100%",Image Classification,Image Classification 2303,Computer Vision,Computer Vision,Computer Vision,"Description This is a quick implementation of the DenseNet model described in the paper Densely Connected Convolutional Networks by Huang et al. ( arXiv ) It has only been tested on the Cifar 10 dataset without data augmentation, but it should work fine on any dataset. Getting started The basic model is defined in DenseNet.py . 
The scipt cifar10_densenet_classification.py provides an example on how to create and use the model on Cifar 10 classification. Finally, utils.py contains a few helper functions. Prerequisites Keras (> 2) (only tested with Tensorflow backend) numpy (> 1.13) Results Below are the results of running both DenseNet and DenseNet BC models on Cifar 10 dataset with the same hyperparameters and optimization techniques as in the original paper. DenseNet (L 40, k 12) ! DenseNet_loss (/results/DenseNet_loss.png) ! DenseNet_accuracy (/results/DenseNet_accuracy.png) DenseNet BC (L 100, k 12) ! DenseNet BC_loss (/results/DenseNet BC_loss.png) ! DenseNet BC_accuracy (/results/DenseNet BC_accuracy.png) TODO Add data augmentation techniques Try different architectures Try other optimizers, eg Adam Try out transfer learning on ImageNet",Image Classification,Image Classification 2313,Computer Vision,Computer Vision,Computer Vision,"Machine Learning Engineer Nanodegree Capstone Proposal 2018 04 01 Proposal Domain Background Years ago I read an article about hand written digit recognition. I was amazed. It was the first time I have heard about Machine Learning. Now, years later I’d like to train my own Machine Learning algorithm to classify hand written mathematical symbols (HASYv2 dataset), especially because I studied mathematics. Handwriting recognition (HWR) belongs to the domain of image processing. One of the first researchers who was active in this field was Sheloa Guberman 1962. The business application capabilities of this topic are incredible . The best results in this domain have been achieved on the MNIST database. It is probably the most famous dataset and therefore the most influential data set in this research area. In this capstone project we won’t use this dataset but it’s essential to mention it. Furthermore, I suggest to the interested reader to look at Street View Imagery . Problem Statement Handwriting recognition is a classical classification problem. In our case we have a photo with a mathematical symbol and want to know which it is. In this capstone project will have to handle 369 classes (symbols). With the aid of 168233 labeled training data we will use a supervised learning approach to solve this problem. I will use several learners like the Deep Neural Networks Convolutional Neural Networks and compare the prediction accuracy. We are faced with challenges because Number of training data per class varies Mathematical symbols look complex and sometimes similar Large amount of classes. Datasets and Inputs For this project we will use data from the free available HASYv2 dataset 1 . It contains 168233 labeled images of 369 handwritten mathematical symbols. The images have a 32 x 32 px resolution. So there are 32 x 32 1024 features in 0,255 . We will use One Hot encoded labels to train the learners. It’s important to remark that the number of training data per class varies as mentioned above. In addition, the HASYv2 dataset is designated to be trained by a 10 fold cross validation. Solution Statement As already mentioned we will use supervised training algorithms to classify the symbols. We will use labeled training data to fit our learners. To be able to guarantee replicability we will train the algorithms with set RandomSates. We will train the Deep Neural Networks (DNN) with some (1,2,3,4,5) hidden, fully connected layers. The DNN will have 1024 input neurons and 369 output neurons. We will use the sigmoid as activation function. 
Certainly the CNN will have 1024 input neurons and 369 output neurons too. But the architecture of the hidden layer differs. We will use some sequence of convolutional and pooling layers. Towards the training process recognition proceeds fast. One has to put a picture into the classifier. Each of the 369 output neurons will return a value in 0,1 . We will take the neuron with the highest output value and choose the symbol corresponding to the neuron. Benchmark Model In the capstone project we will choose two different benchmark approaches . On the one hand we will compare the presented learners among themselves. On the other hand I will compare my results with domain related results 1 . Project Design Programming : Python 3 Libraries : pandas, sklearn, keras, ggplot, numpy, pyplot Data : HASYv2 dataset Sequence of Work : Import and preprocess data Get an overview Generate dummy variables Split Data (k fold cross validation ) Define Neural Network Architecture Train the DNN and CNN Validate Test Generate visualization, diagrams, Confusion Matrix Clean code Let’ s describe the data structure in detail. Here we can see the head of the csv file in detail. path symbol_id latex user_id 0 hasy data/v2 00000.png 31 A 50 1 hasy data/v2 00001.png 31 A 10 2 hasy data/v2 00002.png 31 A 43 3 hasy data/v2 00003.png 31 A 43 4 hasy data/v2 00004.png 31 A 4435 Each row of the csv represents a hand drawn mathematical symbol. The path feature shows where the corresponding picture is saved. In the words of relational algebra the features symbol_id and latex are functional dependent. Both label the symbols. The feature user_id represents the creator of the symbol mentioned not to be reliable 1 . 1 Thoma Martin, The HASYv2 dataset , arXiv:1701.08380v1, 2017",Image Classification,Image Classification 2317,Computer Vision,Computer Vision,Computer Vision,"Copyright (C) 2016 Sergey Demyanov contact: my_name@my_sirname.net You can also find and use my WorkLab for Tensorflow . This toolbox has been written as a part of my PhD project. It contains the implementation of convolitional neural nets for Matlab, written on C++ and CUDA. The most of the kernels are taken from CUDNN v5 library, others are written manually. Therefore CUDNN, v5 or higher is required. It also contain the implementation of Invariant Backpropagation (IBP) and Adversarial Training (AT) algorithms. GENERAL INFORMATION Convolutional neural network is a type of deep learning classification and segmentation algorithms, which can learn useful features from raw data by themselves. Learning is performed by tuning its weights. CNNs consist of several layers, which are usually convolutional and subsampling layers following each other. Convolutional layer performs filtering of its input with a small matrix of weights and applies some non linear function to the result. Subsampling layer does not contain weights and simply reduces the size of its input by averaging of max pooling operation. The number of channels on the last layer should coincide with the number of classes. If used for classification, the height and width of the last layer output should be 1. Learning process consists of 2 steps: forward and backward passes, which are conducted for all objects in a training set. On the forward pass each layer transforms the output of the previous layer according to its function. The output of the last layer is compared with the label values and the loss function is computed. 
On the backward pass the derivatives of loss function with respect to outputs are consecutively computed from the last layer to the first, together with the derivatives with respect to weights. After that the weights are changed in the direction which decreases the value of the loss function. This process is performed for a batch of objects simultaneously, in order to decrease the sample bias. Processing of all objects in the dataset is called the epoch. Usually training consists of many epochs, conducted with different batch splits. DESCRIPTION The toolbox was written for Matlab and its functions can be called only from Matlab scripts. The toolbox requires a Cuda capable GPU. The toolbox DOES NOT REQUIRE Parallel Computing Toolbox as MatConvNet, but you can import and use pretrained MatConvNet models. The toolbox operates with 4 dimensional tensors with incides corresponding to height(H), width(W), channel(C) and number(N). Labels should also be 4 dimensional tensors. If used for classification, labels should have height 1 and width 1. Before passing to c++ code the height and width dimensions are permuted, so the layout becomes NCHW (N is the slowest index). Same layout is used for weights everywhere. For speedup purposes weights are passed and returned as a long vector or stretched and concatenated weights from all layers. Use functions weights getweights(layers) and layers setweights(layers, weights) to obtain the vector and assign it back to layers. The toolbox contains 3 main functions to call: weights genweights(layers, params) Returns randomly generated initial weights for the net. Has to be called before the training. weights, trainerr train(layers, weights, params, train_x, train_y) Performs neural net training. Returns the set of updated weights and values of the main and additional loss functions. err, bad, pred test(layers, weights, params, test_x, test_y) Returns predictions and calculates the test error. LAYERS Define the structure of CNN. Sets up as cell array, with each element representing an independent layer. Currently 6 layer types are implemented: input input layer. Must be the first and only the first one. Must contain the mapsize field, that is a vector with 2 integer values, representing the objects size (height and width). May also contain the following additional fields: 1) 'channels' that specifies the number of data channels, if it differs from 1. jitt jittering layer. Performs affine transformations of the image. With the default parameters performs central cropping. Must have the parameter 'mapsize'. Other possible parameters are: 1) 'shift' specifies the maximum shift of the image in each dimension, 2) 'scale' specifies the maximum scale in each dimension. Must be more than 1. The image scales with the random factors from 1/x x . 3) 'mirror' determines if the image might be mirrored (1) in a particular dimension or not (0). 4) 'angle' scalar, that specifies the maximum angle of rotation. Must be from 0, 1 . The value 1 corresponds to 180 degrees. 5) 'defval' specifies the value that is used when the transformed image lies outside the borders of the original image. If this value is not specified, the transformed value should be always inside the original one, otherwise there will be an error. On the test stage the images are just centrally cropped to the size 'mapsize', like there were no additional parameters. conv convolutional layer. Must contain the filtersize field, that identifies the filter size. 
Must also contain the channels field, which is the number of output channels. If the previous layer has m maps and the current one has n maps, the total number of filters on it is m n. Despite that it is called convolutional, it performs filtering, that is a convolution operation with flipped dimensions. deconv reverse convolutional layer. Must contain the same fields as the convolutional layer. On the forward pass performs the same operation as performed on the backward pass of the conv layer, and otherwise. Therefore, instead of scaling the dimensions by a factor of stride it multiplies them on stride . pool pooling layer. The pooling type is specified by pooling field, which can be eigther max or avg . Default value is max . Must contain the scale and stride fields, which are the vectors with 2 integer values. full fully connected layer. Produces a tensor with height 1 and width 1. Must contain the channels field, which defines the number of output channels. Considers its input as a single vector. Additionally, all layers might have the following parameters: function defines the non linear transformation function. It can be relu , sigm , soft or none , which correspond to rectified linear unit, sigmoid, softmax or no transformation respectively. The default value is relu . The value soft must be used only on the last layer. padding a 2 dimensional vector of non negative integers. Considered by conv , deconv and pool layers. Determines the number of zero padding rows (columns) on the top and bottom (padding 0 ) and left and right (padding 1 ). stride a 2 dimensional vector of non negative integers. Considered by conv , deconv and pool layers. Determines the distance between the positions of applied kernels in vertical and horizontal dimensions. init_std the standard deviation of normal distribution that is used to generate the weights. When is not defined, init_std sqrt(2/(h w m)), where 'h' and 'w' is the filter size and 'm' is the number of input channels. Considered by all layers with weights. add_bias whether the layer should add bias to the output or not. The length of the bias vector is equal to the number of output channels. Considered by all layers. Default is true for all layers with weights, false for others. bias_coef the multiplier for the bias learning rate. Default is 1. lr_coef the multiplier for the learning rate on this layer, both weights and biases. Considered by all layers. Set it to 0 to fix some layers. dropout a scalar from 0, 1), which determines the probability of dropping the activations on this layer. Should not be too large, otherwise it drops everything. PARAMS Define the learning process. It is a structure with the fields described below. seed any integer, which allows to repeat the same random numbers. Default is 0. Note that if conv , deconv or pool layers are used, the results are not guaranteed to be exactly the same even if the same seed is used. For more details read CUDNN User Guide. batchsize defines the size of batches. Default is 32. epochs the number of repeats the training procedure with different batch splits. Default is 1. alpha defines the learning rate. Default is 1. beta defines the invariant learning rate (see the article ). The value '0' corresponds to the standard backpropagation algorithm. Default is 0. shift defines the shift in the Adversarial Training algorithm (see the article ). The value '0' corresponds to the standard backpropagation algorithm. Default is 0. 
normfun defines the type of norm used as second loss function in IBP or used to generate adversarial examples in AT. Default is 1. momentum defines the actual direction of weight change according to the formula m dp + (1 m) d, where m is momentum, dp is the previous change and d is the current derivative. Default is 0. decay defines the weight decay, i.e. every update all weights are multiplied on (1 decay). lossfun string. Specifies the employed loss function. Must be eigher squared or logreg , that correspond to sum of squared differences and negative log likelihood respectively. If you use logreg , it is better to use softmax nonlinear function on the last layer and reduce the learning rate about 10 times. The default value is logreg . shuffle determines whether the input dataset will be shuffled or not. If it is set to 0, the batches are created in a natural order: first batchsize objects become the first batch and so on. Otherwise, it should be 1. Default is 0. verbose determines output info during learning. For 0 there is no output, for 1 it prints only number of current epoch, for 2 it prints both numbers of epoch and batch. Default is 0. memory determines the maximum number of megabytes of GPU memory allocated as a workspace for convolutional operations. Default is 512. gpu allows to specify the index of gpu device to work on. Default is 0. classcoefs allow to specify coefficients of class importance, for example if the dataset is unbalanced. Should be a vector of 1xN, where N is the number of classes. By default all coefficients are 1. Recommended class coefficients for an unbalanced dataset are c i sum i N n i / (n i N). COMPILATION If you cannot use the provided binaries, you need to compile them by yourself. The compilation options are defined in the file settings.h . They are: PRECISION . Might have two values: 1 single, uses type 'float'. 2 double, uses type 'double'. The second version has not been tested. PRECISION_EPS . Equal to 1e 6 by default. For consistency purposes all values that are less than it are assigned to 0. COMPILATION Linux adjust the paths in the './c++/Makefile' file and run make . That should be enough. Windows has been tested long time ago. 1) Using 'compile' script. While CPU compilation is easy, the GPU compilation is tricky and might take some efforts to do it. First of all, run 'mex setup' in order to check that you have a proper C++ compiler. If not, install it. You need either a full version of Visual Studio or an express version with Microsoft SDK, that are free. Of course, you need to install CUDA as well. Download it from NVIDIA site. The CUDA settings for 'mex' are located in file with the name like mex_CUDA_win64.xml . Read more on the MathWorks website . You must have this file in your Matlab folder. The one that works for me is located in ./c++/cuda folder. Adjust your Microsoft SDK and CUDA folders, CUDA computation capability and other options there. Make sure you have proper values of environment variables 'CUDA_PATH' and 'VS100COMNTOOLS'. You can do it using functions 'getenv' and 'setenv'. If you don't do it, you might get an error No supported compiler or SDK was found . You might also get an error about the file 'vcvars64.bat'. In this case use the one that is located in ./c++/cuda folder. Adjust the path in it as well. After that you should be able to compile. 2) Using Visual Studio project. This is a project to compile 'cnntrain_mex'. 
Add all '.h', '.cpp' and '.cu' files, adjust paths in Include and Libraries fields, and enjoy incremental compilation every time you change just one single file. Create similar project with the same settings to compile 'classify' and 'genweights'. LOADING PRETRAINED WEIGHTS It is possible to use pretrained models from MatConvNet . Given that you reconstruct the same architecture, you can use the function 'import_weights.m' to load the pretrained weights to the network. An example for fully convolutional network is provided. EXAMPLES mnist.m provides an example of training a convolutional network on MNIST dataset. The error after 5 epochs should be close to 1%. fcn_test.m provides an example of loading pretrained weights from MatConvNet and segmenting the images. On the provided test set, which is smaller than the original PASCAL test set, the results should be (meanIU 0.5188, pixelAccuracy 0.8766, meanAccuracy 0.6574). This is because one of the classes is not presented, so its IU is 0. KNOWN ERRORS When you change gpu index, the first time it might fail. Just run it again.",Image Classification,Image Classification 2320,Computer Vision,Computer Vision,Computer Vision,"densenet This project includes: densenet.py, a keras implementation of DenseNet. cifar10 test.py, code to train the model on Cifar 10, also to test the trained model. vis_densenet.py, a demo to visualize the layers of the model, using keras vis. testnet.py, a vgg like model, a convnet model with bottle neck layers and densenet style skip connections. References 1. Gao Huang, Zhuang Liu and K. Weinberger. Densely Connected Convolutional Networks. Arxiv. Github. 2. DenseNet Other implementations , especially, copied Christopher Masch's code and fixed some issues for aligning to the original DenseNet paper.",Image Classification,Image Classification 2330,Computer Vision,Computer Vision,Computer Vision,Neural Network Implementations Contains: Implementation of IndRNN in Tensorflow as describes in the following paper: Implementation of Squeeze Excite Network on ResNet50 with MNIST dataset :,Image Classification,Image Classification 2331,Computer Vision,Computer Vision,Computer Vision,Neural Network Implementations Contains: Implementation of IndRNN in Tensorflow as describes in the following paper: Implementation of Squeeze Excite Network on ResNet50 with MNIST dataset :,Image Classification,Image Classification 2339,Computer Vision,Computer Vision,Computer Vision,"RandWireNN Caveat: Currently, this repo have critical issue on random graph generation. (See 8 ) This will be fixed, and experimented again. Since then, use JiaminRen's implementation . PWC Unofficial PyTorch Implementation of: Exploring Randomly Wired Neural Networks for Image Recognition . ! (./assets/teaser.png) Results Validation result on Imagenet(ILSVRC2012) dataset: Top 1 accuracy (%) Paper Here RandWire WS(4, 0.75), C 78 74.7 63.0 (2019.04.14) 62.6%: 396k steps with SGD optimizer, lr 0.1, momentum 0.9, weigth decay 5e 5, lr decay about 0.1 at 300k (2019.04.12) 62.6%: 416k steps with Adabound optimizer, initial lr 0.001(decayed about 0.1 at 300k), final lr 0.1, no weight decay JiaminRen's implementation reached accuarcy which is almost close to paper, using identical training strategy with paper. (2019.04.10) 63.0%: 450k steps with Adam optimizer, initial lr 0.001, lr decay about 0.1 for every 150k step (2019.04.07) 56.8%: Training took about 16 hours on AWS p3.2xlarge(NVIDIA V100). 
120k steps were done in total, and Adam optimizer with lr 0.001, batch_size 128 was used with no learning rate decay. ! (./assets/train overall.png) Orange: Adam Blue: AdaBound Red: SGD Dependencies This code was tested on Python 3.6 with PyTorch 1.0.1. Other packages can be installed by: bash pip install r requirements.txt Generate random DAG bash cd model/graphs python er.py p 0.2 o er 02.txt Erdos Renyi python ba.py m 7 o ba 7.txt Barbasi Albert python ws.py k 4 p 0.75 ws 4 075.txt Watts Strogatz number of nodes: n option All outputs from commands shown above will produce txt file like: (number of nodes) (number of edges) (lines, each line representing edges) Train RandWireNN 1. Download ImageNet dataset. Train/val folder should contain list of 1,000 directories, each containing list of images for corresponding category. For validation image files, this script can be useful: 1. Edit config.yaml bash cd config cp default.yaml config.yaml vim config.yaml specify data directory, graph txt files 1. Train Note. Validation performed here won't use entire test set, since it will consume much time. (about 3 min.) python trainer.py c config yaml m name 1. View tensorboardX tensorboard logdir ./logs Validation Run full validation: bash python validation.py c config path p checkpoint path This will show accuracy and average test loss of the trained model. Author Seungwon Park / @seungwonpark License Apache License 2.0",Image Classification,Image Classification 2346,Computer Vision,Computer Vision,Computer Vision,"Pytorch Stochastic Depth Resnet Pytorch Implementation of Deep Networks with Stochastic Depth Original torch implementation: Speed up resnet training process around 1.66x How to use? For linear decay probability from TYY_stodepth_lineardecay import testing : out self.prob out + identity net resnet18_StoDepth_lineardecay(pretrained True, prob_0_L 1,0.5 , multFlag True) testing : out out + identity net resnet18_StoDepth_lineardecay(pretrained True, prob_0_L 1,0.5 , multFlag False) For uniform probability from TYY_stodepth_lineardecay import testing : out self.prob out + identity net resnet18_StoDepth_lineardecay(pretrained True, prob_0_L 0.5,0.5 , multFlag True) testing : out out + identity net resnet18_StoDepth_lineardecay(pretrained True, prob_0_L 0.5,0.5 , multFlag False) Something you should know The original paper uses the following equation in testing. out self.prob out + identity However, I found that sometimes it could cause performance degradation. Change multFlag to False if you dont want to multiply probability on the testing output.",Image Classification,Image Classification 2348,Computer Vision,Computer Vision,Computer Vision,"MSc Data Science Thesis Vector Capsule Network for wild animal species recognition in Camera trap images Abstract The most critical task in Ecology is observing and studying the wild animals in their natural habitat. Camera traps are the most helpful tools for ecologists and researchers for effective and constant monitoring of wild animals. Monitoring wildlife using camera traps is necessary as it helps researchers to understand human impacts on wildlife, and also help management to make better decisions. However, annotating a large amount of camera trap data is time consuming. Hence, auto tagging of the wild animal species in camera traps is of great importance, and its gaining speed as more and more deep learning solutions are being implemented on a variety of problems. 
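For the random graph files described in the RandWireNN entry above, here is a hedged sketch of how such a file could be produced with networkx; the helper name and exact formatting are illustrative, and the repo's own er.py / ba.py / ws.py may differ in details:

```python
import networkx as nx

# Hedged sketch of producing a graph file in the documented format:
# first line = number of nodes, second line = number of edges, then one
# "u v" line per edge.
def write_graph_txt(graph, path):
    with open(path, "w") as f:
        f.write(f"{graph.number_of_nodes()}\n")
        f.write(f"{graph.number_of_edges()}\n")
        for u, v in graph.edges():
            f.write(f"{u} {v}\n")

# Watts-Strogatz graph roughly matching the "ws.py k 4 p 0.75" example,
# assuming 32 nodes.
g = nx.connected_watts_strogatz_graph(n=32, k=4, p=0.75, seed=0)
write_graph_txt(g, "ws-4-075.txt")
```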
One such computer assisted system could be an auto tagging tool which can classify diverse animal species in real time. This work is an attempt to present a new and novel deep neural network called the 'Capsule Network' as a solution to the problem of classifying wild animal species. The Capsule Network is the latest development in Artificial Intelligence and is a new and exciting approach to computer vision. It has shown excellent results on various image recognition problems as per the research published so far. The work further tests the robustness of the newly introduced 'Capsule Network' by comparing its performance with a Convolutional Neural Network. The results achieved indicate that Capsule Networks are better at understanding images than ConvNets. Hypothesis The camera trap images of wild animals are the kind of images where Convolutional Neural Networks may have a problem, because the species exhibit a variety of poses, and therefore Capsule Networks can be expected to outperform Convolutional Neural Networks. The Capsule Network has obtained state of the art results on datasets whose objects undergo affine transformations, and can therefore be expected to perform better than Convolutional Neural Networks on camera trap images. Research Question Is the Capsule Network better than the Convolutional Neural Network at classifying wild animal species in camera trap images? Dataset An annotated camera trap dataset of 20 species commonly found in North America is used for training the model. The dataset contains 15826 images of 20 species, namely Agouti, Bird spec, Coiban Agouti, Collared Peccary, Common Opossum, European Hare, Great Tinamou, Mouflon, Ocelot, Paca, Red Brocket Deer, Red Deer, Red Fox, Red Squirrel, Roe Deer, Spiny Rat, White Tailed Deer, White nosed Coati, Wild Boar, and Wood Mouse. The dataset contains a collection of grayscale and color images. The images captured at night are grayscale and the images captured in daytime are in color. Every image contains only one species out of the 20 species. 80% of the dataset, i.e., 12660 images, is used for training and the remaining 20% for testing. Link to Dataset Repo or ! (Images/Page_00.png) ! (Images/Page_01.png) Convolutional Neural Network architecture used for wild animal species classification. The architecture consists of two convolution layers with ReLU activation functions, followed by a pooling layer with a dropout of 50%, and fully connected layers with a final softmax (a hedged Keras sketch of this stack is given below). INPUT > CONV1 > ReLU > CONV2 > ReLU > POOL > DROPOUT > FC > ReLU > DROPOUT > FC > SOFTMAX ! (Images/CNNArch.png) CNN Specification ! (Images/CNNSpec.png) Capsule Network Architecture used for wild animal species classification. The architecture is similar to the one presented in the paper (Hinton, Sabour and Frosst, 2017): INPUT > CONV1 > ReLU > PRIMARYCAPS > SPECIESCAPS > FC > FC ! (Images/CapsNetArch.png) CapsNet Specification ! (Images/CapsnetSpec.png)",Image Classification,Image Classification 2349,Computer Vision,Computer Vision,Computer Vision,"MSc Data Science Thesis Vector Capsule Network for wild animal species recognition in Camera trap images Abstract The most critical task in Ecology is observing and studying wild animals in their natural habitat. Camera traps are the most helpful tools for ecologists and researchers for effective and constant monitoring of wild animals. Monitoring wildlife using camera traps is necessary as it helps researchers to understand human impacts on wildlife, and also helps management to make better decisions.
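The Keras sketch referenced in the CNN architecture description above; filter counts, kernel sizes and the input resolution are assumptions for illustration (the thesis' exact values are in the CNNSpec image):

```python
from tensorflow.keras import layers, models

# Hedged Keras sketch of the described stack:
# INPUT > CONV1 > ReLU > CONV2 > ReLU > POOL > DROPOUT > FC > ReLU > DROPOUT > FC > SOFTMAX
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),  # CONV1 + ReLU
    layers.Conv2D(64, (3, 3), activation="relu"),                             # CONV2 + ReLU
    layers.MaxPooling2D((2, 2)),                                              # POOL
    layers.Dropout(0.5),                                                      # 50% dropout
    layers.Flatten(),
    layers.Dense(256, activation="relu"),                                     # FC + ReLU
    layers.Dropout(0.5),
    layers.Dense(20, activation="softmax"),                                   # 20 species
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```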
However, annotating a large amount of camera trap data is time consuming. Hence, auto tagging of the wild animal species in camera traps is of great importance, and it is gaining speed as more and more deep learning solutions are being implemented on a variety of problems. One such computer assisted system could be an auto tagging tool which can classify diverse animal species in real time. This work is an attempt to present a new and novel deep neural network called the 'Capsule Network' as a solution to the problem of classifying wild animal species. The Capsule Network is the latest development in Artificial Intelligence and is a new and exciting approach to computer vision. It has shown excellent results on various image recognition problems as per the research published so far. The work further tests the robustness of the newly introduced 'Capsule Network' by comparing its performance with a Convolutional Neural Network. The results achieved indicate that Capsule Networks are better at understanding images than ConvNets. Hypothesis The camera trap images of wild animals are the kind of images where Convolutional Neural Networks may have a problem, because the species exhibit a variety of poses, and therefore Capsule Networks can be expected to outperform Convolutional Neural Networks. The Capsule Network has obtained state of the art results on datasets whose objects undergo affine transformations, and can therefore be expected to perform better than Convolutional Neural Networks on camera trap images. Research Question Is the Capsule Network better than the Convolutional Neural Network at classifying wild animal species in camera trap images? Dataset An annotated camera trap dataset of 20 species commonly found in North America is used for training the model. The dataset contains 15826 images of 20 species, namely Agouti, Bird spec, Coiban Agouti, Collared Peccary, Common Opossum, European Hare, Great Tinamou, Mouflon, Ocelot, Paca, Red Brocket Deer, Red Deer, Red Fox, Red Squirrel, Roe Deer, Spiny Rat, White Tailed Deer, White nosed Coati, Wild Boar, and Wood Mouse. The dataset contains a collection of grayscale and color images. The images captured at night are grayscale and the images captured in daytime are in color. Every image contains only one species out of the 20 species. 80% of the dataset, i.e., 12660 images, is used for training and the remaining 20% for testing. Link to Dataset Repo or ! (Images/Page_00.png) ! (Images/Page_01.png) Convolutional Neural Network architecture used for wild animal species classification. The architecture consists of two convolution layers with ReLU activation functions, followed by a pooling layer with a dropout of 50%, and fully connected layers with a final softmax. INPUT > CONV1 > ReLU > CONV2 > ReLU > POOL > DROPOUT > FC > ReLU > DROPOUT > FC > SOFTMAX ! (Images/CNNArch.png) CNN Specification ! (Images/CNNSpec.png) Capsule Network Architecture used for wild animal species classification. The architecture is similar to the one presented in the paper (Hinton, Sabour and Frosst, 2017): INPUT > CONV1 > ReLU > PRIMARYCAPS > SPECIESCAPS > FC > FC ! (Images/CapsNetArch.png) CapsNet Specification ! (Images/CapsnetSpec.png)",Image Classification,Image Classification 2351,Computer Vision,Computer Vision,Computer Vision,"CapsNet A simple implementation of Dynamic Routing Between Capsules in Pytorch Shashank Manjunath 25 September 2018 A simple implementation of a capsule network.
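For the CapsNet entry above, a hedged PyTorch sketch of the squash non-linearity from the cited Dynamic Routing Between Capsules paper; this is a generic illustration, not necessarily the exact code in capsule_network.py:

```python
import torch

# Squash from the paper: v = (|s|^2 / (1 + |s|^2)) * (s / |s|).
def squash(s, dim=-1, eps=1e-8):
    squared_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / torch.sqrt(squared_norm + eps)

capsules = torch.randn(4, 10, 16)   # 4 samples, 10 capsules of dimension 16 each
v = squash(capsules)
print(v.norm(dim=-1).max())         # output capsule lengths stay below 1
```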
Run python capsule_network.py to run the network Credit to Kenta Iwasaki for the margin loss code.,Image Classification,Image Classification 2364,Computer Vision,Computer Vision,Computer Vision,"MobileNet Caffe Introduction This is a Caffe implementation of Google's MobileNets (v1 and v2). For details, please read the following papers: v1 MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications v2 Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation Pretrained Models on ImageNet We provide pretrained MobileNet models on ImageNet, which achieve slightly better accuracy rates than the original ones reported in the paper. The top 1/5 accuracy rates by using single center crop (crop size: 224x224, image size: 256xN): Network Top 1 Top 5 sha256sum Architecture : : : : : : : : : : MobileNet v1 70.81 89.85 8d6edcd3 (16.2 MB) netscope , netron MobileNet v2 71.90 90.49 a3124ce7 (13.5 MB) netscope , netron Evaluate Models with a single image Evaluate MobileNet v1: python eval_image.py proto mobilenet_deploy.prototxt model mobilenet.caffemodel image ./cat.jpg Expected Outputs: 0.42 'n02123159 tiger cat' 0.08 'n02119022 red fox, Vulpes vulpes' 0.07 'n02119789 kit fox, Vulpes macrotis' 0.06 'n02113023 Pembroke, Pembroke Welsh corgi' 0.06 'n02123045 tabby, tabby cat' Evaluate MobileNet v2: python eval_image.py proto mobilenet_v2_deploy.prototxt model mobilenet_v2.caffemodel image ./cat.jpg Expected Outputs: 0.26 'n02123159 tiger cat' 0.22 'n02124075 Egyptian cat' 0.15 'n02123045 tabby, tabby cat' 0.04 'n02119022 red fox, Vulpes vulpes' 0.02 'n02326432 hare' Finetuning on your own data Modify deploy.prototxt and save it as your train.prototxt as follows: Remove the first 5 input / input_dim lines, and add Image Data layer in the beginning like this: layer { name: data type: ImageData top: data top: label include { phase: TRAIN } transform_param { scale: 0.017 mirror: true crop_size: 224 mean_value: 103.94, 116.78, 123.68 } image_data_param { source: your_list_train_txt batch_size: 32 your batch size new_height: 256 new_width: 256 root_folder: your_path_to_training_data_folder } } Remove the last prob layer, and add Loss and Accuracy layers in the end like this: layer { name: loss type: SoftmaxWithLoss bottom: fc7 bottom: label top: loss } layer { name: top1/acc type: Accuracy bottom: fc7 bottom: label top: top1/acc include { phase: TEST } } layer { name: top5/acc type: Accuracy bottom: fc7 bottom: label top: top5/acc include { phase: TEST } accuracy_param { top_k: 5 } } Related Projects MobileNet in this repo has been used in the following projects, we recommend you to take a look: The MobileNet neural network using Apple's new CoreML framework hollance/MobileNet CoreML Mobile deep learning baidu/mobile deep learning Receptive Field Block Net for Accurate and Fast Object Detection ruinmessi/RFBNet Depthwise Convolutional Layer yonghenglh6/DepthwiseConvolution MobileNet MXNet KeyKy/mobilenet mxnet Caffe2 MobileNet camel007/caffe2 mobilenet Updates (Feb. 
5, 2018) Add pretrained MobileNet v2 models (including deploy.prototxt and weights) Hold pretrained weights in this repo Add sha256sum code for pretrained weights Add some code snippets for single image evaluation Uncomment engine: CAFFE used in mobilenet_deploy.prototxt Add params ( lr_mult and decay_mult ) for Scale layers of mobilenet_deploy.prototxt Add prob layer for mobilenet_deploy.prototxt",Image Classification,Image Classification 2374,Computer Vision,Computer Vision,Computer Vision,"LSUV.pytorch Implementation of Layer sequential unit variance (LSUV) initialization which is proposed by All you need is a good init in PyTorch. Requirements PyTorch 1.0+ How to use python from lsuv import lsuv_init model lsuv_init(ResNet34(), train_loader, needed_std 1.0, std_tol 0.1, max_attempts 10, do_orthonorm True, device device) Reference Mishkin, Dmytro, and Jiri Matas. All you need is a good init. arXiv preprint arXiv:1511.06422 (2015).",Image Classification,Image Classification 2383,Computer Vision,Computer Vision,Computer Vision,"TinyYOLOv2 in Tensorflow made easier What you can do with this code Extract weights from binary file of the original yolo v2, assign them to a TF network, save ckpt, perform detection on an input image or webcam What you CANNOT do with this code Train in any way YOLOv2 for any dataset Description I've been searching for a Tensorflow implementation of YOLOv2 for a while but the darknet version and derivatives are not really easy to understand. This one is an hopefully easier to understand version of Tiny YOLOv2. The weight extraction, weights structure, weight assignment, network, inference and postprocessing are made as simple as possible. The output of this implementation on the test image dog.jpg is the following: ! alt text Just to be clear, this implementation is called tiny yolo voc on pjreddie's site and can be found here: ! alt text This is a specific implementation of tiny yolo voc but the code could be re used to import other configurations! You will need to change the network architecture and hyperparameters according to the cfg file you want to use. The code is organized in this way: weights_loader.py : loads the weights from pjreddie's binary weights file into the tensorflow network and saves the ckpt net.py : contains the definition of the Tiny YOLOv2 network as defined in pjreddie's cfg file test.py : performs detection on an input_image that you can define in the main. Outputs the input_image with B Boxes test_webcam.py: performs detection on the webcam. It is exactly like test.py but some functions are slightly modified to take directly the frames from the webcam as inputs (instead of the image_path). To use this code: Clone the project and place it where you want Download the binary file (60MB) from pjreddie's site: and place it into the folder where the scripts are Launch test.py or test_webcam.py. Change the input_img_path and the weights_path in the main if you want, now the network has dog.jpg as input_img. The code is now configured to run with weights and input image in the same folder as the script. python python3 test.py If you are launching them for the first time, the weights will be extracted from the binary file and a ckpt will be created. Next time only the ckpt will be used! 
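For the LSUV.pytorch entry above, a hedged sketch of the core LSUV loop for a single layer, reusing the documented parameter names needed_std, std_tol and max_attempts; it illustrates the idea rather than the package's lsuv.py:

```python
import torch

# Hedged sketch: repeatedly rescale the layer's weights until the std of its
# output on one data batch is within std_tol of needed_std.
def lsuv_scale_layer(layer, batch, needed_std=1.0, std_tol=0.1, max_attempts=10):
    with torch.no_grad():
        for _ in range(max_attempts):
            out_std = layer(batch).std().item()
            if abs(out_std - needed_std) < std_tol or out_std == 0.0:
                break
            layer.weight.mul_(needed_std / out_std)
    return layer

layer = torch.nn.Conv2d(3, 16, 3, padding=1)
batch = torch.randn(8, 3, 32, 32)   # stands in for one training batch
lsuv_scale_layer(layer, batch)
print(layer(batch).std())           # approximately 1.0 after rescaling
```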
Requirements: I've implemented everything with Tensorflow 1.0, Ubuntu 16.04, Numpy 1.13.0, Python 3.4, OpenCV 3.0 How to use the binary weights file ( Only if you want to use it in other projects, here it is already done ) I've been struggling on understanding how the binary weights file was written. I hope to save you some time by explaining how I imported the weights into a Tensorflow network: Download the binary file from pjreddie's site: Extract the weights from binary to a numpy float32 array with weight_array np.fromfile(weights_path, dtype 'f') Delete the first 4 numbers because they are not relevant Define a function ( load_conv_layer ) to take a part of the array and assign it to the Tensorflow variables of the net IMPORTANT: the weights order is 'biases','gamma','moving_mean','moving_variance','kernel' IMPORTANT: the 'biases' here refer to the beta value of the Batch Normalization. It does not refer to the biases that must be added after the conv2d because they are set all to zero! ( According to the paper by Ioffe et al. ) IMPORTANT: the kernel weights are written in Caffe style which means they have shape (out_dim, in_dim, height, width). They must be converted into Tensorflow style which has shape (height, width, in_dim, out_dim) IMPORTANT: in order to obtain the correct results from the weights they need to be DENORMALIZED according to Batch Normalization. It can be done in two ways: define the network with Batch Normalization and use the weights as they are OR define the net without BN ( this implementation ) and DENORMALIZE the weights. ( details are in weights_loader.py ) In order to verify that the weights extraction is succesfull, I check the total number of params with the number of weights into the weight file. They are both 15867885 in my case. How to postprocess the predictions ( Only if you want to use it in other projects, here it is already done ) Another key point is how the predictions tensor is made. It is a 13x13x125 tensor. To process it better: Convert the tensor to have shape 13x13x5x25 grid_cells x n_boxes_in_each_cell x n_predictions_for_each_box The 25 predictions are: 2 coordinates and 2 shape values (x,y,h,w), 1 Objectness score, 20 Class scores Now access to the tensor in an easy way! E.g. predictions row, col, b, :4 will return the 2 coords and shape of the b B Box which is in the row,col grid cell They must be postprocessed according to the parametrization of YOLOv2. In my implementation it is made like this: python Pre defined anchors shapes! They are not coordinates of the boxes, they are height and width of the 5 anchors defined by YOLOv2 anchors 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 image_height image_width 416 n_grid_cells 13 n_b_boxes 5 for row in range(n_grid_cells): for col in range(n_grid_cells): for b in range(n_b_boxes): tx, ty, tw, th, tc predictions row, col, b, :5 IMPORTANT: (416) / (13) 32! 
The coordinates and shape values are parametrized w.r.t. the center of the grid cell They are parameterized to be in [0,1] so they are easier for the network to predict and learn With the iterations over every grid cell at (row, col) they return to their original positions The x,y coordinates are: (pre defined coordinates of the grid cell (row, col) + parametrized offset) * 32 center_x = (float(col) + sigmoid(tx)) * 32.0 center_y = (float(row) + sigmoid(ty)) * 32.0 Also the width and height must return to their original values by looking at the shape of the anchors roi_w = np.exp(tw) * anchors[2*b + 0] * 32.0 roi_h = np.exp(th) * anchors[2*b + 1] * 32.0 Compute the final objectness score (confidence that there is an object in the B Box) final_confidence = sigmoid(tc) class_predictions = predictions[row, col, b, 5:] class_predictions = softmax(class_predictions) YOLOv2 predicts parametrized values that must be converted to full size by multiplying them by 32! You can see other EQUIVALENT ways to do this but this one works fine. I've seen someone who, instead of multiplying by 32, divides by 13 and then multiplies by 416, which in the end equals a single multiplication by 32. Notes The code runs at 15fps on my laptop which has a 2GB Nvidia GeForce GTX 960M GPU This implementation does not have the training part If you have questions or suggestions do not wait! I'm looking forward to helping",Image Classification,Image Classification 2389,Computer Vision,Computer Vision,Computer Vision,"Hierarchical Deep CNN Hierarchical Deep CNN for image recognition and labeling This program is designed around the concept proposed in the following paper HD CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu I also used Benjamin Graham's paper Spatially Sparse Convolutional Neural Networks as a reference quite frequently. This implementation uses Keras to design a hierarchical deep CNN using an NiN network structure and the Cifar100 dataset.",Image Classification,Image Classification 2406,Computer Vision,Computer Vision,Computer Vision,"FractalNet implementation in Keras Information I built this network as stated in the paper but the fractals are built iteratively instead of functionally to avoid the extra complexity when merging the fractals. The Join layers are built with a shared indicator sampled from a binomial distribution to indicate if global or local drop path must be used. When local drop path is used, each Join layer samples its own paths. But when global drop path is used, all the Join layers share the same randomly sampled tensor so one of the columns is globally selected (a hedged sketch of this sampling scheme follows below). Notes In the paper, they state that the last Join layer of each block is switched with the MaxPooling layer for convenience. I don't do that and finish each block with a Join >MaxPooling, but it should not affect the model. Also it's not clear how and where the Dropout should be used. I found an implementation of the network here by Larsson (one of the paper authors) and he adds it in each convolutional block (Convolution >Dropout >BatchNorm >ReLU). I implemented it the same way. For testing the deepest column, the network is built with all the columns but the indicator for global drop path is always set and the tensor with the paths is set to a constant array indicating which column is enabled. Model Model graph image of FractalNet(c 3, b 5) generated by Keras: link Experiments These results are from experiments with the code published here.
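The drop-path sampling sketch referenced in the FractalNet entry above; the 0.5 global rate and 0.15 local drop rate are assumptions based on the paper's usual setup, and the global column lookup is simplified:

```python
import numpy as np

# Hedged sketch of the described sampling scheme (not the repo's Keras code).
# A shared coin decides global vs. local drop path per batch; for global, one
# column index is sampled once and shared by every Join; for local, each Join
# independently keeps a random non-empty subset of its inputs.
rng = np.random.default_rng(0)
num_columns = 4

use_global = rng.random() < 0.5             # shared indicator for the batch
global_column = rng.integers(num_columns)   # shared "tensor" in the global case

def join_mask(num_paths, p_drop=0.15):
    if use_global:
        mask = np.zeros(num_paths)
        mask[min(global_column, num_paths - 1)] = 1.0  # simplified column lookup
    else:
        mask = (rng.random(num_paths) > p_drop).astype(float)
        if mask.sum() == 0:
            mask[rng.integers(num_paths)] = 1.0        # always keep at least one path
    return mask

print(join_mask(3), join_mask(2))
```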
The authors of the paper have not yet released a complete implementation of the network as of the publishing of this so I can't say what's different from theirs code. Also there is no kind of standardization, scaling or normalization across the dataset in these raw tests (which they may have used). So far the results are promising when compared against Residual Networks. But I couldn't reproduce their deepest column experiment. The code here might have bugs too, if you find anything write me or submit a PR and I will rerun the tests. Test error (%) Method C10 C100 ResNet (reported by 1 ) 13.63 44.76 ResNet Stochastic Depth (reported by 1 ) 11.66 37.80 FractalNet (paper w/SGD) 10.18 35.34 FractalNet+dropout/drop path (paper w/SGD) 7.33 28.20 FractalNet+dropout/drop path (this w/SGD) 8.76 31.10 FractalNet+dropout/drop path (this w/Adam) 8.33 31.30 FractalNet+dropout/drop path/deepest column (paper w/SGD) 7.27 29.05 FractalNet+dropout/drop path/deepest column (this w/SGD) 12.53 43.07 FractalNet+dropout/drop path/deepest column (this w/Adam) 12.28 41.32 1 G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Weinberger. Deep networks with stochastic depth. arXiv preprint arXiv:1603.09382, 2016. CIFAR 10 Training as reported by the paper with SGD for 400 epochs starting with 0.02 learning rate and reducing it by 10x each time it reaches half of the remaining epochs (200, 300, 350, 375). Training with Adam is with default parameters. ! CIFAR 100 Trained with SGD (as with CIFAR 10) and Adam with default parameters: ! Paper arXiv: FractalNet: Ultra Deep Neural Networks without Residuals @article{larsson2016fractalnet, title {FractalNet: Ultra Deep Neural Networks without Residuals}, author {Larsson, Gustav and Maire, Michael and Shakhnarovich, Gregory}, journal {arXiv preprint arXiv:1605.07648}, year {2016} }",Image Classification,Image Classification 2417,Computer Vision,Computer Vision,Computer Vision,"Caffe YOLO License (LICENSE) This Repository combines multiple Deep Learning innovations together within the Caffe Framework. Caffe is developed by Berkeley AI Research ( BAIR )/The Berkeley Vision and Learning Center (BVLC) and community contributors. Project site Source code Further additions to this repo include Depthwise Separable Convolutions ShuffleNet Units YOLO object detection Open source implementations used for this work include caffe mobilenet ShuffleNet Darknet License and Citation Caffe is released under the BSD 2 Clause license . The BAIR/BVLC reference models are released for unrestricted use. 
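For the FractalNet CIFAR-10 training schedule described above (SGD, 0.02 initial learning rate, divided by 10 at epochs 200, 300, 350 and 375), a small hedged sketch of the schedule function; in Keras it could be wrapped in a LearningRateScheduler callback:

```python
# Hedged illustration of the reported schedule, not the repo's training script.
def fractalnet_lr(epoch, base_lr=0.02, drops=(200, 300, 350, 375)):
    lr = base_lr
    for drop_epoch in drops:
        if epoch >= drop_epoch:
            lr *= 0.1
    return lr

for e in (0, 199, 200, 300, 350, 375, 399):
    print(e, fractalnet_lr(e))  # 0.02, 0.02, 0.002, 2e-4, 2e-5, 2e-6, 2e-6
```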
Please cite Caffe in your publications if it helps your research: @article{jia2014caffe, Author {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, Journal {arXiv preprint arXiv:1408.5093}, Title {Caffe: Convolutional Architecture for Fast Feature Embedding}, Year {2014} }",Image Classification,Image Classification 2429,Computer Vision,Computer Vision,Computer Vision,"fashion_mnist (CONV3x3)x3 > MAXPOOL > (CONV3x3)x2 > MAXPOOL > FC(2048) > FC(512) > FC(256) > SOFTMAX(10) Test accuracy: 0.94769996 (1024 neurons in the first FC layer, BN after RELU) Test accuracy: 0.9482 (2048 neurons in the first FC layer, BN after RELU) Test accuracy: 0.9498 (2048 neurons in the first FC layer, BN BEFORE RELU) LeakyRelu BatchNorm before non linearity(Leaky RELU): 0.9466 BatchNorm after non linearity (Leaky RELU): 0.9448999 Test accuracy: 0.95009995 (PReLU only in FC) Test accuracy: 0.949 (LeakyRELU only in FC) (CONV3x3)x3 > MAXPOOL > (CONV3x3)x2 > CONV1x1 > MAXPOOL > DROPOUT > FC(2048) DROPOUT > FC(512) > DROPOUT > FC(256) > SOFTMAX(10) Test accuracy: 0.9483 Linear decay, relu activations (CONV3x3x64)x3 > MAXPOOL > DROPOUT(0.8) > (CONV3x3x128)x2 > CONV1x1x64 > MAXPOOL > DROPOUT(0.8) > FC(2048) > FC(512) > FC(256) > SOFTMAX(10) Test accuracy: 0.9531 200 iterations, batch_size 100, Test accuracy: 0.9526001 Linear decay, relu activations, 120 epochs (CONV3x3x64)x3 > MAXPOOL > DROPOUT(0.7) > (CONV3x3x128)x2 > CONV1x1x64 > MAXPOOL > DROPOUT(0.7) > FC(2048) > FC(512) > FC(256) > SOFTMAX(10) Test accuracy: 0.95350003 150 Epochs, batch_size 60 Test accuracy: 0.95369995 Best result with data augmentation: 0.9568",Image Classification,Image Classification 2437,Computer Vision,Computer Vision,Computer Vision,"CS348K Optional Assignment 2: Efficient MobileNet Conv Layer Evaluation __Note that this is an optional assignment, and can be used to add extra credit to your other assignments or final project.__ In this assignment you will implement a simplified version of the MobileNet CNN. In particular, this assignment is restricted to the evaluation of a single convolutional layer of the network. See for more details about the full network. Also, unlike Assignment 2, this assignemnt is focused on efficiency. Your code will be evaluated on how fast it runs. Why speedup MobileNets? The MobileNets DNN architecture was designed with performance in mind, and a major aspect of the network's design is the use of a separable convolution. So in this assignment you will implement a one part of the DNN which consists of the following sequence of stages: ! MobileNet Layers (images/conv_layer.png MobileNet Layers ) Here BN stands for a batchnorm layer and ReLU is a rectified linear unit (see below for details). What is the challenge? Implementing the layers correctly is easy. The challenge is to implementing them efficiently using many of the techniques described in class, such as SIMD vector processing, multi core execution, and efficient blocking for cache locality. To make these techniques simpler, we encourage you to attempt an implementation in Halide. You are allowed to use the reference Halide algorithm provided in the codebase verbatim . However, to improve the performance you will need to write an efficient Halide schedule. The starter code uses a naive/default Halide schedule, which has loops that look like: produce output: for c: for y: for x: output(...) ... 
for c: for y: for x: for pointwise_rdom: produce tmp: for c: for y: for x: tmp(...) ... for c: for y: for x: for depthwise_rdom: for depthwise_rdom: tmp(...) ... for c: for y: for x: tmp(...) ... for c: for y: for x: tmp(...) ... consume tmp: output(...) ... for c: for y: for x: output(...) ... for c: for y: for x: output(...) ... Your job then would be to write a custom Halide schedule that performs better than the default. (See Halide::Func::print_loop_nest() to inspect and debug your schedule like this.) Resources and documentation Halide tutorials . In particular, see Tutorial 01 for a basic introduction, Tutorial 07 for a convolution example, and Tutorial 05 for an introduction to Halide schedules, and Tutorial 08 for more advanced scheduling topics. Exhaustive Halide documentation . Details on the batchnorm layer: ReLU ) TensorFlow Slim documentation . In case you choose to compare your implementation to a TensorFlow version, we encourage use of TensorFlow Slim which is easier to get off the ground with than TensorFlow proper. Going further To really see how good your implementation is, we encourage you to compare your performance against that of popular DNN frameworks like TensorFlow or MX.net. Since the algorithm for this assignment is fixed, you can even write an implementation in hand tuned native C++ code (using AVX2 intrinsics and threading primitives). Again, you are allowed to use the provided native C++ implementation verbatim , but you should modify it to improve the performance. Assignment mechanics Grab the assignment starter code. git clone git@github.com:stanford cs348k/asst2 mobilenet.git To run the assignment, you will need to download the scene datasets, which you can get from the course staff upon request. __Build Instructions__ The codebase uses a simple Makefile as the build system. However, there is a dependency on Halide. To get the code building right away without Halide , you can modify Makefile , and replace the lines DEFINES : DUSE_HALIDE LDFLAGS : L$(HALIDE_DIR)/bin lHalide ldl lpthread with DEFINES : LDFLAGS : ldl lpthread To build the starter code, run make from the top level directory. The assignment source code is in src/ , and object files and binaries will be populated in build/ and bin/ respectively. Once you decide to use Halide, follow the instructions at In particular, you should download a binary release of Halide . Once you've downloaded and untar'd the release, say into directory halide_dir , change the previous lines back, and also the following line in Makefile HALIDE_DIR /Users/setaluri/halide to HALIDE_DIR Then you can build the code using the instructions above. __Running the starter code:__ Now you can run the camera. Just run: ./bin/convlayer DATA_DIR/activations.bin DATA_DIR/weights.bin DATA_DIR/golden.bin This code will run your (initially empty) version of the convolution layer using the activations in DATA_DIR/activations.bin and weights in DATA_DIR/weights.bin . It will run for num_runs trials, and report the timings across all runs, as well as validate the output against the data contained in DATA_DIR/golden.bin . Note that if you are using Halide, the command will be slightly different. 
On OSX it will be DYLD_LIBRARY_PATH /bin ./bin/convlayer and on Linux it will be LD_LIBRARY_PATH /bin ./bin/convlayer __Modifying the code__ Your modifications to the code should only go in files fast_convolution_layer.hpp and fast_convolution_layer.cpp , in the regions marked // BEGIN: CS348K STUDENTS MODIFY THIS CODE // END: CS348K STUDENTS MODIFY THIS CODE If you need to make changes to the build system (e.g. add g++ flags to get vector intrinsics working) _please make a note of it in your submission_. We have provided two reference implementations in simple_convolution_layer.cpp and halide_convolution_layer.cpp . You can use any of the code in these files for your implementation. In particular, you can (a) copy and paste the native C++ implementation as a starting point if you choose to go the native C++ route, and (b) copy the Halide algorithm (and just provide a custom schedule) if you choose to go the Halide route.",Image Classification,Image Classification 2446,Computer Vision,Computer Vision,Computer Vision,"Show and Tell: A Neural Image Caption Generator A TensorFlow implementation of the image to text model described in the paper: Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. IEEE transactions on pattern analysis and machine intelligence (2016). Full text available at: Contact Author: Chris Shallue Pull requests and issues: @cshallue Contents Model Overview ( model overview) Introduction ( introduction) Architecture ( architecture) Getting Started ( getting started) A Note on Hardware and Training Time ( a note on hardware and training time) Install Required Packages ( install required packages) Prepare the Training Data ( prepare the training data) Download the Inception v3 Checkpoint ( download the inception v3 checkpoint) Training a Model ( training a model) Initial Training ( initial training) Fine Tune the Inception v3 Model ( fine tune the inception v3 model) Generating Captions ( generating captions) Model Overview Introduction The Show and Tell model is a deep neural network that learns how to describe the content of images. For example: ! Example captions (g3doc/example_captions.jpg) Architecture The Show and Tell model is an example of an encoder decoder neural network. It works by first encoding an image into a fixed length vector representation, and then decoding the representation into a natural language description. The image encoder is a deep convolutional neural network. This type of network is widely used for image tasks and is currently state of the art for object recognition and detection. Our particular choice of network is the Inception v3 image recognition model pretrained on the ILSVRC 2012 CLS image classification dataset. The decoder is a long short term memory (LSTM) network. This type of network is commonly used for sequence modeling tasks such as language modeling and machine translation. In the Show and Tell model, the LSTM network is trained as a language model conditioned on the image encoding. Words in the captions are represented with an embedding model. Each word in the vocabulary is associated with a fixed length vector representation that is learned during training. The following diagram illustrates the model architecture. ! 
Show and Tell Architecture (g3doc/show_and_tell_architecture.png) In this diagram, \{ s 0 , s 1 , ..., s N 1 \} are the words of the caption and \{ w e s 0 , w e s 1 , ..., w e s N 1 \} are their corresponding word embedding vectors. The outputs \{ p 1 , p 2 , ..., p N \} of the LSTM are probability distributions generated by the model for the next word in the sentence. The terms \{log p 1 ( s 1 ), log p 2 ( s 2 ), ..., log p N ( s N )\} are the log likelihoods of the correct word at each step; the negated sum of these terms is the minimization objective of the model. During the first phase of training the parameters of the Inception v3 model are kept fixed: it is simply a static image encoder function. A single trainable layer is added on top of the Inception v3 model to transform the image embedding into the word embedding vector space. The model is trained with respect to the parameters of the word embeddings, the parameters of the layer on top of Inception v3 and the parameters of the LSTM. In the second phase of training, all parameters including the parameters of Inception v3 are trained to jointly fine tune the image encoder and the LSTM. Given a trained model and an image we use beam search to generate captions for that image. Captions are generated word by word, where at each step t we use the set of sentences already generated with length t 1 to generate a new set of sentences with length t . We keep only the top k candidates at each step, where the hyperparameter k is called the beam size . We have found the best performance with k 3. Getting Started A Note on Hardware and Training Time The time required to train the Show and Tell model depends on your specific hardware and computational capacity. In this guide we assume you will be running training on a single machine with a GPU. In our experience on an NVIDIA Tesla K20m GPU the initial training phase takes 1 2 weeks. The second training phase may take several additional weeks to achieve peak performance (but you can stop this phase early and still get reasonable results). It is possible to achieve a speed up by implementing distributed training across a cluster of machines with GPUs, but that is not covered in this guide. Whilst it is possible to run this code on a CPU, beware that this may be approximately 10 times slower. Install Required Packages First ensure that you have installed the following required packages: Bazel ( instructions ) TensorFlow 1.0 or greater ( instructions ) NumPy ( instructions ) Natural Language Toolkit (NLTK) : First install NLTK ( instructions ) Then install the NLTK data package punkt ( instructions ) Unzip Prepare the Training Data To train the model you will need to provide training data in native TFRecord format. The TFRecord format consists of a set of sharded files containing serialized tf.SequenceExample protocol buffers. Each tf.SequenceExample proto contains an image (JPEG format), a caption and metadata such as the image id. Each caption is a list of words. During preprocessing, a dictionary is created that assigns each word in the vocabulary to an integer valued id. Each caption is encoded as a list of integer word ids in the tf.SequenceExample protos. We have provided a script to download and preprocess the MSCOCO image captioning data set into this format. Downloading and preprocessing the data may take several hours depending on your network and computer speed. Please be patient. 
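For reference, the minimization objective described in the Show and Tell architecture section above can be written out as:

```latex
% Negated sum of per-step log-likelihoods of the correct caption words,
% conditioned on the image I and caption S = (S_1, ..., S_N).
L(I, S) = -\sum_{t=1}^{N} \log p_t(S_t)
```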
Before running the script, ensure that your hard disk has at least 150GB of available space for storing the downloaded and processed data. shell Location to save the MSCOCO data. MSCOCO_DIR ${HOME}/im2txt/data/mscoco Build the preprocessing script. cd research/im2txt bazel build //im2txt:download_and_preprocess_mscoco Run the preprocessing script. bazel bin/im2txt/download_and_preprocess_mscoco ${MSCOCO_DIR} The final line of the output should read: 2016 09 01 16:47:47.296630: Finished processing all 20267 image caption pairs in data set 'test'. When the script finishes you will find 256 training, 4 validation and 8 testing files in DATA_DIR . The files will match the patterns train ????? of 00256 , val ????? of 00004 and test ????? of 00008 , respectively. Download the Inception v3 Checkpoint The Show and Tell model requires a pretrained Inception v3 checkpoint file to initialize the parameters of its image encoder submodel. This checkpoint file is provided by the TensorFlow Slim image classification library which provides a suite of pre trained image classification models. You can read more about the models provided by the library here . Run the following commands to download the Inception v3 checkpoint. shell Location to save the Inception v3 checkpoint. INCEPTION_DIR ${HOME}/im2txt/data mkdir p ${INCEPTION_DIR} wget tar xvf inception_v3_2016_08_28.tar.gz C ${INCEPTION_DIR} rm inception_v3_2016_08_28.tar.gz Note that the Inception v3 checkpoint will only be used for initializing the parameters of the Show and Tell model. Once the Show and Tell model starts training it will save its own checkpoint files containing the values of all its parameters (including copies of the Inception v3 parameters). If training is stopped and restarted, the parameter values will be restored from the latest Show and Tell checkpoint and the Inception v3 checkpoint will be ignored. In other words, the Inception v3 checkpoint is only used in the 0 th global step (initialization) of training the Show and Tell model. Training a Model Initial Training Run the training script. shell Directory containing preprocessed MSCOCO data. MSCOCO_DIR ${HOME}/im2txt/data/mscoco Inception v3 checkpoint file. INCEPTION_CHECKPOINT ${HOME}/im2txt/data/inception_v3.ckpt Directory to save the model. MODEL_DIR ${HOME}/im2txt/model Build the model. cd research/im2txt bazel build c opt //im2txt/... Run the training script. bazel bin/im2txt/train \ input_file_pattern ${MSCOCO_DIR}/train ????? of 00256 \ inception_checkpoint_file ${INCEPTION_CHECKPOINT} \ train_dir ${MODEL_DIR}/train \ train_inception false \ number_of_steps 1000000 Run the evaluation script in a separate process. This will log evaluation metrics to TensorBoard which allows training progress to be monitored in real time. Note that you may run out of memory if you run the evaluation script on the same GPU as the training script. You can run the command export CUDA_VISIBLE_DEVICES to force the evaluation script to run on CPU. If evaluation runs too slowly on CPU, you can decrease the value of num_eval_examples . shell MSCOCO_DIR ${HOME}/im2txt/data/mscoco MODEL_DIR ${HOME}/im2txt/model Ignore GPU devices (only necessary if your GPU is currently memory constrained, for example, by running the training script). export CUDA_VISIBLE_DEVICES Run the evaluation script. This will run in a loop, periodically loading the latest model checkpoint file and computing evaluation metrics. bazel bin/im2txt/evaluate \ input_file_pattern ${MSCOCO_DIR}/val ????? 
of 00004 \ checkpoint_dir ${MODEL_DIR}/train \ eval_dir ${MODEL_DIR}/eval Run a TensorBoard server in a separate process for real time monitoring of training progress and evaluation metrics. shell MODEL_DIR ${HOME}/im2txt/model Run a TensorBoard server. tensorboard logdir ${MODEL_DIR} Fine Tune the Inception v3 Model Your model will already be able to generate reasonable captions after the first phase of training. Try it out! (See Generating Captions ( generating captions)). You can further improve the performance of the model by running a second training phase to jointly fine tune the parameters of the Inception v3 image submodel and the LSTM. shell Restart the training script with train_inception true. bazel bin/im2txt/train \ input_file_pattern ${MSCOCO_DIR}/train ????? of 00256 \ train_dir ${MODEL_DIR}/train \ train_inception true \ number_of_steps 3000000 Additional 2M steps (assuming 1M in initial training). Note that training will proceed much slower now, and the model will continue to improve by a small amount for a long time. We have found that it will improve slowly for an additional 2 2.5 million steps before it begins to overfit. This may take several weeks on a single GPU. If you don't care about absolutely optimal performance then feel free to halt training sooner by stopping the training script or passing a smaller value to the flag number_of_steps . Your model will still work reasonably well. Generating Captions Your trained Show and Tell model can generate captions for any JPEG image! The following command line will generate captions for an image from the test set. shell Path to checkpoint file or a directory containing checkpoint files. Passing a directory will only work if there is also a file named 'checkpoint' which lists the available checkpoints in the directory. It will not work if you point to a directory with just a copy of a model checkpoint: in that case, you will need to pass the checkpoint path explicitly. CHECKPOINT_PATH ${HOME}/im2txt/model/train Vocabulary file generated by the preprocessing script. VOCAB_FILE ${HOME}/im2txt/data/mscoco/word_counts.txt JPEG image file to caption. IMAGE_FILE ${HOME}/im2txt/data/mscoco/raw data/val2014/COCO_val2014_000000224477.jpg Build the inference binary. cd research/im2txt bazel build c opt //im2txt:run_inference Ignore GPU devices (only necessary if your GPU is currently memory constrained, for example, by running the training script). export CUDA_VISIBLE_DEVICES Run inference to generate captions. bazel bin/im2txt/run_inference \ checkpoint_path ${CHECKPOINT_PATH} \ vocab_file ${VOCAB_FILE} \ input_files ${IMAGE_FILE} Example output: Captions for image COCO_val2014_000000224477.jpg: 0) a man riding a wave on top of a surfboard . (p 0.040413) 1) a person riding a surf board on a wave (p 0.017452) 2) a man riding a wave on a surfboard in the ocean . (p 0.005743) Note: you may get different results. Some variation between different models is expected. Here is the image: ! 
Surfer (g3doc/COCO_val2014_000000224477.jpg)",Image Classification,Image Classification 2452,Computer Vision,Computer Vision,Computer Vision,ResearchPapersLinks Understanding the difficulty of training deep feedforward neural networks >> Delving Deep into Rectifiers: Surpassing Human Level Performance on ImageNet Classification >> Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift >>,Image Classification,Image Classification 2453,Computer Vision,Computer Vision,Computer Vision,"Gluon Mobilenet YOLOv3 Paper YOLOv3: An Incremental Improvement _Joseph Redmon, Ali Farhadi_ Abstract We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at Paper Original Implementation MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam Abstract We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper parameters that efficiently trade off between latency and accuracy. These hyper parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo localization. Paper Original Implementation Prerequisites 1. Python 3.6 + 2. Gluoncv 3. Mxnet Usage Mobilenet voc python3 train_yolo3_mobilenet.py network mobilenet1_0 dataset voc gpus 0,1,2,3,4,5,6,7 batch size 64 j 16 log interval 100 lr decay epoch 160,180 epochs 200 syncbn warmup epochs 4 coco python3 train_yolo3_mobilenet.py network mobilenet1_0 dataset coco gpus 0,1,2,3,4,5,6,7 batch size 64 j 32 log interval 100 lr decay epoch 220,250 epochs 280 syncbn warmup epochs 2 mixup no mixup epochs 20 label smooth no wd MAP Backbone GPU Dataset Size MAP : : : : : : : : Mobilenet 8 Tesla v100 VOC random shape 76.12 Mobilenet 8 Tesla v100 COCO2017 random shape 28.3 Credit @article{yolov3, title {YOLOv3: An Incremental Improvement}, author {Redmon, Joseph and Farhadi, Ali}, journal {arXiv}, year {2018} } @article{mobilenets, title {MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications}, author {Andrew G. 
Howard, Menglong Zhu, Bo Chen,Dmitry Kalenichenko,Weijun Wang, Tobias Weyand,Marco Andreetto, Hartwig Adam}, journal {arXiv}, year {2017} }",Image Classification,Image Classification 2465,Computer Vision,Computer Vision,Computer Vision,"reimagined winner CIFAR 10 Object Detection with improved accuracy using Fractional MaxPooling with Convolutional Neural Networks This code uses 2 D Convolutional Neural Networks with the KERAS library and Fractional MaxPooling2D. Fractional Maxpooling is an advanced pooling algorithm that uses a fractional pooling ratio unlike the general MaxPooling approaches where pooling usually is done in a integer ratio (generally 2). A Fractional Pooling ratio allows better scaling of the images and allows us to use larger number of convolutional layers to learn the image better at different scales. Use of Pseudo Random sequences adds randomness to the pooling operation with enables learning more robust features for classification. The library for 2D Fractional Maxpooling for keras implementations is provided. FractionalPooling2D('pool_ratio' 4 D Tuple, 'pseudo_random' bool, 'overlap' bool, name string) 1,1.44,1.67,1 is a valid 4 D tuple for Pooling ratio for batch_size, rows, cols, channels Use ( batch_input_shape ) while implementing To have a better understanding about Fractional Maxpooling refer to :",Image Classification,Image Classification 2474,Computer Vision,Computer Vision,Computer Vision,"PWC RandWire_tensorflow tensorflow implementation of Exploring Randomly Wired Neural Networks for Image Recognition using Cifar10, MNIST ! alt text Requirements Tensorflow 1.x GPU version recommended Python 3.x networkx 2.x pyyaml 5.x Dataset Please download dataset from this link Both Cifar10 and MNIST dataset are converted into tfrecords format for conveinence. Put train.tfrecords , test.tfrecords files into dataset/cifar10 , dataset/mnist You can create tfrecord file with your own dataset with dataset/dataset_generator.py . sh python dataset_generator.py image_dir ./cifar10/test/images label_dir ./cifar10/test/labels output_dir ./cifar10 output_filename test.tfrecord Options: image_dir (str) directory of your image files. it is recommended to set the name of images to integers like 0.png label_dir (str) directory of your label files. it is recommended to set the name of images to integers like 0.txt . label text file must contain class label in integer like 8 . output_dir (str) directory for output tfrecord file. outpuf_filename (str) filename of output tfrecord file. Experiments Datasets Model Parameters Accuracy Epoch CIFAR 10 ResNet110 (Paper) 1.7M 93.57% 300 CIFAR 10 RandWire (my_small_regime) 1.2M 93.64% 60 CIFAR 100 RandWire (my_regime) 8M 74.49% 100 (19.04.18 changed) I trained on Cifar10 dataset and get 6.36 % error on test set. You can download pretrained network from here . Unzip the file and move all files under checkpoint file or your checkpoint directory and try running test script to check the accuracy. The number of parameters used for cifar10 model is aboud 1.2M, which is similar result on ResNet 110 (6.43 %) which used 1.7M parameters. (19.04.16 added) I trained on Cifar100 dataset and get 74.49% accuracy on test set. You can download pretrained network from same link above. 
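For the dataset_generator.py layout described in the RandWire_tensorflow entry above (integer-named images and one-line integer label files), here is a hedged sketch that writes such a folder structure using dummy data:

```python
import os
import numpy as np
from PIL import Image

# Hedged illustration only: creates images named 0.png, 1.png, ... and matching
# label files 0.txt, 1.txt, each containing a single integer class label.
os.makedirs("cifar10/test/images", exist_ok=True)
os.makedirs("cifar10/test/labels", exist_ok=True)
for i in range(10):
    img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)   # dummy 32x32 image
    Image.fromarray(img).save(f"cifar10/test/images/{i}.png")
    with open(f"cifar10/test/labels/{i}.txt", "w") as f:
        f.write(str(i % 10))                                    # e.g. "8"
```

After this, the documented dataset_generator.py command can be pointed at these image and label folders to produce the tfrecord file.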
Training Cifar 10 sh python train.py class_num 10 image_shape 32 32 3 stages 4 channel_count 78 graph_model ws graph_param 32 4 0.75 dropout_rate 0.2 learning_rate 0.1 momentum 0.9 weight_decay 0.0001 train_set_size 50000 val_set_size 10000 batch_size 100 epochs 100 checkpoint_dir ./checkpoint checkpoint_name randwire_cifar10 train_record_dir ./dataset/cifar10/train.tfrecord val_record_dir ./dataset/cifar10/test.tfrecord Options: class_num (int) output number of class. Cifar10 has 10 classes. image_shape (int nargs) shape of input image. Cifar10 has 32 32 3 shape. stages (int) stage (or block) number of randwire network. channel_count (int) channel count of randwire network. please refer to the paper graph_model (str) currently randwire has 3 random graph models. you can choose from 'er', 'ba' and 'ws'. graph_param (float nargs) first value is node count. for 'er' and 'ba', there are one extra parameter so it would be like 32 0.4 or 32 7 . for 'ws' there are two extra parameters like above. learning_rate (float) initial learning rate momentum (float) momentum from momentum optimizer weight_decay (float) weight decay factor train_set_size (int) number of training data. Cifar10 has 50000 data. val_set_size (int) number of validating data. I used test data for validation, so there are 10000 data. batch_size (int) size of mini batch epochs (int) number of epoch checkpoint_dir (str) directory to save checkpoint checkpoint_name (str) file name of checkpoint train_record_dir (str) file location of training set tfrecord test_record_dir (str) file location of test set tfrecord (for validation) MNIST sh python train.py class_num 10 image_shape 28 28 1 stages 4 channel_count 78 graph_model ws graph_param 32 4 0.75 dropout_rate 0.2 learning_rate 0.1 momentum 0.9 weight_decay 0.0001 train_set_size 50000 val_set_size 10000 batch_size 100 epochs 100 checkpoint_dir ./checkpoint checkpoint_name randwire_mnist train_record_dir ./dataset/mnist/train.tfrecord val_record_dir ./dataset/mnist/test.tfrecord Options: options are same as Cifar10 Cifar100 (19.04.16 added) sh python train.py class_num 100 image_shape 32 32 3 stages 4 channel_count 78 graph_model ws graph_param 32 4 0.75 dropout_rate 0.2 learning_rate 0.1 momentum 0.9 weight_decay 0.0001 train_set_size 50000 val_set_size 10000 batch_size 100 epochs 100 checkpoint_dir ./checkpoint checkpoint_name randwire_cifar100 train_record_dir ./dataset/cifar100/train.tfrecord val_record_dir ./dataset/cifar100/test.tfrecord Options: options are same as Cifar10 Testing sh python test.py class_num checkpoint_dir ./checkpoint/best test_record_dir ./dataset/cifar10/test.tfrecord batch_size 256 Options: class_num (int) the number of classes checkpoint_dir (str) directory for the checkpoint you want to load and test test_record_dir (str) directory for the test dataset batch_size (int) batch size for testing test.py loads network graph and tensors from meta data and evalutes. Implementation Details Learning rate decreases by multiplying 0.1 in 50% and 75% of entire training step. I made an option init_subsample in my_regime , my_small_regime and small_regime in RandWire.py which do not to use stride 2 for the initial convolutional layer since cifar10 and mnist has low resolution. if you set init_subsample False, then it will use stride 2. While training, it will save the checkpoint with best validation accuracy. While training, it will save tensorboard log for training and validation accuracy and loss in YOUR_CHECKPOINT_DIRECTORY /log . 
You can visualize yourself with tensorboard. I'm currently working on drop connection for regularization and downloading ImageNet dataset to train on my implementation. I added dropout layer after the Relu Conv BN triplet unit for regularization. You can set dropout_rate 0.0 to disable it. In train.py, you can use small_regime or regular_regime instead of my_regime and my_small_regime . Both do not use stride 2 in order to prevent subsampling to maintain the spatial information since cifar datasets are not large enough. python output logit from NN output RandWire.my_small_regime(images, args.stages, args.channel_count, args.class_num, args.dropout_rate, args.graph_model, args.graph_param, args.checkpoint_dir + '/' + 'graphs', False, training) output RandWire.small_regime(images, args.stages, args.channel_count, args.class_num, args.dropout_rate, args.graph_model, args.graph_param, args.checkpoint_dir + '/' + 'graphs', False, training) output RandWire.regular_regime(images, args.stages, args.channel_count, args.class_num, args.dropout_rate, args.graph_model, args.graph_param, args.checkpoint_dir + '/' + 'graphs', training)",Image Classification,Image Classification 2477,Computer Vision,Computer Vision,Computer Vision,densenet_tensorflow this is a densenet by tensorflow This repository contains the tensorflow implementation for the paper . Dependencies: Python 3 TensorFlow > 1.0 tensorlayer How to use: python densenet.py,Image Classification,Image Classification 2482,Computer Vision,Computer Vision,Computer Vision,"Human Protein Atlas Challenge This is an implementation and application of the SENet 154 neural network architecture to the Human Protein Atlas Challenge dataset on Kaggle. Performance on the leaderboard: Top 12% (234/2172) This is a record of the various iterations and experiments I conducted along the way to create a convolutional neural network that to recognise proteins found on microscopy staining images. Used the FastAI library as well as the pretrained models from: If you are considering using some of this code, things to note: 1. I used tabbed views to retain results of previous experiments this means that trying to read the notebooks in order may be confusing 2. The code using the FastAI library is likely to be out of date (I used v0.7 whereas v1.0 has been launched) 3. The one cycle policy did not seem to work well and I had trouble tuning the hyperparameters as recommended",Image Classification,Image Classification 2484,Computer Vision,Computer Vision,Computer Vision,4099 Emotion Analyser This repo is for the code associated with the 4099 project Chosen CNN for Feature Extraction: Inception ResNet V2 (with credit to Keras for pretrained model) Paper link: Dataset used for training and validation: RAVDESS Notebook for data preparation: Data_preparation.ipynb Notebook for feature extraction and training: InceptionResNetV2.ipynb,Image Classification,Image Classification 2487,Computer Vision,Computer Vision,Computer Vision,"PyINN CuPy implementations of fused PyTorch ops. PyTorch version of imagine nn The purpose of this package is to contain CUDA ops written in Python with CuPy, which is not a PyTorch dependency. An alternative to CuPy would be , but it requires a lot of wrapping code like , so doesn't really work with quick prototyping. Another advantage of CuPy over C code is that dimensions of each op are known at JIT ing time, and compiled kernels potentially can be faster. Also, the first version of the package was in PyCUDA, but it can't work with PyTorch multi GPU. 
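To give a feel for what "CUDA ops written in Python with CuPy" means, here is a minimal, hypothetical CuPy elementwise kernel; it is not part of pyinn's API, just a sketch of the approach:
python
import cupy as cp

# JIT-compiled elementwise CUDA kernel defined from Python source
relu_fwd = cp.ElementwiseKernel(
    'float32 x',          # input
    'float32 y',          # output
    'y = x > 0 ? x : 0',  # CUDA C expression compiled on first call
    'relu_fwd')

x = cp.random.randn(1024).astype(cp.float32)
y = relu_fwd(x)
Because the kernel is compiled once the shapes and dtypes are known, CuPy can specialize it, which is the "compiled kernels potentially can be faster" point above.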
Benchmarks: on a Maxwell Titan X, MobileNets built with pyinn.conv2d_depthwise were 2.6x faster than F.conv2d (see benchmark.py (test/benchmark.py)). This is no longer the case with the new kernels: PyTorch 0.3.0 is now 20% faster than pyinn. Installation pip install git+ Example python import torch from torch.autograd import Variable import pyinn as P x Variable(torch.randn(1,4,5,5).cuda()) w Variable(torch.randn(4,1,3,3).cuda()) y P.conv2d_depthwise(x, w, padding 1) or with the modules interface: python from pyinn.modules import Conv2dDepthwise module Conv2dDepthwise(channels 4, kernel_size 3, padding 1).cuda() y module(x) Documentation conv2d_depthwise Implements depthwise convolution as in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications . The CUDA kernels are borrowed from an existing implementation; the CPU side is done by F.conv2d . Equivalent to: python F.conv2d(input, weight, groups input.size(1)) Inputs and arguments are the same as for F.conv2d dgmm Multiplication with a diagonal matrix. Uses the CUDA dgmm function, which is sometimes faster than expand. In torch functions it does input.mm(x.diag()) . Both left and right multiplications are supported. Args: input: 2D tensor x: 1D tensor cdgmm Complex multiplication with a diagonal matrix. Does input.mm(x.diag()) where input and x are complex. Args: input: 3D tensor with last dimension of size 2 x: 2D tensor with last dimension of size 2 NCReLU Applies the NCReLU (negative concatenated ReLU) nonlinearity. Does torch.cat( x.clamp(min 0), x.clamp(max 0) , dim 1) in a single fused op. Used in DiracNets: Training Very Deep Neural Networks Without Skip Connections Args: input: 4D tensor im2col and col2im Rearrange image blocks into columns. The representation is used to perform GEMM based convolution. Output is a 5D (or 6D in case of minibatch) tensor. The minibatch implementation is inefficient, and could be done in a single CUDA kernel.",Image Classification,Image Classification 2488,Computer Vision,Computer Vision,Computer Vision,"Wide Residual Networks This code was used for experiments with Wide Residual Networks (BMVC 2016) by Sergey Zagoruyko and Nikos Komodakis. Deep residual networks were shown to be able to scale up to thousands of layers and still have improving performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and so training very deep residual networks has a problem of diminishing feature reuse, which makes these networks very slow to train. To tackle these problems, in this work we conduct a detailed experimental study on the architecture of ResNet blocks, based on which we propose a novel architecture where we decrease depth and increase width of residual networks. We call the resulting network structures wide residual networks (WRNs) and show that these are far superior over their commonly used thin and very deep counterparts. For example, we demonstrate that even a simple 16 layer deep wide residual network outperforms in accuracy and efficiency all previous deep residual networks, including thousand layer deep networks. We further show that WRNs achieve incredibly good results (e.g., achieving new state of the art results on CIFAR 10, CIFAR 100, SVHN, COCO and substantial improvements on ImageNet) and train several times faster than pre activation ResNets. Update: We updated the paper with ImageNet, COCO and meanstd preprocessing CIFAR results. If you're comparing your method against WRN, please report correct preprocessing numbers because they give substantially different results.
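For intuition, the widened pre activation residual block described above looks roughly like the following PyTorch style sketch; this is not the repo's Torch code, and the widen_factor naming and layer sizes are only illustrative:
python
import torch.nn as nn
import torch.nn.functional as F

class WideBasicBlock(nn.Module):
    # Pre-activation residual block whose channel count is scaled by widen_factor.
    def __init__(self, in_planes, planes, widen_factor=10, stride=1):
        super().__init__()
        width = planes * widen_factor
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, width, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, stride=1, padding=1, bias=False)
        self.shortcut = (nn.Conv2d(in_planes, width, 1, stride=stride, bias=False)
                         if stride != 1 or in_planes != width else nn.Identity())

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)
Widening multiplies the number of channels per block rather than stacking more blocks, which is where the accuracy/speed trade off discussed in the paper comes from.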
tldr; ImageNet WRN 50 2 bottleneck (ResNet 50 with wider inner bottleneck 3x3 convolution) is significantly faster than ResNet 152 and has better accuracy; on CIFAR meanstd preprocessing (as in fb.resnet.torch) gives better results than ZCA whitening; on COCO wide ResNet with 34 layers outperforms even Inception v4 based Fast RCNN model in single model performance. Test error (%, flip/translation augmentation, meanstd normalization, median of 5 runs) on CIFAR: Network CIFAR 10 CIFAR 100 : : : : pre ResNet 164 5.46 24.33 pre ResNet 1001 4.92 22.71 WRN 28 10 4.00 19.25 WRN 28 10 dropout 3.89 18.85 Single time runs (meanstd normalization): Dataset network test perf. : : : : CIFAR 10 WRN 40 10 dropout 3.8% CIFAR 100 WRN 40 10 dropout 18.3% SVHN WRN 16 8 dropout 1.54% ImageNet (single crop) WRN 50 2 bottleneck 21.9% top 1, 5.79% top 5 COCO val5k (single model) WRN 34 2 36 mAP See for details. bibtex: @INPROCEEDINGS{Zagoruyko2016WRN, author {Sergey Zagoruyko and Nikos Komodakis}, title {Wide Residual Networks}, booktitle {BMVC}, year {2016}} Pretrained models ImageNet WRN 50 2 bottleneck (wider bottleneck), see pretrained (pretrained/README.md) for details Download (263MB): There are also PyTorch and Tensorflow model definitions with pretrained weights at COCO Coming Installation The code depends on Torch Follow instructions here and run: luarocks install torchnet luarocks install optnet luarocks install iterm For visualizing training curves we used ipython notebook with pandas and bokeh. Usage Dataset support The code supports loading simple datasets in torch format. We provide the following: MNIST data preparation script CIFAR 10 recommended data preparation script , preprocessed data (176MB) CIFAR 10 whitened (using pylearn2) preprocessed dataset CIFAR 100 recommended data preparation script , preprocessed data (176MB) CIFAR 100 whitened (using pylearn2) preprocessed dataset SVHN data preparation script To whiten CIFAR 10 and CIFAR 100 we used the following scripts and then converted to torch using and npy to torch converter We are running ImageNet experiments and will update the paper and this repo soon. Training We provide several scripts for reproducing results in the paper. Below are several examples. bash model wide resnet widen_factor 4 depth 40 ./scripts/train_cifar.sh This will train WRN 40 4 on CIFAR 10 whitened (supposed to be in datasets folder). This network achieves about the same accuracy as ResNet 1001 and trains in 6 hours on a single Titan X. Log is saved to logs/wide resnet_$RANDOM$RANDOM folder with json entries for each epoch and can be visualized with itorch/ipython later. For reference we provide logs for this experiment and ipython notebook (notebooks/visualize.ipynb) to visualize the results. After running it you should see these training curves: ! viz Another example: bash model wide resnet widen_factor 10 depth 28 dropout 0.3 dataset ./datasets/cifar100_whitened.t7 ./scripts/train_cifar.sh This network achieves 20.0% error on CIFAR 100 in about a day on a single Titan X. Multi GPU is supported with nGPU n parameter. Other models Additional models in this repo: NIN (7.4% on CIFAR 10 whitened) VGG (modified from cifar.torch , 6.3% on CIFAR 10 whitened) pre activation ResNet (from Implementation details The code evolved from To reduce memory usage we use @fmassa's optimize net , which automatically shares output and gradient tensors between modules. This keeps memory usage below 4 Gb even for our best networks. 
Also, it can generate network graph plots as the one for WRN 16 2 in the end of this page. Acknowledgements We thank startup company VisionLabs and Eugenio Culurciello for giving us access to their clusters, without them ImageNet experiments wouldn't be possible. We also thank Adam Lerer and Sam Gross for helpful discussions. Work supported by EC project FP7 ICT 611145 ROBOSPECT.",Image Classification,Image Classification 2504,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Tutorial Table of Contents 1. TensorFlow Basics ( basics) 2. Understanding static and dynamic shapes ( shapes) 3. Scopes and when to use them ( scopes) 4. Broadcasting the good and the ugly ( broadcast) 5. Feeding data to TensorFlow ( data) 6. Take advantage of the overloaded operators ( overloaded_ops) 7. Understanding order of execution and control dependencies ( control_deps) 8. Control flow operations: conditionals and loops ( control_flow) 9. Prototyping kernels and advanced visualization with Python ops ( python_ops) 10. Multi GPU processing with data parallelism ( multi_gpu) 11. Debugging TensorFlow models ( debug) 12. Numerical stability in TensorFlow ( stable) 13. Building a neural network training framework with learn API ( tf_learn) 14. TensorFlow Cookbook ( cookbook) Get shape ( get_shape) Batch gather ( batch_gather) Beam search ( beam_search) Merge ( merge) Entropy ( entropy) KL Divergence ( kld) Make parallel ( make_parallel) Leaky Relu ( leaky_relu) Batch normalization ( batch_norm) Squeeze and excitation ( squeeze_excite) _We aim to gradually expand this series by adding new articles and keep the content up to date with the latest releases of TensorFlow API. If you have suggestions on how to improve this series or find the explanations ambiguous, feel free to create an issue, send patches, or reach out by email._ _We encourage you to also check out the accompanied neural network training framework built on top of tf.contrib.learn API. The framework can be downloaded separately:_ git clone TensorFlow Basics The most striking difference between TensorFlow and other numerical computation libraries such as NumPy is that operations in TensorFlow are symbolic. This is a powerful concept that allows TensorFlow to do all sort of things (e.g. automatic differentiation) that are not possible with imperative libraries such as NumPy. But it also comes at the cost of making it harder to grasp. Our attempt here is to demystify TensorFlow and provide some guidelines and best practices for more effective use of TensorFlow. Let's start with a simple example, we want to multiply two random matrices. First we look at an implementation done in NumPy: python import numpy as np x np.random.normal(size 10, 10 ) y np.random.normal(size 10, 10 ) z np.dot(x, y) print(z) Now we perform the exact same computation this time in TensorFlow: python import tensorflow as tf x tf.random_normal( 10, 10 ) y tf.random_normal( 10, 10 ) z tf.matmul(x, y) sess tf.Session() z_val sess.run(z) print(z_val) Unlike NumPy that immediately performs the computation and produces the result, tensorflow only gives us a handle (of type Tensor) to a node in the graph that represents the result. If we try printing the value of z directly, we get something like this: Tensor( MatMul:0 , shape (10, 10), dtype float32) Since both the inputs have a fully defined shape, tensorflow is able to infer the shape of the tensor as well as its type. 
In order to compute the value of the tensor we need to create a session and evaluate it using Session.run() method. __Tip__: When using Jupyter notebook make sure to call tf.reset_default_graph() at the beginning to clear the symbolic graph before defining new nodes. To understand how powerful symbolic computation can be let's have a look at another example. Assume that we have samples from a curve (say f(x) = 5x^2 + 3) and we want to estimate f(x) based on these samples. We define a parametric function g(x, w) = w0 x^2 + w1 x + w2, which is a function of the input x and latent parameters w, our goal is then to find the latent parameters such that g(x, w) ≈ f(x). This can be done by minimizing the following loss function: L(w) = ∑ (f(x) - g(x, w))^2. Although there's a closed form solution for this simple problem, we opt to use a more general approach that can be applied to any arbitrary differentiable function, and that is using stochastic gradient descent. We simply compute the average gradient of L(w) with respect to w over a set of sample points and move in the opposite direction. Here's how it can be done in TensorFlow: python import numpy as np import tensorflow as tf Placeholders are used to feed values from python to TensorFlow ops. We define two placeholders, one for input feature x, and one for output y. x tf.placeholder(tf.float32) y tf.placeholder(tf.float32) Assuming we know that the desired function is a polynomial of 2nd degree, we allocate a vector of size 3 to hold the coefficients. The variable will be automatically initialized with random noise. w tf.get_variable( w , shape 3, 1 ) We define yhat to be our estimate of y. f tf.stack( tf.square(x), x, tf.ones_like(x) , 1) yhat tf.squeeze(tf.matmul(f, w), 1) The loss is defined to be the l2 distance between our estimate of y and its true value. We also added a shrinkage term, to ensure the resulting weights would be small. loss tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w) We use the Adam optimizer with learning rate set to 0.1 to minimize the loss. train_op tf.train.AdamOptimizer(0.1).minimize(loss) def generate_data(): x_val np.random.uniform( -10.0, 10.0, size 100) y_val 5 * np.square(x_val) + 3 return x_val, y_val sess tf.Session() Since we are using variables we first need to initialize them. sess.run(tf.global_variables_initializer()) for _ in range(1000): x_val, y_val generate_data() _, loss_val sess.run( train_op, loss , {x: x_val, y: y_val}) print(loss_val) print(sess.run( w )) By running this piece of code you should see a result close to this: 4.9924135, 0.00040895029, 3.4504161 Which is a relatively close approximation to our parameters. This is just the tip of the iceberg for what TensorFlow can do. Many problems such as optimizing large neural networks with millions of parameters can be implemented efficiently in TensorFlow in just a few lines of code. TensorFlow takes care of scaling across multiple devices, and threads, and supports a variety of platforms. Understanding static and dynamic shapes Tensors in TensorFlow have a static shape attribute which is determined during graph construction. The static shape may be underspecified. For example we might define a tensor of shape None, 128 : python import tensorflow as tf a tf.placeholder(tf.float32, None, 128 ) This means that the first dimension can be of any size and will be determined dynamically during Session.run().
You can query the static shape of a Tensor as follows: python static_shape a.shape.as_list() returns None, 128 To get the dynamic shape of the tensor you can call tf.shape op, which returns a tensor representing the shape of the given tensor: python dynamic_shape tf.shape(a) The static shape of a tensor can be set with Tensor.set_shape() method: python a.set_shape( 32, 128 ) static shape of a is 32, 128 a.set_shape( None, 128 ) first dimension of a is determined dynamically You can reshape a given tensor dynamically using tf.reshape function: python a tf.reshape(a, 32, 128 ) It can be convenient to have a function that returns the static shape when available and dynamic shape when it's not. The following utility function does just that: python def get_shape(tensor): static_shape tensor.shape.as_list() dynamic_shape tf.unstack(tf.shape(tensor)) dims s 1 if s 0 is None else s 0 for s in zip(static_shape, dynamic_shape) return dims Now imagine we want to convert a Tensor of rank 3 to a tensor of rank 2 by collapsing the second and third dimensions into one. We can use our get_shape() function to do that: python b tf.placeholder(tf.float32, None, 10, 32 ) shape get_shape(b) b tf.reshape(b, shape 0 , shape 1 shape 2 ) Note that this works whether the shapes are statically specified or not. In fact we can write a general purpose reshape function to collapse any list of dimensions: python import tensorflow as tf import numpy as np def reshape(tensor, dims_list): shape get_shape(tensor) dims_prod for dims in dims_list: if isinstance(dims, int): dims_prod.append(shape dims ) elif all( isinstance(shape d , int) for d in dims ): dims_prod.append(np.prod( shape d for d in dims )) else: dims_prod.append(tf.prod( shape d for d in dims )) tensor tf.reshape(tensor, dims_prod) return tensor Then collapsing the second dimension becomes very easy: python b tf.placeholder(tf.float32, None, 10, 32 ) b reshape(b, 0, 1, 2 ) Scopes and when to use them Variables and tensors in TensorFlow have a name attribute that is used to identify them in the symbolic graph. If you don't specify a name when creating a variable or a tensor, TensorFlow automatically assigns a name for you: python a tf.constant(1) print(a.name) prints Const:0 b tf.Variable(1) print(b.name) prints Variable:0 You can overwrite the default name by explicitly specifying it: python a tf.constant(1, name a ) print(a.name) prints a:0 b tf.Variable(1, name b ) print(b.name) prints b:0 TensorFlow introduces two different context managers to alter the name of tensors and variables. The first is tf.name_scope: python with tf.name_scope( scope ): a tf.constant(1, name a ) print(a.name) prints scope/a:0 b tf.Variable(1, name b ) print(b.name) prints scope/b:0 c tf.get_variable(name c , shape ) print(c.name) prints c:0 Note that there are two ways to define new variables in TensorFlow, by creating a tf.Variable object or by calling tf.get_variable. Calling tf.get_variable with a new name results in creating a new variable, but if a variable with the same name exists it will raise a ValueError exception, telling us that re declaring a variable is not allowed. tf.name_scope affects the name of tensors and variables created with tf.Variable, but doesn't impact the variables created with tf.get_variable. 
Unlike tf.name_scope, tf.variable_scope modifies the name of variables created with tf.get_variable as well: python with tf.variable_scope( scope ): a tf.constant(1, name a ) print(a.name) prints scope/a:0 b tf.Variable(1, name b ) print(b.name) prints scope/b:0 c tf.get_variable(name c , shape ) print(c.name) prints scope/c:0 python with tf.variable_scope( scope ): a1 tf.get_variable(name a , shape ) a2 tf.get_variable(name a , shape ) Disallowed But what if we actually want to reuse a previously declared variable? Variable scopes also provide the functionality to do that: python with tf.variable_scope( scope ): a1 tf.get_variable(name a , shape ) with tf.variable_scope( scope , reuse True): a2 tf.get_variable(name a , shape ) OK This becomes handy for example when using built in neural network layers: python features1 tf.layers.conv2d(image1, filters 32, kernel_size 3) Use the same convolution weights to process the second image: with tf.variable_scope(tf.get_variable_scope(), reuse True): features2 tf.layers.conv2d(image2, filters 32, kernel_size 3) This syntax may not look very clean to some. Especially if you want to do lots of variable sharing keeping track of when to define new variables and when to reuse them can be cumbersome and error prone. TensorFlow templates are designed to handle this automatically: python conv3x32 tf.make_template( conv3x32 , lambda x: tf.layers.conv2d(x, 32, 3)) features1 conv3x32(image1) features2 conv3x32(image2) Will reuse the convolution weights. You can turn any function to a TensorFlow template. Upon the first call to a template, the variables defined inside the function would be declared and in the consecutive invocations they would automatically get reused. Broadcasting the good and the ugly TensorFlow supports broadcasting elementwise operations. Normally when you want to perform operations like addition and multiplication, you need to make sure that shapes of the operands match, e.g. you can’t add a tensor of shape 3, 2 to a tensor of shape 3, 4 . But there’s a special case and that’s when you have a singular dimension. TensorFlow implicitly tiles the tensor across its singular dimensions to match the shape of the other operand. So it’s valid to add a tensor of shape 3, 2 to a tensor of shape 3, 1 python import tensorflow as tf a tf.constant( 1., 2. , 3., 4. ) b tf.constant( 1. , 2. ) c a + tf.tile(b, 1, 2 ) c a + b Broadcasting allows us to perform implicit tiling which makes the code shorter, and more memory efficient, since we don’t need to store the result of the tiling operation. One neat place that this can be used is when combining features of varying length. In order to concatenate features of varying length we commonly tile the input tensors, concatenate the result and apply some nonlinearity. This is a common pattern across a variety of neural network architectures: python a tf.random_uniform( 5, 3, 5 ) b tf.random_uniform( 5, 1, 6 ) concat a and b and apply nonlinearity tiled_b tf.tile(b, 1, 3, 1 ) c tf.concat( a, tiled_b , 2) d tf.layers.dense(c, 10, activation tf.nn.relu) But this can be done more efficiently with broadcasting. We use the fact that f(m(x + y)) is equal to f(mx + my). 
So we can do the linear operations separately and use broadcasting to do implicit concatenation: python pa tf.layers.dense(a, 10, activation None) pb tf.layers.dense(b, 10, activation None) d tf.nn.relu(pa + pb) In fact this piece of code is pretty general and can be applied to tensors of arbitrary shape as long as broadcasting between tensors is possible: python def merge(a, b, units, activation tf.nn.relu): pa tf.layers.dense(a, units, activation None) pb tf.layers.dense(b, units, activation None) c pa + pb if activation is not None: c activation(c) return c A slightly more general form of this function is included ( merge) in the cookbook. So far we discussed the good part of broadcasting. But what’s the ugly part you may ask? Implicit assumptions almost always make debugging harder to do. Consider the following example: python a tf.constant( 1. , 2. ) b tf.constant( 1., 2. ) c tf.reduce_sum(a + b) What do you think the value of c would be after evaluation? If you guessed 6, that’s wrong. It’s going to be 12. This is because when rank of two tensors don’t match, TensorFlow automatically expands the first dimension of the tensor with lower rank before the elementwise operation, so the result of addition would be 2, 3 , 3, 4 , and the reducing over all parameters would give us 12. The way to avoid this problem is to be as explicit as possible. Had we specified which dimension we would want to reduce across, catching this bug would have been much easier: python a tf.constant( 1. , 2. ) b tf.constant( 1., 2. ) c tf.reduce_sum(a + b, 0) Here the value of c would be 5, 7 , and we immediately would guess based on the shape of the result that there’s something wrong. A general rule of thumb is to always specify the dimensions in reduction operations and when using tf.squeeze. Feeding data to TensorFlow TensorFlow is designed to work efficiently with large amount of data. So it's important not to starve your TensorFlow model in order to maximize its performance. There are various ways that you can feed your data to TensorFlow. Constants The simplest approach is to embed the data in your graph as a constant: python import tensorflow as tf import numpy as np actual_data np.random.normal(size 100 ) data tf.constant(actual_data) This approach can be very efficient, but it's not very flexible. One problem with this approach is that, in order to use your model with another dataset you have to rewrite the graph. Also, you have to load all of your data at once and keep it in memory which would only work with small datasets. Placeholders Using placeholders solves both of these problems: python import tensorflow as tf import numpy as np data tf.placeholder(tf.float32) prediction tf.square(data) + 1 actual_data np.random.normal(size 100 ) tf.Session().run(prediction, feed_dict {data: actual_data}) Placeholder operator returns a tensor whose value is fetched through the feed_dict argument in Session.run function. Note that running Session.run without feeding the value of data in this case will result in an error. Python ops Another approach to feed the data to TensorFlow is by using Python ops: python def py_input_fn(): actual_data np.random.normal(size 100 ) return actual_data data tf.py_func(py_input_fn, , (tf.float32)) Python ops allow you to convert a regular Python function to a TensorFlow operation. 
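As another illustration of the same idea, any NumPy preprocessing routine can be wrapped this way; the flip augmentation below is a small sketch with an arbitrary example function, not part of the framework itself:
python
import numpy as np
import tensorflow as tf

def _augment(image):
    # plain NumPy code, executed by the Python interpreter
    return np.fliplr(image).astype(np.float32)

image = tf.placeholder(tf.float32, [None, None, 3])
augmented = tf.py_func(_augment, [image], tf.float32)
# static shape information is lost across py_func, so restore it if needed
augmented.set_shape(image.shape)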
Dataset API The recommended way of reading data in TensorFlow, however, is through the Dataset API: python actual_data np.random.normal(size 100 ) dataset tf.contrib.data.Dataset.from_tensor_slices(actual_data) data dataset.make_one_shot_iterator().get_next() If you need to read your data from a file, it may be more efficient to write it in TFrecord format and use TFRecordDataset to read it: python dataset tf.contrib.data.TFRecordDataset(path_to_data) See the official docs for an example of how to write your dataset in TFrecord format. Dataset API allows you to make efficient data processing pipelines easily. For example this is how we process our data in the accompanying framework (See trainer.py ): python dataset ... dataset dataset.cache() if mode tf.estimator.ModeKeys.TRAIN: dataset dataset.repeat() dataset dataset.shuffle(batch_size * 5) dataset dataset.map(parse, num_threads 8) dataset dataset.batch(batch_size) After reading the data, we use Dataset.cache method to cache it into memory for improved efficiency. During the training mode, we repeat the dataset indefinitely. This allows us to process the whole dataset many times. We also shuffle the dataset to get batches with different sample distributions. Next, we use the Dataset.map function to perform preprocessing on raw records and convert the data to a usable format for the model. We then create batches of samples by calling Dataset.batch. Take advantage of the overloaded operators Just like NumPy, TensorFlow overloads a number of python operators to make building graphs easier and the code more readable. The slicing op is one of the overloaded operators that can make indexing tensors very easy: python z x begin:end z tf.slice(x, begin , end - begin ) Be very careful when using this op though. The slicing op is very inefficient and often better avoided, especially when the number of slices is high. To understand how inefficient this op can be let's look at an example. We want to manually perform reduction across the rows of a matrix: python import tensorflow as tf import time x tf.random_uniform( 500, 10 ) z tf.zeros( 10 ) for i in range(500): z += x i sess tf.Session() start time.time() sess.run(z) print( Took %f seconds. % (time.time() - start)) On my MacBook Pro, this took 2.67 seconds to run! The reason is that we are calling the slice op 500 times, which is going to be very slow to run. A better choice would have been to use tf.unstack op to slice the matrix into a list of vectors all at once: python z tf.zeros( 10 ) for x_i in tf.unstack(x): z += x_i This took 0.18 seconds. Of course, the right way to do this simple reduction is to use tf.reduce_sum op: python z tf.reduce_sum(x, axis 0) This took 0.008 seconds, which is 300x faster than the original implementation. TensorFlow also overloads a range of arithmetic and logical operators: python z -x z tf.negative(x) z x + y z tf.add(x, y) z x - y z tf.subtract(x, y) z x * y z tf.multiply(x, y) z x / y z tf.div(x, y) z x // y z tf.floordiv(x, y) z x % y z tf.mod(x, y) z x ** y z tf.pow(x, y) z x @ y z tf.matmul(x, y) z x > y z tf.greater(x, y) z x >= y z tf.greater_equal(x, y) z x < y z tf.less(x, y) z x <= y z tf.less_equal(x, y) Understanding order of execution and control dependencies As we discussed in the first item, TensorFlow doesn't immediately run the operations that are defined but rather creates corresponding nodes in a graph that can be evaluated with Session.run() method. This also enables TensorFlow to do optimizations at run time to determine the optimal order of execution and possible trimming of unused nodes.
If you only have tf.Tensors in your graph you don't need to worry about dependencies, but you most probably have tf.Variables too, and tf.Variables make things much more difficult. My advice is to use Variables only if Tensors can't do the job. This might not make a lot of sense to you now, so let's start with an example. python import tensorflow as tf a tf.constant(1) b tf.constant(2) a a + b tf.Session().run(a) Evaluating a will return the value 3 as expected. Note that here we are creating 3 tensors, two constant tensors and another tensor that stores the result of the addition. Note that you can't overwrite the value of a tensor. If you want to modify it you have to create a new tensor, as we did here. __TIP__: If you don't define a new graph, TensorFlow automatically creates a graph for you by default. You can use tf.get_default_graph() to get a handle to the graph. You can then inspect the graph, for example by printing all its tensors: python print(tf.contrib.graph_editor.get_tensors(tf.get_default_graph())) Unlike tensors, variables can be updated. So let's see how we may use variables to do the same thing: python a tf.Variable(1) b tf.constant(2) assign tf.assign(a, a + b) sess tf.Session() sess.run(tf.global_variables_initializer()) print(sess.run(assign)) Again, we get 3 as expected. Note that tf.assign returns a tensor representing the value of the assignment. So far everything seemed to be fine, but let's look at a slightly more complicated example: python a tf.Variable(1) b tf.constant(2) c a + b assign tf.assign(a, 5) sess tf.Session() for i in range(10): sess.run(tf.global_variables_initializer()) print(sess.run( assign, c )) Note that the tensor c here won't have a deterministic value. This value might be 3 or 7 depending on whether addition or assignment gets executed first. You should note that the order in which you define ops in your code doesn't matter to the TensorFlow runtime. The only thing that matters is the control dependencies. Control dependencies for tensors are straightforward. Every time you use a tensor in an operation, that op will define an implicit dependency on that tensor. But things get complicated with variables because they can take many values. When dealing with variables, you may need to explicitly define dependencies using tf.control_dependencies() as follows: python a tf.Variable(1) b tf.constant(2) c a + b with tf.control_dependencies( c ): assign tf.assign(a, 5) sess tf.Session() for i in range(10): sess.run(tf.global_variables_initializer()) print(sess.run( assign, c )) This will make sure that the assign op will be called after the addition. Control flow operations: conditionals and loops When building complex models such as recurrent neural networks you may need to control the flow of operations through conditionals and loops. In this section we introduce a number of commonly used control flow ops. Let's assume you want to decide whether to multiply or add two given tensors based on a predicate. This can be simply implemented with tf.cond which acts as a python if statement: python a tf.constant(1) b tf.constant(2) p tf.constant(True) x tf.cond(p, lambda: a + b, lambda: a * b) print(tf.Session().run(x)) Since the predicate is True in this case, the output would be the result of the addition, which is 3. Most of the time when using TensorFlow you are using large tensors and want to perform operations in batch. A related conditional operation is tf.where, which like tf.cond takes a predicate, but selects the output based on the condition in batch.
python a tf.constant( 1, 1 ) b tf.constant( 2, 2 ) p tf.constant( True, False ) x tf.where(p, a + b, a * b) print(tf.Session().run(x)) This will return 3, 2 . Another widely used control flow operation is tf.while_loop. It allows building dynamic loops in TensorFlow that operate on sequences of variable length. Let's see how we can generate the Fibonacci sequence with tf.while_loop: python n tf.constant(5) def cond(i, a, b): return i < n def body(i, a, b): return i + 1, b, a + b i, a, b tf.while_loop(cond, body, (2, 1, 1)) print(tf.Session().run(b)) This will print 5. Prototyping kernels and advanced visualization with Python ops Operation kernels in TensorFlow are entirely written in C++ for efficiency. But writing a TensorFlow kernel in C++ can be quite a pain. So, before spending hours implementing your kernel you may want to prototype something quickly, however inefficient. With tf.py_func() you can turn any piece of python code into a TensorFlow operation. For example this is how you can implement a simple ReLU nonlinearity kernel in TensorFlow as a python op: python import numpy as np import tensorflow as tf import uuid def relu(inputs): Define the op in python def _relu(x): return np.maximum(x, 0.) Define the op's gradient in python def _relu_grad(x): return np.float32(x > 0) An adapter that defines a gradient op compatible with TensorFlow def _relu_grad_op(op, grad): x op.inputs 0 x_grad grad * tf.py_func(_relu_grad, x , tf.float32) return x_grad Register the gradient with a unique id grad_name MyReluGrad_ + str(uuid.uuid4()) tf.RegisterGradient(grad_name)(_relu_grad_op) Override the gradient of the custom op g tf.get_default_graph() with g.gradient_override_map({ PyFunc : grad_name}): output tf.py_func(_relu, inputs , tf.float32) return output To verify that the gradients are correct you can use TensorFlow's gradient checker: python x tf.random_normal( 10 ) y relu(x * x) with tf.Session(): diff tf.test.compute_gradient_error(x, 10 , y, 10 ) print(diff) compute_gradient_error() computes the gradient numerically and returns the difference with the provided gradient. What we want is a very low difference. Note that this implementation is pretty inefficient, and is only useful for prototyping, since the python code is not parallelizable and won't run on GPU. Once you verified your idea, you definitely would want to write it as a C++ kernel. In practice we commonly use python ops to do visualization on Tensorboard. Consider the case that you are building an image classification model and want to visualize your model predictions during training. TensorFlow allows visualizing images with tf.summary.image() function: python image tf.placeholder(tf.float32) tf.summary.image( image , image) But this only visualizes the input image. In order to visualize the predictions you have to find a way to add annotations to the image which may be almost impossible with existing ops. An easier way to do this is to do the drawing in python, and wrap it in a python op: python import io import matplotlib.pyplot as plt import numpy as np import PIL import tensorflow as tf def visualize_labeled_images(images, labels, max_outputs 3, name image ): def _visualize_image(image, label): Do the actual drawing in python fig plt.figure(figsize (3, 3), dpi 80) ax fig.add_subplot(111) ax.imshow(image :: 1,... ) ax.text(0, 0, str(label), horizontalalignment left , verticalalignment top ) fig.canvas.draw() Write the plot as a memory file.
buf io.BytesIO() data fig.savefig(buf, format png ) buf.seek(0) Read the image and convert to numpy array img PIL.Image.open(buf) return np.array(img.getdata()).reshape(img.size 0 , img.size 1 , 1) def _visualize_images(images, labels): Only display the given number of examples in the batch outputs for i in range(max_outputs): output _visualize_image(images i , labels i ) outputs.append(output) return np.array(outputs, dtype np.uint8) Run the python op. figs tf.py_func(_visualize_images, images, labels , tf.uint8) return tf.summary.image(name, figs) Note that since summaries are usually only evaluated once in a while (not per step), this implementation may be used in practice without worrying about efficiency. Multi GPU processing with data parallelism If you write your software in a language like C++ for a single cpu core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, tensorflow can hide all that complexity, making it effortless to scale your program across many CPUs and GPUs. Let's start with the simple example of adding two vectors on CPU: python import tensorflow as tf with tf.device(tf.DeviceSpec(device_type CPU , device_index 0)): a tf.random_uniform( 1000, 100 ) b tf.random_uniform( 1000, 100 ) c a + b tf.Session().run(c) The same thing can as simply be done on GPU: python with tf.device(tf.DeviceSpec(device_type GPU , device_index 0)): a tf.random_uniform( 1000, 100 ) b tf.random_uniform( 1000, 100 ) c a + b But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half: python split_a tf.split(a, 2) split_b tf.split(b, 2) split_c for i in range(2): with tf.device(tf.DeviceSpec(device_type GPU , device_index i)): split_c.append(split_a i + split_b i ) c tf.concat(split_c, axis 0) Let's rewrite this in a more general form so that we can replace addition with any other set of operations: python def make_parallel(fn, num_gpus, kwargs): in_splits {} for k, v in kwargs.items(): in_splits k tf.split(v, num_gpus) out_split for i in range(num_gpus): with tf.device(tf.DeviceSpec(device_type GPU , device_index i)): with tf.variable_scope(tf.get_variable_scope(), reuse i > 0): out_split.append(fn( {k : v i for k, v in in_splits.items()})) return tf.concat(out_split, axis 0) def model(a, b): return a + b c make_parallel(model, 2, a a, b b) You can replace the model with any function that takes a set of tensors as input and returns a tensor as result with the condition that both the input and output are in batch. Note that we also added a variable scope and set the reuse to true. This makes sure that we use the same variables for processing both splits. This is something that will become handy in our next example. Let's look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy. Recall from the first item that we wanted to fit a second degree polynomial to a set of samples. 
We reorganized the code a bit to have the bulk of the operations in the model function: python import numpy as np import tensorflow as tf def model(x, y): w tf.get_variable( w , shape 3, 1 ) f tf.stack( tf.square(x), x, tf.ones_like(x) , 1) yhat tf.squeeze(tf.matmul(f, w), 1) loss tf.square(yhat y) return loss x tf.placeholder(tf.float32) y tf.placeholder(tf.float32) loss model(x, y) train_op tf.train.AdamOptimizer(0.1).minimize( tf.reduce_mean(loss)) def generate_data(): x_val np.random.uniform( 10.0, 10.0, size 100) y_val 5 np.square(x_val) + 3 return x_val, y_val sess tf.Session() sess.run(tf.global_variables_initializer()) for _ in range(1000): x_val, y_val generate_data() _, loss_val sess.run( train_op, loss , {x: x_val, y: y_val}) _, loss_val sess.run( train_op, loss , {x: x_val, y: y_val}) print(sess.run(tf.contrib.framework.get_variables_by_name( w ))) Now let's use make_parallel that we just wrote to parallelize this. We only need to change two lines of code from the above code: python loss make_parallel(model, 2, x x, y y) train_op tf.train.AdamOptimizer(0.1).minimize( tf.reduce_mean(loss), colocate_gradients_with_ops True) The only thing that we need to change to parallelize backpropagation of gradients is to set the colocate_gradients_with_ops flag to true. This ensures that gradient ops run on the same device as the original op. Debugging TensorFlow models Symbolic nature of TensorFlow makes it relatively more difficult to debug TensorFlow code compared to regular python code. Here we introduce a number of tools included with TensorFlow that make debugging much easier. Probably the most common error one can make when using TensorFlow is passing Tensors of wrong shape to ops. Many TensorFlow ops can operate on tensors of different ranks and shapes. This can be convenient when using the API, but may lead to extra headache when things go wrong. For example, consider the tf.matmul op, it can multiply two matrices: python a tf.random_uniform( 2, 3 ) b tf.random_uniform( 3, 4 ) c tf.matmul(a, b) c is a tensor of shape 2, 4 But the same function also does batch matrix multiplication: python a tf.random_uniform( 10, 2, 3 ) b tf.random_uniform( 10, 3, 4 ) tf.matmul(a, b) c is a tensor of shape 10, 2, 4 Another example that we talked about before in the broadcasting ( broadcast) section is add operation which supports broadcasting: python a tf.constant( 1. , 2. ) b tf.constant( 1., 2. ) c a + b c is a tensor of shape 2, 2 Validating your tensors with tf.assert ops One way to reduce the chance of unwanted behavior is to explicitly verify the rank or shape of intermediate tensors with tf.assert ops. python a tf.constant( 1. , 2. ) b tf.constant( 1., 2. ) check_a tf.assert_rank(a, 1) This will raise an InvalidArgumentError exception check_b tf.assert_rank(b, 1) with tf.control_dependencies( check_a, check_b ): c a + b c is a tensor of shape 2, 2 Remember that assertion nodes like other operations are part of the graph and if not evaluated would get pruned during Session.run(). So make sure to create explicit dependencies to assertion ops, to force TensorFlow to execute them. You can also use assertions to validate the value of tensors at runtime: python check_pos tf.assert_positive(a) See the official docs for a full list of assertion ops . 
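A pattern that combines the assertions above with control dependencies is checking that dynamic shapes agree before an op that would otherwise silently broadcast or concatenate; a small sketch along the same lines (the placeholder shapes are arbitrary):
python
import tensorflow as tf

a = tf.placeholder(tf.float32, [None, 32])
b = tf.placeholder(tf.float32, [None, 64])
# fail at run time if the batch sizes do not match
check_batch = tf.assert_equal(tf.shape(a)[0], tf.shape(b)[0],
                              message="a and b must have the same batch size")
with tf.control_dependencies([check_batch]):
    c = tf.concat([a, b], axis=1)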
Logging tensor values with tf.Print Another useful built in function for debugging is tf.Print which logs the given tensors to the standard error: python input_copy tf.Print(input, tensors_to_print_list) Note that tf.Print returns a copy of its first argument as output. One way to force tf.Print to run is to pass its output to another op that gets executed. For example if we want to print the value of tensors a and b before adding them we could do something like this: python a ... b ... a tf.Print(a, a, b ) c a + b Alternatively we could manually define a control dependency. Check your gradients with tf.compute_gradient_error __Not__ all the operations in TensorFlow come with gradients, and it's easy to unintentionally build graphs for which TensorFlow can not compute the gradients. Let's look at an example: python import tensorflow as tf def non_differentiable_softmax_entropy(logits): probs tf.nn.softmax(logits) return tf.nn.softmax_cross_entropy_with_logits(labels probs, logits logits) w tf.get_variable( w , shape 5 ) y non_differentiable_softmax_entropy(w) opt tf.train.AdamOptimizer() train_op opt.minimize(y) sess tf.Session() sess.run(tf.global_variables_initializer()) for i in range(10000): sess.run(train_op) print(sess.run(tf.nn.softmax(w))) We are using tf.nn.softmax_cross_entropy_with_logits to define entropy over a categorical distribution. We then use Adam optimizer to find the weights with maximum entropy. If you have passed a course on information theory, you would know that uniform distribution contains maximum entropy. So you would expect for the result to be 0.2, 0.2, 0.2, 0.2, 0.2 . But if you run this you may get unexpected results like this: 0.34081486 0.24287023 0.23465775 0.08935683 0.09230034 It turns out tf.nn.softmax_cross_entropy_with_logits has undefined gradients with respect to labels! But how may we spot this if we didn't know? Fortunately for us TensorFlow comes with a numerical differentiator that can be used to find symbolic gradient errors. Let's see how we can use it: python with tf.Session(): diff tf.test.compute_gradient_error(w, 5 , y, ) print(diff) If you run this, you would see that the difference between the numerical and symbolic gradients are pretty high (0.06 0.1 in my tries). Now let's fix our function with a differentiable version of the entropy and check again: python import tensorflow as tf import numpy as np def softmax_entropy(logits, dim 1): plogp tf.nn.softmax(logits, dim) tf.nn.log_softmax(logits, dim) return tf.reduce_sum(plogp, dim) w tf.get_variable( w , shape 5 ) y softmax_entropy(w) print(w.get_shape()) print(y.get_shape()) with tf.Session() as sess: diff tf.test.compute_gradient_error(w, 5 , y, ) print(diff) The difference should be 0.0001 which looks much better. Now if you run the optimizer again with the correct version you can see the final weights would be: 0.2 0.2 0.2 0.2 0.2 which are exactly what we wanted. TensorFlow summaries , and tfdbg (TensorFlow Debugger) are other tools that can be used for debugging. Please refer to the official docs to learn more. Numerical stability in TensorFlow When using any numerical computation library such as NumPy or TensorFlow, it's important to note that writing mathematically correct code doesn't necessarily lead to correct results. You also need to make sure that the computations are stable. Let's start with a simple example. From primary school we know that x y / y is equal to x for any non zero value of x. 
But let's see if that's always true in practice: python import numpy as np x np.float32(1) y np.float32(1e-50) y would be stored as zero z x * y / y print(z) prints nan The reason for the incorrect result is that y is simply too small for the float32 type. A similar problem occurs when y is too large: python y np.float32(1e39) y would be stored as inf z x * y / y print(z) prints 0 The smallest positive value that the float32 type can represent is 1.4013e-45 and anything below that would be stored as zero. Also, any number beyond 3.40282e+38 would be stored as inf. python print(np.nextafter(np.float32(0), np.float32(1))) prints 1.4013e-45 print(np.finfo(np.float32).max) print 3.40282e+38 To make sure that your computations are stable, you want to avoid values with small or very large absolute value. This may sound very obvious, but these kinds of problems can become extremely hard to debug, especially when doing gradient descent in TensorFlow. This is because you not only need to make sure that all the values in the forward pass are within the valid range of your data types, but also you need to make sure of the same for the backward pass (during gradient computation). Let's look at a real example. We want to compute the softmax over a vector of logits. A naive implementation would look something like this: python import tensorflow as tf def unstable_softmax(logits): exp tf.exp(logits) return exp / tf.reduce_sum(exp) tf.Session().run(unstable_softmax( 1000., 0. )) prints nan, 0. Note that computing the exponential of even moderately large logits results in gigantic values that are out of the float32 range. The largest valid logit for our naive softmax implementation is ln(3.40282e+38) ≈ 88.7; anything beyond that leads to a nan outcome. But how can we make this more stable? The solution is rather simple. It's easy to see that exp(x - c) / ∑ exp(x - c) = exp(x) / ∑ exp(x). Therefore we can subtract any constant from the logits and the result would remain the same. We choose this constant to be the maximum of the logits. This way the domain of the exponential function would be limited to [-inf, 0], and consequently its range would be [0.0, 1.0], which is desirable: python import tensorflow as tf def softmax(logits): exp tf.exp(logits - tf.reduce_max(logits)) return exp / tf.reduce_sum(exp) tf.Session().run(softmax( 1000., 0. )) prints 1., 0. Let's look at a more complicated case. Suppose we have a classification problem. We use the softmax function to produce probabilities from our logits. We then define our loss function to be the cross entropy between our predictions and the labels. Recall that cross entropy for a categorical distribution can be simply defined as xe(p, q) = -∑ p_i log(q_i). So a naive implementation of the cross entropy would look like this: python def unstable_softmax_cross_entropy(labels, logits): logits tf.log(softmax(logits)) return -tf.reduce_sum(labels * logits) labels tf.constant( 0.5, 0.5 ) logits tf.constant( 1000., 0. ) xe unstable_softmax_cross_entropy(labels, logits) print(tf.Session().run(xe)) prints inf Note that in this implementation, as the softmax output approaches zero, the log's output approaches infinity, which causes instability in our computation. We can rewrite this by expanding the softmax and doing some simplifications: python def softmax_cross_entropy(labels, logits): scaled_logits logits - tf.reduce_max(logits) normalized_logits scaled_logits - tf.reduce_logsumexp(scaled_logits) return -tf.reduce_sum(labels * normalized_logits) labels tf.constant( 0.5, 0.5 ) logits tf.constant( 1000., 0.
) xe softmax_cross_entropy(labels, logits) print(tf.Session().run(xe)) prints 500.0 We can also verify that the gradients are also computed correctly: python g tf.gradients(xe, logits) print(tf.Session().run(g)) prints 0.5, 0.5 which is correct. Let me remind again that extra care must be taken when doing gradient descent to make sure that the range of your functions as well as the gradients for each layer are within a valid range. Exponential and logarithmic functions when used naively are especially problematic because they can map small numbers to enormous ones and the other way around. Building a neural network training framework with learn API For simplicity, in most of the examples here we manually create sessions and we don't care about saving and loading checkpoints but this is not how we usually do things in practice. You most probably want to use the learn API to take care of session management and logging. We provide a simple but practical framework for training neural networks using TensorFlow. In this item we explain how this framework works. When experimenting with neural network models you usually have a training/test split. You want to train your model on the training set, and once in a while evaluate it on test set and compute some metrics. You also need to store the model parameters as a checkpoint, and ideally you want to be able to stop and resume training. TensorFlow's learn API is designed to make this job easier, letting us focus on developing the actual model. The most basic way of using tf.learn API is to use tf.Estimator object directly. You need to define a model function that defines a loss function, a train op, one or a set of predictions, and optinoally a set of metric ops for evaluation: python import tensorflow as tf def model_fn(features, labels, mode, params): predictions ... loss ... train_op ... metric_ops ... return tf.estimator.EstimatorSpec( mode mode, predictions predictions, loss loss, train_op train_op, eval_metric_ops metric_ops) params ... run_config tf.contrib.learn.RunConfig(model_dir FLAGS.output_dir) estimator tf.estimator.Estimator( model_fn model_fn, config run_config, params params) To train the model you would then simply call Estimator.train() function while providing an input function to read the data: python def input_fn(): features ... labels ... return features, labels estimator.train(input_fn input_fn, max_steps ...) and to evaluate the model, simply call Estimator.evaluate(): estimator.evaluate(input_fn input_fn) Estimator object might be good enough for simple cases, but TensorFlow provides a higher level object called Experiment which provides some additional useful functionality. Creating an experiment object is very easy: python experiment tf.contrib.learn.Experiment( estimator estimator, train_input_fn train_input_fn, eval_input_fn eval_input_fn) Now we can call train_and_evaluate function to compute the metrics while training: experiment.train_and_evaluate() An even higher level way of running experiments is by using learn_runner.run() function. Here's how our main function looks like in the provided framework: python import tensorflow as tf tf.flags.DEFINE_string( output_dir , , Optional output dir. ) tf.flags.DEFINE_string( schedule , train_and_evaluate , Schedule. ) tf.flags.DEFINE_string( hparams , , Hyper parameters. 
) FLAGS tf.flags.FLAGS def experiment_fn(run_config, hparams): estimator tf.estimator.Estimator( model_fn make_model_fn(), config run_config, params hparams) return tf.contrib.learn.Experiment( estimator estimator, train_input_fn make_input_fn(tf.estimator.ModeKeys.TRAIN, hparams), eval_input_fn make_input_fn(tf.estimator.ModeKeys.EVAL, hparams)) def main(unused_argv): run_config tf.contrib.learn.RunConfig(model_dir FLAGS.output_dir) hparams tf.contrib.training.HParams() hparams.parse(FLAGS.hparams) estimator tf.contrib.learn.learn_runner.run( experiment_fn experiment_fn, run_config run_config, schedule FLAGS.schedule, hparams hparams) if __name__ __main__ : tf.app.run() The schedule flag decides which member function of the Experiment object gets called. So, if you for example set schedule to train_and_evaluate , experiment.train_and_evaluate() would be called. The input function returns two tensors (or dictionaries of tensors) providing the features and labels to be passed to the model: python def input_fn(): features ... labels ... return features, labels See mnist.py for an example of how to read your data with the dataset API. To learn about various ways of reading your data in TensorFlow refer to this item ( data). The framework also comes with a simple convolutional network classifier in alexnet.py that includes an example model. And that's it! This is all you need to get started with TensorFlow learn API. I recommend to have a look at the framework source code and see the official python API to learn more about the learn API. TensorFlow Cookbook This section includes implementation of a set of common operations in TensorFlow. Get shape python def get_shape(tensor): Returns static shape if available and dynamic shape otherwise. static_shape tensor.shape.as_list() dynamic_shape tf.unstack(tf.shape(tensor)) dims s 1 if s 0 is None else s 0 for s in zip(static_shape, dynamic_shape) return dims Batch Gather python def batch_gather(tensor, indices): Gather in batch from a tensor of arbitrary size. In pseudocode this module will produce the following: output i tf.gather(tensor i , indices i ) Args: tensor: Tensor of arbitrary size. indices: Vector of indices. Returns: output: A tensor of gathered values. shape get_shape(tensor) flat_first tf.reshape(tensor, shape 0 shape 1 + shape 2: ) indices tf.convert_to_tensor(indices) offset_shape shape 0 + 1 (indices.shape.ndims 1) offset tf.reshape(tf.range(shape 0 ) shape 1 , offset_shape) output tf.gather(flat_first, indices + offset) return output Beam Search python import tensorflow as tf def rnn_beam_search(update_fn, initial_state, sequence_length, beam_width, begin_token_id, end_token_id, name rnn ): Beam search decoder for recurrent models. Args: update_fn: Function to compute the next state and logits given the current state and ids. initial_state: Recurrent model states. sequence_length: Length of the generated sequence. beam_width: Beam width. begin_token_id: Begin token id. end_token_id: End token id. name: Scope of the variables. Returns: ids: Output indices. logprobs: Output log probabilities probabilities. batch_size initial_state.shape.as_list() 0 state tf.tile(tf.expand_dims(initial_state, axis 1), 1, beam_width, 1 ) sel_sum_logprobs tf.log( 1. + 0. 
(beam_width 1) ) ids tf.tile( begin_token_id , batch_size, beam_width ) sel_ids tf.zeros( batch_size, beam_width, 0 , dtype ids.dtype) mask tf.ones( batch_size, beam_width , dtype tf.float32) for i in range(sequence_length): with tf.variable_scope(name, reuse True if i > 0 else None): state, logits update_fn(state, ids) logits tf.nn.log_softmax(logits) sum_logprobs ( tf.expand_dims(sel_sum_logprobs, axis 2) + (logits tf.expand_dims(mask, axis 2))) num_classes logits.shape.as_list() 1 sel_sum_logprobs, indices tf.nn.top_k( tf.reshape(sum_logprobs, batch_size, num_classes beam_width ), k beam_width) ids indices % num_classes beam_ids indices // num_classes state batch_gather(state, beam_ids) sel_ids tf.concat( batch_gather(sel_ids, beam_ids), tf.expand_dims(ids, axis 2) , axis 2) mask (batch_gather(mask, beam_ids) tf.to_float(tf.not_equal(ids, end_token_id))) return sel_ids, sel_sum_logprobs Merge python import tensorflow as tf def merge(tensors, units, activation tf.nn.relu, name None, kwargs): Merge features with broadcasting support. This operation concatenates multiple features of varying length and applies non linear transformation to the outcome. Example: a tf.zeros( m, 1, d1 ) b tf.zeros( 1, n, d2 ) c merge( a, b , d3) shape of c would be m, n, d3 . Args: tensors: A list of tensor with the same rank. units: Number of units in the projection function. with tf.variable_scope(name, default_name merge ): Apply linear projection to input tensors. projs for i, tensor in enumerate(tensors): proj tf.layers.dense( tensor, units, activation None, name proj_%d % i, kwargs) projs.append(proj) Compute sum of tensors. result projs.pop() for proj in projs: result result + proj Apply nonlinearity. if activation: result activation(result) return result Entropy python import tensorflow as tf def softmax_entropy(logits, dim 1): Compute entropy over specified dimensions. plogp tf.nn.softmax(logits, dim) tf.nn.log_softmax(logits, dim) return tf.reduce_sum(plogp, dim) KL Divergence python def gaussian_kl(q, p (0., 0.)): Computes KL divergence between two isotropic Gaussian distributions. To ensure numerical stability, this op uses mu, log(sigma^2) to represent the distribution. If q is not provided, it's assumed to be unit Gaussian. Args: q: A tuple (mu, log(sigma^2)) representing a multi variatie Gaussian. p: A tuple (mu, log(sigma^2)) representing a multi variatie Gaussian. Returns: A tensor representing KL(q, p). mu1, log_sigma1_sq q mu2, log_sigma2_sq p return tf.reduce_sum( 0.5 (log_sigma2_sq log_sigma1_sq + tf.exp(log_sigma1_sq log_sigma2_sq) + tf.square(mu1 mu2) / tf.exp(log_sigma2_sq) 1), axis 1) Make parallel python def make_parallel(fn, num_gpus, kwargs): Parallelize given model on multiple gpu devices. Args: fn: Arbitrary function that takes a set of input tensors and outputs a single tensor. First dimension of inputs and output tensor are assumed to be batch dimension. num_gpus: Number of GPU devices. kwargs: Keyword arguments to be passed to the model. Returns: A tensor corresponding to the model output. in_splits {} for k, v in kwargs.items(): in_splits k tf.split(v, num_gpus) out_split for i in range(num_gpus): with tf.device(tf.DeviceSpec(device_type GPU , device_index i)): with tf.variable_scope(tf.get_variable_scope(), reuse i > 0): out_split.append(fn( {k : v i for k, v in in_splits.items()})) return tf.concat(out_split, axis 0) Leaky relu python def leaky_relu(tensor, alpha 0.1): Computes the leaky rectified linear activation. 
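# max(x, alpha * x): identity for non-negative inputs, a small slope alpha for negative inputs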
return tf.maximum(tensor, alpha tensor) Batch normalization python def batch_normalization(tensor, training False, epsilon 0.001, momentum 0.9, fused_batch_norm False, name None): Performs batch normalization on given 4 D tensor. The features are assumed to be in NHWC format. Noe that you need to run UPDATE_OPS in order for this function to perform correctly, e.g.: with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): train_op optimizer.minimize(loss) Based on: with tf.variable_scope(name, default_name batch_normalization ): channels tensor.shape.as_list() 1 axes list(range(tensor.shape.ndims 1)) beta tf.get_variable( 'beta', channels, initializer tf.zeros_initializer()) gamma tf.get_variable( 'gamma', channels, initializer tf.ones_initializer()) avg_mean tf.get_variable( avg_mean , channels, initializer tf.zeros_initializer(), trainable False) avg_variance tf.get_variable( avg_variance , channels, initializer tf.ones_initializer(), trainable False) if training: if fused_batch_norm: mean, variance None, None else: mean, variance tf.nn.moments(tensor, axes axes) else: mean, variance avg_mean, avg_variance if fused_batch_norm: tensor, mean, variance tf.nn.fused_batch_norm( tensor, scale gamma, offset beta, mean mean, variance variance, epsilon epsilon, is_training training) else: tensor tf.nn.batch_normalization( tensor, mean, variance, beta, gamma, epsilon) if training: update_mean tf.assign( avg_mean, avg_mean momentum + mean (1.0 momentum)) update_variance tf.assign( avg_variance, avg_variance momentum + variance (1.0 momentum)) tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_mean) tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_variance) return tensor Squeeze and excitation python def squeeze_and_excite(tensor, ratio 16, name None): Apply squeeze/excite on given 4 D tensor. Based on: with tf.variable_scope(name, default_name squeeze_and_excite ): original tensor units tensor.shape.as_list() 1 tensor tf.reduce_mean(tensor, 1, 2 , keep_dims True) tensor tf.layers.dense(tensor, units / ratio, use_bias False) tensor tf.nn.relu(tensor) tensor tf.layers.dense(tensor, units, use_bias False) tensor tf.nn.sigmoid(tensor) tensor original tensor return tensor",Image Classification,Image Classification 2512,Computer Vision,Computer Vision,Computer Vision,DenseNet Paper: Build and train a generic DenseNet( BC) using the CIFAR 10 dataset. Trained an DenseNet 40 to an error rate of 5.32% with data augmenation. This is comparable to the error reported in the paper of 5.24%.,Image Classification,Image Classification 2529,Computer Vision,Computer Vision,Computer Vision,"CleverHans (latest release: v3.0.1) Build Status Documentation Status This repository contains the source code for CleverHans, a Python library to benchmark machine learning systems' vulnerability to adversarial examples . You can learn more about such vulnerabilities on the accompanying blog . The CleverHans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open. Major updates coming to CleverHans CleverHans will soon support 3 frameworks: JAX, PyTorch, and TF2. The package itself will focus on its initial principle: reference implementation of attacks against machine learning models to help with benchmarking models against adversarial examples. 
This repository will also contain two folders: tutorials/ for scripts demonstrating the features of CleverHans and defenses/ for scripts that contain authoritative implementations of defenses in one of the 3 supported frameworks. The structure of the future repository will look like this: cleverhans/ jax/ attacks/ ... tf2/ attacks/ ... torch/ attacks/ ... defenses/ jax/ ... tf2/ ... torch/ ... tutorials/ jax/ ... tf2/ ... torch/ ... In the meanwhile, all of these folders can be found in the correspond future/ subdirectory (e.g., cleverhans/future/jax/attacks or defenses/future/jax/ ). A public milestone has been created to track the changes that are to be implemented before the library version is incremented to v4. Setting up CleverHans Dependencies This library uses TensorFlow to accelerate graph computations performed by many machine learning models. Therefore, installing TensorFlow is a pre requisite. You can find instructions here . For better performance, it is also recommended to install TensorFlow with GPU support (detailed instructions on how to do this are available in the TensorFlow installation documentation). Installing TensorFlow will take care of all other dependencies like numpy and scipy . Installation Once dependencies have been taken care of, you can install CleverHans using pip or by cloning this Github repository. pip installation If you are installing CleverHans using pip , run the following command after installing TensorFlow: pip install cleverhans This will install the last version uploaded to Pypi . If you'd instead like to install the bleeding edge version, use: pip install git+ Installation for development If you want to make an editable installation of CleverHans so that you can develop the library and contribute changes back, first fork the repository on GitHub and then clone your fork into a directory of your choice: git clone You can then install the local package in editable mode in order to add it to your PYTHONPATH : cd cleverhans pip install e . Currently supported setups Although CleverHans is likely to work on many other machine configurations, we currently test it it with Python 3.5 and TensorFlow {1.8, 1.12} on Ubuntu 14.04.5 LTS (Trusty Tahr). Support for Python 2.7 is deprecated. CleverHans 3.0.1 supports Python 2.7 and the master branch is likely to continue to work in Python 2.7 for some time, but we no longer run the tests in Python 2.7 and we do not plan to fix bugs affecting only Python 2.7 after 2019 07 04. Support for TensorFlow prior to 1.12 is deprecated. Backwards compatibility wrappers for these versions may be removed after 2019 07 07, and we will not fix bugs for those versions after that date. Support for TensorFlow 1.7 and earlier is already deprecated: we do not fix bugs for those versions and any remaining wrapper code for those versions may be removed without further notice. Getting support If you have a request for support, please ask a question on StackOverflow rather than opening an issue in the GitHub tracker. The GitHub issue tracker should only be used to report bugs or make feature requests. Contributing Contributions are welcomed! To speed the code review process, we ask that: New efforts and features be coordinated on the mailing list for CleverHans development: cleverhans dev@googlegroups.com . When making code contributions to CleverHans, you follow the PEP8 with two spaces coding style (the same as the one used by TensorFlow) in your pull requests. 
In most cases this can be done by running autopep8 i indent size 2 on the files you have edited. You can check your code by running nosetests cleverhans/devtools/tests/test_format.py or check an individual file by running pylint from inside the cleverhans repository root directory. When making your first pull request, you sign the Google CLA. We do not accept pull requests that add git submodules because of the problems that arise when maintaining git submodules. Bug fixes can be initiated through Github pull requests. Scripts: scripts directory The scripts directory contains command line utilities. In many cases you can use these to run CleverHans functionality on your saved models without needing to write any of your own Python code. You may want to set your .bashrc / .bash_profile file to add the CleverHans scripts directory to your PATH environment variable so that these scripts will be conveniently executable from any directory. Tutorials: cleverhans_tutorials directory To help you get started with the functionalities provided by this library, the cleverhans_tutorials/ folder comes with the following tutorials: MNIST with FGSM ( code (cleverhans_tutorials/mnist_tutorial_tf.py)): this tutorial covers how to train a MNIST model using TensorFlow, craft adversarial examples using the fast gradient sign method , and make the model more robust to adversarial examples using adversarial training. MNIST with FGSM using Keras ( code (cleverhans_tutorials/mnist_tutorial_keras_tf.py)): this tutorial covers how to define a MNIST model with Keras and train it using TensorFlow, craft adversarial examples using the fast gradient sign method , and make the model more robust to adversarial examples using adversarial training. MNIST with JSMA ( code (cleverhans_tutorials/mnist_tutorial_jsma.py)): this tutorial covers how to define a MNIST model with Keras and train it using TensorFlow and craft adversarial examples using the Jacobian based saliency map approach . MNIST using a black box attack ( code (cleverhans_tutorials/mnist_blackbox.py)): this tutorial implements the black box attack described in this paper . The adversary trains a substitute model: a copy that imitates the black box model by observing the labels that the black box model assigns to inputs chosen carefully by the adversary. The adversary then uses the substitute model’s gradients to find adversarial examples that are misclassified by the black box model as well. NOTE: the tutorials are maintained carefully, in the sense that we use continuous integration to make sure they continue working. They are not considered part of the API and they can change at any time without warning. You should not write 3rd party code that imports the tutorials and expect that the interface will not break. Only the main library is subject to our six month interface deprecation warning rule. NOTE: please write to cleverhans dev@googlegroups.com before writing a new tutorial. Because each new tutorial involves a large amount of duplicated code relative to the existing tutorials, and because every line of code requires ongoing testing and maintenance indefinitely, we generally prefer not to add new tutorials. Each tutorial should showcase an extremely different way of using the library. Just calling a different attack, model, or dataset is not enough to justify maintaining a parallel tutorial.
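For orientation, the attack API exercised by these tutorials looks roughly like the following minimal sketch; it assumes a trained Keras MNIST classifier named model, a TensorFlow session sess shared with Keras, and test images x_test scaled to the range 0 to 1 (see the tutorial scripts for complete, tested code):
python
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

# model, sess and x_test are assumed to exist as described above
wrap = KerasModelWrapper(model)
fgsm = FastGradientMethod(wrap, sess=sess)
# craft adversarial examples with a max-norm perturbation of eps=0.3, the value used in the MNIST tutorials
adv_x = fgsm.generate_np(x_test, eps=0.3, clip_min=0., clip_max=1.)
The JSMA and black box tutorials follow the same pattern, using SaliencyMapMethod and a substitute model respectively.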
Examples : examples directory The examples/ folder contains additional scripts to showcase different uses of the CleverHans library or get you started competing in different adversarial example contests. We do not offer nearly as much ongoing maintenance or support for this directory as the rest of the library, and if code in here gets broken we may just delete it without warning. List of attacks You can find a full list attacks along with their function signatures at cleverhans.readthedocs.io Reporting benchmarks When reporting benchmarks, please: Use a versioned release of CleverHans. You can find a list of released versions here . Either use the latest version, or, if comparing to an earlier publication, use the same version as the earlier publication. Report which attack method was used. Report any configuration variables used to determine the behavior of the attack. For example, you might report We benchmarked the robustness of our method to adversarial attack using v3.0.1 of CleverHans. On a test set modified by the FastGradientMethod with a max norm eps of 0.3, we obtained a test set accuracy of 71.3%. Citing this work If you use CleverHans for academic research, you are highly encouraged (though not required) to cite the following paper : @article{papernot2018cleverhans, title {Technical Report on the CleverHans v2.1.0 Adversarial Examples Library}, author {Nicolas Papernot and Fartash Faghri and Nicholas Carlini and Ian Goodfellow and Reuben Feinman and Alexey Kurakin and Cihang Xie and Yash Sharma and Tom Brown and Aurko Roy and Alexander Matyasko and Vahid Behzadan and Karen Hambardzumyan and Zhishuai Zhang and Yi Lin Juang and Zhi Li and Ryan Sheatsley and Abhibhav Garg and Jonathan Uesato and Willi Gierke and Yinpeng Dong and David Berthelot and Paul Hendricks and Jonas Rauber and Rujun Long}, journal {arXiv preprint arXiv:1610.00768}, year {2018} } About the name The name CleverHans is a reference to a presentation by Bob Sturm titled “Clever Hans, Clever Algorithms: Are Your Machine Learnings Learning What You Think? and the corresponding publication, A Simple Method to Determine if a Music Information Retrieval System is a 'Horse'. Clever Hans was a horse that appeared to have learned to answer arithmetic questions, but had in fact only learned to read social cues that enabled him to give the correct answer. In controlled settings where he could not see people's faces or receive other feedback, he was unable to answer the same questions. The story of Clever Hans is a metaphor for machine learning systems that may achieve very high accuracy on a test set drawn from the same distribution as the training data, but that do not actually understand the underlying task and perform poorly on other inputs. Authors This library is managed and maintained by Ian Goodfellow (Google Brain) and Nicolas Papernot (Google Brain). 
The following authors contributed 100 lines or more (ordered according to the GitHub contributors page): Ian Goodfellow (Google Brain) Nicolas Papernot (Google Brain) Nicholas Carlini (Google Brain) Fartash Faghri (University of Toronto) Tzu Wei Sung (National Taiwan University) Alexey Kurakin (Google Brain) Reuben Feinman (New York University) Phani Krishna (Video Analytics Lab) David Berthelot (Google Brain) Tom Brown (Google Brain) Cihang Xie (Johns Hopkins) Yash Sharma (The Cooper Union) Aashish Kumar (HARMAN X) Aurko Roy (Google Brain) Alexander Matyasko (Nanyang Technological University) Anshuman Suri (Microsoft) Yen Chen Lin (MIT) Vahid Behzadan (Kansas State) Jonathan Uesato (DeepMind) Haojie Yuan (University of Science & Technology of China) Zhishuai Zhang (Johns Hopkins) Karen Hambardzumyan (YerevaNN) Jianbo Chen (UC Berkeley) Catherine Olsson (Google Brain) Aidan Gomez (University of Oxford) Zhi Li (University of Toronto) Yi Lin Juang (NTUEE) Pratyush Sahay (formerly HARMAN X) Abhibhav Garg (IIT Delhi) Aditi Raghunathan (Stanford University) Yang Song (Stanford University) Riccardo Volpi (Italian Institute of Technology) Angus Galloway (University of Guelph) Yinpeng Dong (Tsinghua University) Willi Gierke (Hasso Plattner Institute) Bruno López Jonas Rauber (IMPRS) Paul Hendricks (NVIDIA) Ryan Sheatsley (Pennsylvania State University) Rujun Long (0101.AI) Bogdan Kulynych (EPFL) Erfan Noury (UMBC) Robert Wagner (Case Western Reserve University) Copyright Copyright 2019 Google Inc., OpenAI and Pennsylvania State University.",Image Classification,Image Classification 2530,Computer Vision,Computer Vision,Computer Vision,"Lingvo What is it? Lingvo is a framework for building neural networks in Tensorflow, particularly sequence models. A list of publications using Lingvo can be found here (PUBLICATIONS.md). Quick start Docker The easiest way to get started is to use the provided Docker script (docker/dev.dockerfile). If instead you want to install it directly on your machine, skip to the section below. First, install docker . Then, the following commands should give you a working shell with Lingvo installed. shell LINGVO_DIR /tmp/lingvo (change to the cloned lingvo directory, e.g. $HOME/lingvo ) LINGVO_DEVICE gpu (Leave empty to build and run CPU only docker) sudo docker build tag tensorflow:lingvo $(test $LINGVO_DEVICE gpu && echo build arg base_image nvidia/cuda:10.0 cudnn7 runtime ubuntu16.04 ) 1,2 asr.librispeech.Librispeech960Wpm 1,2 Image image.mnist.LeNet5 3 Language Modelling lm.one_billion_wds.WordLevelOneBwdsSimpleSampledSoftmax 4 Machine Translation mt.wmt14_en_de.WmtEnDeTransformerBase 5 mt.wmt14_en_de.WmtEnDeRNMT 5 mt.wmtm16_en_de.WmtCaptionEnDeTransformer 5 \ 1 : Listen, Attend and Spell . William Chan, Navdeep Jaitly, Quoc V. Le, and Oriol Vinyals. ICASSP 2016. \ 2 : End to end Continuous Speech Recognition using Attention based Recurrent NN: First Results . Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. arXiv 2014. \ 3 : Gradient based learning applied to document recognition . Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. IEEE 1998. \ 4 : Exploring the Limits of Language Modeling . Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. arXiv, 2016. \ 5 : The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation . Mia X. 
Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. ACL 2018. References API Docs Please cite this paper when referencing Lingvo. @misc{shen2019lingvo, title {Lingvo: a Modular and Scalable Framework for Sequence to Sequence Modeling}, author {Jonathan Shen and Patrick Nguyen and Yonghui Wu and Zhifeng Chen and others}, year {2019}, eprint {1902.08295}, archivePrefix {arXiv}, primaryClass {cs.LG} }",Image Classification,Image Classification 2535,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Classification,Image Classification 2541,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Classification,Image Classification 2542,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. 
They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Classification,Image Classification 2545,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Classification,Image Classification 2558,Computer Vision,Computer Vision,Computer Vision,OctaveConv A MXNet Implementation for Drop an Octave This repository contains a MXNet implementation of the paper Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution . Model Top1 : : : ResNet v1 50 76.05 OctResNet v1 50 77.47 OctResNet v1 50 cosine 78.04 ! example (fig/training curve.png) OctResNet v1 50 cosine model used alpha 0.25 in the table 2 of the paper. To Do List support mobilenet v1/v2 Acknowledgment This repo is based on DPN . UPDATE Here is the official implementation: (Highly Recommend!),Image Classification,Image Classification 2561,Computer Vision,Computer Vision,Computer Vision,"Face_Recognition Những thứ cần đọc 1: 2: (sách python) 3: 4: 5: 6: 7: BGR convert to HSV color space 'HSV color space is consists of 3 matrices, 'hue', 'saturation' and 'value'. In OpenCV, value range for 'hue', 'saturation' and 'value' are respectively 0 179, 0 255 and 0 255. 
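In code, that conversion is a single OpenCV call; a minimal sketch (the filename is hypothetical):
python
import cv2

img_bgr = cv2.imread("face.jpg")                    # OpenCV loads images in BGR order
img_hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)  # convert to the HSV color space
h, s, v = cv2.split(img_hsv)                        # H in 0-179, S and V in 0-255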
'Hue' represents the color, 'saturation' represents the amount to which that respective color is mixed with white and 'value' represents the amount to which that respective color is mixed with black' 8: package for Windows (Python) : 9: Tmux vs PM2 10: towardsdatascience 11: batch_size vs epoch 12: Mô hình tượng trưng (Backpropagation algorithm) 13: Event Sourcing 14: Command Sourcing 15: relu def relu(input): '''Define your relu activation function here''' Calculate the value for the output of the relu function: output output max(0, input) Return the value just calculated return(output) 16: Understanding more about NN 17: Những điều nên biết 18: NLP 19: Tensorflow 20: Word2Vector 21: Quốc Phạm Bá Cường 22: Install Conda console on Windows install pytorch using fastai 23: >>> Hoc Classify <<<< 24: Thiết kế và Export DB (new generate) 25: Thuật toán python trình bày dạng code 26: Tracker mot diem 27: Học xử lý ảnh 28: Djangon Struct JavaFX+Sphinx4+TextField : Requested in Commends Quick Update Welcome to the HelloGIT wiki! Post Single Images Những thứ cần xem cho việc nhận dang người !! Facenet (đây là cái chính mà chúng ta không thể bỏ qua nó bao gốm các vận dụng nó vào realtime để có thể sửa và sử dụng) face net và cách sử dụng github cho đoạn code readtime Học cách train Cách ảnh để nhận diện python src/align/align_dataset_mtcnn.py /datasets/casia/CASIA maxpy clean/ /datasets/casia/casia_maxpy_mtcnnpy_182 image_size 182 margin 44 Các dataset so sánh với nhau (Mình cảm thấy cái này rất hay) (nên xem để tạp train) (convert .mat to .csv) counting Nhan Dien Moi how to train tensorflow : Khuyet Diem UU diem Highly collectable. Highly exposed due to location of face and larger details. Nhuoc Diem 👍 Low distinctiveness. Facial characteristics may repeat in people, e.g. in twins. Medium permanence and stability, may get affected by age. conclude Despite being the part of physiological biometrics, fingerprint recognition considerably differ from facial recognition. Both the recognition methods have their own advantages and disadvantages, but none of them can replace each other. Facial recognition is good in mass surveillance applications at crowded places, while personal identification with user consent is better achieved by fingerprint recognition. Both the recognition methods have been used in law enforcement extensively. Biometrics has been a part of forensics for more than 100 years, while modern mass surveillance is performed with facial recognition systems by various law enforcement and national security agencies. 
counting : Gender Recognition web: của nó Package: Emotion : những ứng dụng của nhận diện khuôn mặt Java POST def count_substring(string, sub_string): counter 0 sub_len len(sub_string) for i in range(len(string) len(sub_string)+1): if string i sub_string 0 : if string i:(i + sub_len) sub_string: counter counter + 1 return counter ChatBot 1 so thu hay ho tieng anh window ML Object Detection With ObjectCV quan trong ve CNN thông tin chi tiết trong Tensorlfow Chứng minh Inception_v3 in tensorflow: (PDF) Luận văn :+1: Thêm phần định nghĩa các cấu trúc Cái Này rất quan trọng Cài đặt cuda9.0 trên tensorflow 1.9.0 Mạng máy tính: một vài trang cần biết trước khi làm (Phân loại chất lượng thực phẩm) Intel 3D camera D400 Trang chủ: (trang thảo luận) Issue",Image Classification,Image Classification 2584,Computer Vision,Computer Vision,Computer Vision,"Randomly Wired Neural Networks PWC Tensorflow implementation of Exploring Randomly Wired Neural Networks for Image Recognition Saining Xie, Alexander Kirillov, Ross Girshick, Kaiming He Arxiv Features ImageNet tfrecord input pipeline TF estimator support Horovod distribute training Label smoothing Cosine lr schedule Weight decay Small and regular regime in paper MNIST custom example Requirements Tensorflow 1.13 NetworkX Horovod (optional) Prepare ImageNet TFRecords Download ILSVRC2012_img_train.tar , ILSVRC2012_img_val.tar , synset_labels.txt Put in /your_path/to/data python src/dataset/imagenet_to_tfrecord.py raw_data_dir /your_path/to/data local_scratch_dir /path/to/tfrecords Training train.py: config: Path to config file (default: 'src/config/default.json') num_gpu: If greater or equal to 2, use distribute training (default: '1') pretrained: Continue training from this pretrained model (default: '') save_path: Path to save ckpt and logging files (default: '') MNIST python train.py config src/config/mnist.json save_path mnist_example ! alt text (assets/mnist_loss.png) ! (assets/mnist_top1.png) ImageNet with Tensorflow Estimator Set config 'Data' 'root_path' to your imagenet tfrecords folder python train.py config src/config/regular.json save_path mnist_example ImageNet with Horovod Set config 'Data' 'root_path' to your imagenet tfrecords folder Generate rand graph first to avoid conflict python src/util/generate_rand_graph.py config src/config/small.json save_path small_imagenet Run horovod command horovodrun np ${num_gpu} H localhost:${num_gpu} python train_horovod.py Exported model_dir eval eval_log train_log model.ckpt rand_graph dag_0.txt dag_1.txt dag_2.txt Generated random graph adjacency matrix will be saved as text file TODO Training on ImageNet License Apache License 2.0.",Image Classification,Image Classification 2585,Computer Vision,Computer Vision,Computer Vision,"Stochastic Weight Averaging (SWA) This repository contains a PyTorch implementation of the Stochastic Weight Averaging (SWA) training method for DNNs from the paper Averaging Weights Leads to Wider Optima and Better Generalization by Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov and Andrew Gordon Wilson. Introduction SWA is a simple DNN training method that can be used as a drop in replacement for SGD with improved generalization, faster convergence, and essentially no overhead. The key idea of SWA is to average multiple samples produced by SGD with a modified learning rate schedule. We use a constant or cyclical learning rate schedule that causes SGD to _explore_ the set of points in the weight space corresponding to high performing networks. 
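Concretely, the averaging itself is just a running mean of the weights visited by SGD after a burn-in epoch; a schematic PyTorch-style sketch of the idea (not the repository's train.py; model, swa_model, optimizer, train_epoch, epochs and swa_start are assumed names):
python
n_averaged = 0
for epoch in range(epochs):
    train_epoch(model, optimizer)        # an ordinary SGD epoch at a constant learning rate
    if epoch >= swa_start:
        # running average of weights: w_swa <- (n * w_swa + w) / (n + 1)
        for p_swa, p in zip(swa_model.parameters(), model.parameters()):
            p_swa.data.mul_(n_averaged / (n_averaged + 1.0)).add_(p.data / (n_averaged + 1.0))
        n_averaged += 1
# BatchNorm running statistics are then recomputed for swa_model before evaluation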
We observe that SWA converges more quickly than SGD, and to wider optima that provide higher test accuracy. In this repo we implement the constant learning rate schedule that we found to be most practical on CIFAR datasets. Please cite our work if you find this approach useful in your research: latex @article{izmailov2018averaging, title {Averaging Weights Leads to Wider Optima and Better Generalization}, author {Izmailov, Pavel and Podoprikhin, Dmitrii and Garipov, Timur and Vetrov, Dmitry and Wilson, Andrew Gordon}, journal {arXiv preprint arXiv:1803.05407}, year {2018} } Dependencies PyTorch torchvision tabulate Usage The code in this repository implements both SWA and conventional SGD training, with examples on the CIFAR 10 and CIFAR 100 datasets. To run SWA use the following command: bash python3 train.py dir \ dataset \ data_path \ model \ epochs \ lr_init \ wd \ swa \ swa_start \ swa_lr Parameters: DIR — path to training directory where checkpoints will be stored DATASET — dataset name CIFAR10/CIFAR100 (default: CIFAR10) PATH — path to the data directory MODEL — DNN model name: VGG16/VGG16BN/VGG19/VGG19BN PreResNet110/PreResNet164 WideResNet28x10 EPOCHS — number of training epochs (default: 200) LR_INIT — initial learning rate (default: 0.1) WD — weight decay (default: 1e 4) SWA_START — the number of epoch after which SWA will start to average models (default: 161) SWA_LR — SWA learning rate (default: 0.05) To run conventional SGD training use the following command: bash python3 train.py dir \ dataset \ data_path \ model \ epochs \ lr_init \ wd Examples To reproduce the results from the paper run (we use same parameters for both CIFAR 10 and CIFAR 100 except for PreResNet): bash VGG16 python3 train.py dir dataset CIFAR100 data_path model VGG16 epochs 200 lr_init 0.05 wd 5e 4 SGD python3 train.py dir dataset CIFAR100 data_path model VGG16 epochs 300 lr_init 0.05 wd 5e 4 swa swa_start 161 swa_lr 0.01 SWA 1.5 Budgets PreResNet python3 train.py dir dataset CIFAR100 data_path model PreResNet110 or PreResNet164 epochs 150 lr_init 0.1 wd 3e 4 SGD CIFAR100 python3 train.py dir dataset CIFAR100 data_path model PreResNet110 or PreResNet164 epochs 225 lr_init 0.1 wd 3e 4 swa swa_start 126 swa_lr 0.05 SWA 1.5 Budgets CIFAR10 python3 train.py dir dataset CIFAR10 data_path model PreResNet110 or PreResNet164 epochs 225 lr_init 0.1 wd 3e 4 swa swa_start 126 swa_lr 0.01 SWA 1.5 Budgets WideResNet28x10 python3 train.py dir dataset CIFAR100 data_path model WideResNet28x10 epochs 200 lr_init 0.1 wd 5e 4 SGD python3 train.py dir dataset CIFAR100 data_path model WideResNet28x10 epochs 300 lr_init 0.1 wd 5e 4 swa swa_start 161 swa_lr 0.05 SWA 1.5 Budgets Results CIFAR 100 Test accuracy (%) of SGD and SWA on CIFAR 100 for different training budgets. For each model the _Budget_ is defined as the number of epochs required to train the model with the conventional SGD procedure. DNN (Budget) SGD SWA 1 Budget SWA 1.25 Budgets SWA 1.5 Budgets : : : : : : : : VGG16 (200) 72.55 ± 0.10 73.91 ± 0.12 74.17 ± 0.15 74.27 ± 0.25 PreResNet110 (150) 76.77 ± 0.38 78.75 ± 0.16 78.91 ± 0.29 79.10 ± 0.21 PreResNet164 (150) 78.49 ± 0.36 79.77 ± 0.17 80.18 ± 0.23 80.35 ± 0.16 WideResNet28x10 (200) 80.82 ± 0.23 81.46 ± 0.23 81.91 ± 0.27 82.15 ± 0.27 Below we show the convergence plot for SWA and SGD with PreResNet164 on CIFAR 100 and the corresponding learning rates. The dashed line illustrates the accuracy of individual models averaged by SWA. CIFAR 10 Test accuracy (%) of SGD and SWA on CIFAR 10 for different training budgets. 
DNN (Budget) SGD SWA 1 Budget SWA 1.25 Budgets SWA 1.5 Budgets : : : : : : : : VGG16 (200) 93.25 ± 0.16 93.59 ± 0.16 93.70 ± 0.22 93.64 ± 0.18 PreResNet110 (150) 95.03 ± 0.05 95.51 ± 0.10 95.65 ± 0.03 95.82 ± 0.03 PreResNet164 (150) 95.28 ± 0.10 95.56 ± 0.11 95.77 ± 0.04 95.83 ± 0.03 WideResNet28x10 (200) 96.18 ± 0.11 96.45 ± 0.11 96.64 ± 0.08 96.79 ± 0.05 References Provided model implementations were adapted from VGG: github.com/pytorch/vision/ PreResNet: github.com/bearpaw/pytorch classification WideResNet: github.com/meliketoy/wide resnet.pytorch",Image Classification,Image Classification 2586,Computer Vision,Computer Vision,Computer Vision,pytorch capsule A Pytorch implementation of Hinton's Dynamic Routing Between Capsules . Thanks to @naturomics for his Tensorflow implementation which was a useful guide and sanity check. To use: $ python main.py,Image Classification,Image Classification 2590,Computer Vision,Computer Vision,Computer Vision,"Dense Net in Keras DenseNet implementation of the paper Densely Connected Convolutional Networks in Keras Now supports the more efficient DenseNet BC (DenseNet Bottleneck Compressed) networks. Using the DenseNet BC 190 40 model, it obtaines state of the art performance on CIFAR 10 and CIFAR 100 Architecture DenseNet is an extention to Wide Residual Networks. According to the paper: The lth layer has l inputs, consisting of the feature maps of all preceding convolutional blocks. Its own feature maps are passed on to all L − l subsequent layers. This introduces L(L+1) / 2 connections in an L layer network, instead of just L, as in traditional feed forward architectures. Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet). It features several improvements such as : 1. Dense connectivity : Connecting any layer to any other layer. 2. Growth Rate parameter Which dictates how fast the number of features increase as the network becomes deeper. 3. Consecutive functions : BatchNorm Relu Conv which is from the Wide ResNet paper and improvement from the ResNet paper. The Bottleneck Compressed DenseNets offer further performance benefits, such as reduced number of parameters, with similar or better performance. Take into consideration the DenseNet 100 12 model, with nearly 7 million parameters against with the DenseNet BC 100 12, with just 0.8 million parameters. The BC model achieves 4.51 % error in comparison to the original models' 4.10 % error The best original model, DenseNet 100 24 (27.2 million parameters) achieves 3.74 % error, whereas the DenseNet BC 190 40 (25.6 million parameters) achieves 3.46 % error which is a new state of the art performance on CIFAR 10. Dense Nets have an architecture which can be shown in the following image from the paper: Performance The accuracy of DenseNet has been provided in the paper, beating all previous benchmarks in CIFAR 10, CIFAR 100 and SVHN Usage Import the densenet.py script and use the DenseNet(...) method to create a custom DenseNet model with a variety of parameters. Examples : import densenet 'th' dim ordering or 'tf' dim ordering image_dim (3, 32, 32) or image_dim (32, 32, 3) model densenet.DenseNet(classes 10, input_shape image_dim, depth 40, growth_rate 12, bottleneck True, reduction 0.5) Or, Import a pre built DenseNet model for ImageNet, with some of these models having pre trained weights (121, 161 and 169). 
Example : import densenet 'th' dim ordering or 'tf' dim ordering image_dim (3, 224, 224) or image_dim (224, 224, 3) model densenet.DenseNetImageNet121(input_shape image_dim) Weights for the DenseNetImageNet121, DenseNetImageNet161 and DenseNetImageNet169 models are provided ( in the release tab ) and will be automatically downloaded when first called. They have been trained on ImageNet. The weights were ported from the repository Requirements Keras Theano (weights not tested) / Tensorflow (tested) / CNTK (weights not tested) h5Py",Image Classification,Image Classification 2592,Computer Vision,Computer Vision,Computer Vision,"Mobile Networks (V1 and V2) in Keras Keras implementation of the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications + ported weights. Contains the Keras implementation of the paper MobileNetV2: Inverted Residuals and Linear Bottlenecks + ported weights. ! mobilenets Benefits of Mobile Nets As explained in the paper, large neural networks can be exorbitant, both in the amount of memory they require to perform predictions, to the actual size of the model weights. Therefore, by using Depthwise Convolutions, we can reduce a significant portion of the model size while still retaining very good performance. Creating MobileNets The default MobileNet corresponds to the model pre trained on ImageNet. It has an input shape of (224, 224, 3). You can now create either the original version of MobileNet or the MobileNetV2 recently released using the appropriate method. from mobilenets import MobileNet, MobileNetV2 for V1 model MobileNet() for V2 model MobileNetV2() MobileNet V1 There are two hyperparameters that you can change alpha (the widening factor), and depth_multiplier . The ImageNet model uses the default values of 1 for both of the above. from mobilenets import MobileNet model MobileNet(alpha 1, depth_multiplier 1) MobileNet V2 There are three hyperparameters that you can change alpha (the widening factor), expansion_factor (multiplier by which the inverted residual block is multiplied) and depth_multiplier . The ImageNet model uses the default values of 1 for alpha and depth_multiplied and a default of 6 for expansion_factor . from mobilenets import MobileNetV2 model MobileNetV2(alpha 1, expansion_factor 6, depth_multiplier 1) Testing The model can be tested by running the predict_imagenet.py script, using the given elephant image. It will return a top 5 prediction score, where African Elephant score will be around 97.9%. Image Predictions ('African_elephant', 0.814673136), ('tusker', 0.15983042), ('Indian_elephant', 0.025479317), ('Weimaraner', 6.0817301e 06), ('bison', 3.7597524e 06) ('cheetah', 0.99743026), ('leopard', 0.0010753422), ('lion', 0.00069186132), ('snow_leopard', 0.00059767498), ('lynx', 0.00012871811) Conversion of Tensorflow Weights The weights were originally from which used Tensorflow checkpoints. There are scripts and some documentation for how the weights were converted in the _weight_extraction folder. The weights for V2 model were originally from which used Tensorflow checkpoints. There are scripts and some documentation for how the weights were converted in the _weight_extraction_v2 folder.",Image Classification,Image Classification 2615,Computer Vision,Computer Vision,Computer Vision,"res2net on mxnet Try to reproduce res2net using mxnet res2net I'm training res2net on cifar10 now. 
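For context, the core operation being reproduced is the hierarchical split inside each bottleneck block: the input channels are divided into s groups, the first group is passed through unchanged, and every later group is convolved after adding the previous group's output. A self-contained PyTorch illustration of that scheme (not this repository's MXNet code) follows:
python
import torch
import torch.nn as nn

class Res2NetSplit(nn.Module):
    """Hierarchical 3x3 convolutions over channel splits (scale s), as described in the Res2Net paper."""
    def __init__(self, channels, scale=4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        # one 3x3 conv per split, except the first split which is passed through unchanged
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1, bias=False) for _ in range(scale - 1))

    def forward(self, x):
        xs = torch.chunk(x, self.scale, dim=1)
        ys = [xs[0]]                               # y1 = x1
        out = self.convs[0](xs[1])                 # y2 = K2(x2)
        ys.append(out)
        for i in range(2, self.scale):
            out = self.convs[i - 1](xs[i] + out)   # yi = Ki(xi + y_{i-1})
            ys.append(out)
        return torch.cat(ys, dim=1)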
Some problem When I train the network by using mx.mod.Module and its fit function, after 200 epochs, the val accuracy only achieves 0.88, while the train accuracy is 0.99. But when I replace Module with mx.model.FeedForward and mx.model's fit function, after 200 epochs, the val acc can achieve 0.92 and the train acc is 0.99 or 1.0. I am trying to add batch norm and activation to get a better model...",Image Classification,Image Classification 2624,Computer Vision,Computer Vision,Computer Vision,"Frank Wolfe AdvML A Frank Wolfe Framework for Efficient and Effective Adversarial Attacks Prerequisites: Tensorflow Setup Inception/ResNet model (see details below) Download the ImageNet validation set and put the images in the /imagenetdata/imgs/ folder Command Line Arguments: lr: (start) learning rate method: attack method, e.g., FW_L2 , FW_Linf arch: network architecture, e.g. inception , resnet sample: number of samples to attack eps: epsilon, value 0.0 to enable grid search maxiter: maximum number of iterations per attack lambd: lambda grad_est_batch: zeroth order gradient estimation batch size sensing: type of sensing vectors, e.g. gaussian , sphere Usage Examples: Setup Inception V3 model: bash python3 setup_inception_v3.py Setup ResNet model: bash python3 setup_resnet.py Run white box attack on Inception V3 model: bash python3 test_attack.py method FW_Linf arch inception sample 1 eps 0.05 lr 0.005 lambd 5 Run black box attack on ResNet V2 model: bash python3 test_attack_black.py method FW_L2 arch resnet sample 500 maxiter 1000 eps 5 lr 0.03 lambd 30 delta 0.001",Image Classification,Image Classification 2632,Computer Vision,Computer Vision,Computer Vision,"Decorrelated Batch Normalization Code for reproducing the results in the following paper: Decorrelated Batch Normalization Lei Huang, Dawei Yang, Bo Lang, Jia Deng IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. arXiv:1804.08450 Requirements and Dependency Install MAGMA (you can find the instructions in 'Install MAGMA.md' (./Install_MAGMA.md) ). Note: MAGMA is required for SVD on GPU. Without MAGMA, you can run the code on CPU only, while all the CNN experiments in the paper are run on GPU. Install Torch with CUDA (for GPU). Note that cutorch should be compiled with MAGMA support if you have installed MAGMA and set the environments correctly. Install cudnn v5 . Install the dependency optnet by: Bash luarocks install optnet Experiments 1. Reproduce the results for PCA whitening: Run: Bash bash execute_MLP_0debug_MNIST.sh This script will download MNIST automatically and you should put the mnist.t7/ under ./dataset/ . The experiment results will be saved at ./set_result/MLP/ . 2. Reproduce the results for MLP architecture: (1) FIM experiments on YaleB dataset Prepare the data: download the YaleB dataset here , and put the data files under /dataset/ so that the paths look like ./dataset/YaleB/YaleB_train.dat and ./dataset/YaleB/YaleB_test.dat . Run: Bash bash execute_MLP_1FIM_YaleB_best.sh The experiment results will be saved in the directory 'set_result/MLP/'. You can experiment with different hyperparameters by running these scripts execute_MLP_1FIM_YaleB_HyperP.sh and execute_MLP_1FIM_YaleB_HyperP_nnn.sh . (2) Experiments on PIE dataset Prepare the data: download the PIE dataset here , and put the data file under ./dataset/ such that the paths look like ./dataset/PIE/PIE_train.dat and ./dataset/PIE/PIE_test.dat.
To experiment with different group sizes, run: Bash bash execute_MLP_2PIE_DBNGroup.sh To obtain different baseline performances, execute: Bash bash execute_MLP_2PIE.sh bash execute_MLP_2PIE_nnn.sh Note that the experiments until this point can be run on CPU, so MAGMA is not needed in above experiments. 3. Reproduce the results for VGG A architecture on CIFAR 10: Prepare the data: follow the instructions for CIFAR 10 in this project . It will generate a preprocessed dataset and save a 1400MB file. Put this file cifar_provider.t7 under ./dataset/ . Run: Bash bash execute_Conv_1vggA_2test_adam.sh bash execute_Conv_1vggA_2test_base.sh bash execute_Conv_1vggA_2test_ELU.sh bash execute_Conv_1vggA_2test_var.sh Note that if your machine has fewer than 4 GPUs, the environment variable CUDA_VISIBLE_DEVICES should be changed accordingly. 4. Analyze the properties of DBN on CIFAR 10 datset: Prepare the data: same as in VGG A experiments. Run: Bash bash exp_Conv_4Splain_1deep.lua bash exp_Conv_4Splain_2large.lua 5. Reproduce the ResNet experiments on CIFAR 10 datset: Prepare the data: download CIFAR 10 and CIFAR 100 , and put the data files under ./dataset/ . Run: Bash bash execute_Conv_2residual_old.sh bash execute_Conv_3residual_wide_Cifar100_wr_BN_d28_h48_g16_b128_dr0.3_s1_C2.sh bash execute_Conv_3residual_wide_Cifar100_wr_DBN_scale_L1_d28_h48_g16_b128_dr0.3_s1_C3.sh bash execute_Conv_3residual_wide_Cifar10_wr_BN_d28_h48_g16_b128_dr0.3_s1_C2.sh bash execute_Conv_3residual_wide_Cifar10_wr_DBN_scale_L1_d28_h48_g16_b128_dr0.3_s1_C3.sh 6. Reproduce the ImageNet experiments. Clone Facebook's ResNet repo here . Download ImageNet and put it in: /tmp/dataset/ImageNet/ (you can also customize the path in opts.lua ) Install the DBN module to Torch as a Lua package: go to the directory ./models/imagenet/cuSpatialDBN/ and run luarocks make cudbn 1.0 0.rockspec . Copy the model definitions in ./models/imagenet/ ( resnet_BN.lua , resnet_DBN_scale_L1.lua and init.lua ) to ./models directory in the cloned repo fb.resnet.torch , for reproducing the results reported in the paper. You also can compare the pre activation version of residual networks introduced in the paper (using the model files preresnet_BN.lua and preresnet_DBN_scale_L1.lua ). Use the default configuration and our models to run experiments. Contact Email: huanglei@nlsde.buaa.edu.cn. Any discussions and suggestions are welcome!",Image Classification,Image Classification 2636,Computer Vision,Computer Vision,Computer Vision,"Deep Learning for bacterial classification BacXeption BacXeption is a Deep Learning template of image segmentation functions and a Convolutional Neural Network (CNN) built on Keras for bacterial image classification. It uses the Xception architecture with pre trained weights . Examples 1. Getting Started This project requires Python 3.6+ 1.1 Pre requisites Install the prerequisites with PIP pip install r requirements.txt 1.2 Running the trained model 1. Place the raw images in data/test_data/ 2. Run python main.py This should output labelled images with a .txt file of the coordinates of each box in the output/$DATE_TIME folder. Example: 2. Training your own model 2.1 Two categories 1. Replace the images in the data/0/ and data/1/ with your images. 2. Run python train.py 3. Move the output/$DATE_TIME/model.json and output/$DATE_TIME/model.h5 in the model/ folder. 4. Follow the instructions in section 1.2 2.1 >Two categories 1. Change NUM_CLASSES in config.py to the number of classes wanted. 2. Add your data in the data/ folder. 
Each category should have a separate folder name, these must be integers starting from 0 (eg. 0/ , 1/ , 2/ for 3 categories) 3. Follow the instructions in section 2.1 3. Contributing Pull requests and suggestions are always welcome. 4. Additional information Authors Leonardo Castorina universVM Acknowledgments Dr. Teuta Pilizota Proposing the problem and useful discussions. Dario Miroli – For introducing me to Keras and debugging early versions of BacXeption François Chollet – Developing Keras and Xception",Image Classification,Image Classification 2640,Computer Vision,Computer Vision,Computer Vision,"Cutout This repository contains the code for the paper Improved Regularization of Convolutional Neural Networks with Cutout . Introduction Cutout is a simple regularization method for convolutional neural networks which consists of masking out random sections of input images during training. This technique simulates occluded examples and encourages the model to take more minor features into consideration when making decisions, rather than relying on the presence of a few major features. ! Cutout applied to CIFAR 10 Bibtex: @article{devries2017cutout, title {Improved Regularization of Convolutional Neural Networks with Cutout}, author {DeVries, Terrance and Taylor, Graham W}, journal {arXiv preprint arXiv:1708.04552}, year {2017} } Results and Usage Dependencies PyTorch v0.4.0 tqdm ResNet18 Test error (%, flip/translation augmentation, mean/std normalization, mean of 5 runs) Network CIFAR 10 CIFAR 100 ResNet18 4.72 22.46 ResNet18 + cutout 3.99 21.96 To train ResNet18 on CIFAR10 with data augmentation and cutout: python train.py dataset cifar10 model resnet18 data_augmentation cutout length 16 To train ResNet18 on CIFAR100 with data augmentation and cutout: python train.py dataset cifar100 model resnet18 data_augmentation cutout length 8 WideResNet WideResNet model implementation from Test error (%, flip/translation augmentation, mean/std normalization, mean of 5 runs) Network CIFAR 10 CIFAR 100 SVHN WideResNet 3.87 18.8 1.60 WideResNet + cutout 3.08 18.41 1.30 To train WideResNet 28 10 on CIFAR10 with data augmentation and cutout: python train.py dataset cifar10 model wideresnet data_augmentation cutout length 16 To train WideResNet 28 10 on CIFAR100 with data augmentation and cutout: python train.py dataset cifar100 model wideresnet data_augmentation cutout length 8 To train WideResNet 16 8 on SVHN with cutout: python train.py dataset svhn model wideresnet learning_rate 0.01 epochs 160 cutout length 20 Shake shake Regularization Network Shake shake regularization model implementation from Test error (%, flip/translation augmentation, mean/std normalization, mean of 3 runs) Network CIFAR 10 CIFAR 100 Shake shake 2.86 15.58 Shake shake + cutout 2.56 15.20 See README in shake shake folder for usage instructions.",Image Classification,Image Classification 2642,Computer Vision,Computer Vision,Computer Vision,"Convolutional Neural Network Adversarial Attacks Note : I am aware that there are some issues with the code, I will update this repository soon (Also will move away from cv2 to PIL). This repo is a branch off of CNN Visualisations because it was starting to get bloated. 
It contains following CNN adversarial attacks implemented in Pytorch: Fast Gradient Sign, Untargeted 1 Fast Gradient Sign, Targeted 1 Gradient Ascent, Adversarial Images 2 Gradient Ascent, Fooling Images (Unrecognizable images predicted as classes with high confidence) 2 It will also include more adverisarial attack and defenses techniques in the future as well. The code uses pretrained AlexNet in the model zoo. You can simply change it with your model but don't forget to change target class parameters as well. All images are pre processed with mean and std of the ImageNet dataset before being fed to the model. None of the code uses GPU as these operations are quite fast (for a single image). You can make use of gpu with very little effort. The examples below include numbers in the brackets after the description, like Mastiff (243) , this number represents the class id in the ImageNet dataset. I tried to comment on the code as much as possible, if you have any issues understanding it or porting it, don't hesitate to reach out. Below, are some sample results for each operation. Fast Gradient Sign Untargeted In this operation we update the original image with signs of the received gradient on the first layer. Untargeted version aims to reduce the confidence of the initial class. The code breaks as soon as the image stops being classified as the original label. Predicted as Eel (390) Confidence: 0.96 Adversarial Noise Predicted as Blowfish (397) Confidence: 0.81 Predicted as Snowbird (13) Confidence: 0.99 Adversarial Noise Predicted as Chickadee (19) Confidence: 0.95 Fast Gradient Sign Targeted Targeted version of FGS works almost the same as the untargeted version. The only difference is that we do not try to minimize the original label but maximize the target label. The code breaks as soon as the image is predicted as the target class. Predicted as Apple (948) Confidence: 0.95 Adversarial Noise Predicted as Rock python (62) Confidence: 0.16 Predicted as Apple (948) Confidence: 0.95 Adversarial Noise Predicted as Mud turtle (35) Confidence: 0.54 Gradient Ascent Fooling Image Generation In this operation we start with a random image and continously update the image with targeted backpropagation (for a certain class) and stop when we achieve target confidence for that class. All of the below images are generated from pretrained AlexNet to fool it. Predicted as Zebra (340) Confidence: 0.94 Predicted as Bow tie (457) Confidence: 0.95 Predicted as Castle (483) Confidence: 0.99 Gradient Ascent Adversarial Image Generation This operation works exactly same as the previous one. The only important thing is that keeping learning rate a bit smaller so that the image does not receive huge updates so that it will continue to look like the originial. As it can be seen from samples, on some images it is almost impossible to recognize the difference between two images but on others it can clearly be observed that something is wrong. All of the examples below were created from and tested on AlexNet to fool it. Predicted as Eel (390) Confidence: 0.96 Predicted as Apple (948) Confidence: 0.95 Predicted as Snowbird (13) Confidence: 0.99 Predicted as Banjo (420) Confidence: 0.99 Predicted as Abacus (398) Confidence: 0.99 Predicted as Dumbell (543) Confidence: 1 Requirements: torch > 0.2.0.post4 torchvision > 0.1.9 numpy > 1.13.0 opencv > 3.1.0 References: 1 I. J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples 2 A. Nguyen, J. Yosinski, J. Clune. 
Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images",Image Classification,Image Classification 2643,Computer Vision,Computer Vision,Computer Vision,"Traffic Sign Recognition Writeup Build a Traffic Sign Recognition Project The goals / steps of this project are the following: Load the data set (see below for links to the project data set) Explore, summarize and visualize the data set Design, train and test a model architecture Use the model to make predictions on new images Analyze the softmax probabilities of the new images Summarize the results with a written report // : (Image References) image1 : ./writeup_images/histogram_training.png Histogram of training data image2 : ./writeup_images/histogram_valid.png Histogram of validation data image3 : ./writeup_images/mean_std.png Mean and standard deviation of data image4 : ./writeup_images/Equalization.png Equalization techniques considered image5 : ./writeup_images/Problem_1.png Children crossing image6 : ./writeup_images/Problem_2.png Bumpy road image7 : ./writeup_images/internet_images.png Internet images image8 : ./writeup_images/softmax1.png Softmax 1 image9 : ./writeup_images/softmax2.png Softmax 2 image10 : ./writeup_images/softmax3.png Softmax 3 image11 : ./writeup_images/softmax4.png Softmax 4 image12 : ./writeup_images/softmax5.png Softmax 5 image13 : ./writeup_images/softmax6.png Softmax 6 image14 : ./writeup_images/augmentation.png Augmentation image15 : ./writeup_images/arch.jpg Architecture Rubric Points Here I will consider the rubric points individually and describe how I addressed each point in my implementation. Writeup / README 1. Provide a Writeup / README that includes all the rubric points and how you addressed each one. You can submit your writeup as markdown or pdf. You can use this template as a guide for writing the report. The submission includes the project code. You're reading it! and here is a link to my project code Data Set Summary & Exploration 1. Provide a basic summary of the data set. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually. I used the shape() property to get the shapes of of training, validation and test datasets. Shape can also be used to find the shape of traffic sign images. Number of classes can be found out using signnames.csv or finding unique entries in the training set I use the latter The size of training set is 34799 The size of the validation set is 4410 The size of test set is 12630 The shape of a traffic sign image is (32, 32, 3) The number of unique classes/labels in the data set is 43 2. Include an exploratory visualization of the dataset. I plot the normalized histogram of the both the training and validation dataset it can be seen that both of the datasets have similar distributions. It can also be seen that some image categories are under represented like Class 0 (Speed limit 20 km/hr), Class 19 (dangerous curve to the left), etc. ! Histogram of training data image1 ! Histogram of validation data image2 I also plot the mean and standard deviation image. It can be seen from these images that the center of the image carries the traffic sign. The standard deviation is interesting because most of the image is dark I would have expected the region close to the borders of the image to be varying in pixel intensity because of the varied background of traffic sign images. 
However, all the images are cropped with traffic sign occupying the majority of the image leading to low standard deviation throughout the 32 32 image. ! Mean and standard deviation of images image3 Design and Test a Model Architecture 1. Describe how you preprocessed the image data. What techniques were chosen and why did you choose these techniques? Consider including images showing the output of each preprocessing technique. Pre processing refers to techniques such as converting to grayscale, normalization, etc. (OPTIONAL: As described in the Stand Out Suggestions part of the rubric, if you generated additional data for training, describe why you decided to generate additional data, how you generated the data, and provide example images of the additional data. Then describe the characteristics of the augmented training set like number of images in the set, number of images for each class, etc.) Inspired by 1 , I tried two image equalization techniques histogram equalization and CLAHE (Contrast Limited Adaptive Histogram Equalization) applied to grayscale images 2 . Both these techniques improve the contrast in the image as shown in figure below (figure shows original image, histogram equalized image and CLAHE filtered image from left to right). The 70 in the image is hardly visible in the first image, however, the equalization techniques enhance the image immensely. ! Equalization techniques considered image4 I decided to use CLAHE (on grayscale images) for data preprocessing here because histogram equalization does not work well when there are large intensity variations in an image. This is easier to demonstrate on larger images but a couple of examples where histogram equalization does not work well are shown below (as before, figure shows original image, histogram equalized image and CLAHE filtered image from left to right). ! Children crossing image5 ! Bumpy road image6 Additionally, I tried a few data augmentation techniques and ended up using the following augmentations: Image rotated randomly in the range +/ 5, 15 degrees and then scaled by 0.9 or 1.1 Randomly perturbed in both horizontal and vertical directions by 2, 2 pixels Motion blurred with a kernel of size 2 The figure below shows the original RGB image and four processed images used for training (CLAHE filtered grayscale image, scaled and roated, randomly perturbed, and motion blurred) ! augmentation image14 Note that the augmentation is applied to grayscaled and CLAHE filtered images. This gives a dataset that is four times the original dataset. Note that each copy of training set image is augmented to produce 4 images and I do not selectively choose certain image categories to augment. Such datasets may represent natural distributions and thus it may not be a good idea to augment unevenly. This is because Augmentation should increase the robustness of the model when seeing unseen images. I centred the image around the pixel mean and normalized with the standard deviation because I wanted to center the data around zero and have similar ranges for the pixels. Images under different light variations can have largely different pixel values and we desire the network to learn other features in the image than the light conditions, thus centering around the mean and normalization helps the learning process. Normalization also ensures similar values of gradient while doing backpropagation and helps prevent gradient saturation (not too relevant here because image data is already upper bounded). 2. 
Describe what your final model architecture looks like including model type, layers, layer sizes, connectivity, etc.) Consider including a diagram and/or table describing the final model. My final model consisted of the following layers: Layer Description : : : : Input 32x32x1 Grayscale image Convolution 5x5 1x1 stride, Valid padding, outputs 28x28x12 Batch norm RELU Max pooling 2x2 stride, outputs 14x14x12 Dropout (a) Keep probability 0.75 Convolution 5x5 1x1 stride, Valid padding, outputs 10x10x32 Batch norm RELU Max pooling 2x2 stride, outputs 5x5x32 Dropout (b) Keep probability 0.75 Flatten and Concat (a) (after additional maxpooling) & (b) outputs 1388 Dropout Keep probability 0.75 Fully connected outputs 100 Batch norm RELU Dropout Keep probability 0.5 Fully connected outputs n_classes ( 43) Softmax The overall achitecture is presented in the figure below. ! architecture image15 3. Describe how you trained your model. The discussion can include the type of optimizer, the batch size, number of epochs and any hyperparameters such as learning rate. To train the model, I used the following: 1. Xavier initialization 3 : I saw marked differences in early (in epochs) performance based on the starting weights. When using truncated normal, the training/validation performance was heavily dependent on the mean and standard deviation chosen. The same was true for normal distribution. I used Xavier initiation and immediately saw improvement in early epochs. 2. Batch normalization 4 : I tried batch normalization and saw faster convergence. Even though running it on my computer was taking more time per epoch, batch norm lead to faster convergence (in number of epochs). The exact reasons for batch norm's effectiveness are still poorly understood and is an active area of research 5 . I applied batch normalization before the RELU activation in all the layers, though recently people have been using it post the RELU activation. 3. Regularization: I experimented a lot with dropout probabilities for the different layers and ended up using 0.25 for convolutional layer and concat layer in addition to a dropout of 0.5 for fully connected layers. Without dropout, the network was overfitting easily, which is usually a good sign that the network is implemented correctly. 4. Adaptive learning rate: I tried a few techniques and ended up using a starting learning rate of 0.001 and reducing the learning rate by 0.1 every 20 epochs 6 . 5. Batch size of 128. 6. Adam optimizer: I started with Adam optimizer and it worked well and I did not get a chance to experiment with other optimizers. 7. 100 epochs was used for final submission even though the model seemed to have converged with very few epochs. 4. Describe the approach taken for finding a solution and getting the validation set accuracy to be at least 0.93. Include in the discussion the results on the training, validation and test sets and where in the code these were calculated. Your approach may have been an iterative process, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think the architecture is suitable for the current problem. My final model results were: training set accuracy of 100% validation set accuracy of 98.8% test set accuracy of 98.0% I tried the following architectures: 1. 
LeNet (shown in lecture) and enhancements to it including adapting learning rate, dropout for different layers, etc. This is present as a function in my final submission. 2. A VGGNet 7 like architecture (not that deep, but employing same padding level) with 3 convolution layers (convolution+batch norm+RELU+max pooling) and two fully connected layers with adaptive learning rate and dropouts. I excluded this in the final submission . 2. Sermanet architecture shown in 1 . I tried two flavors of it and immediately saw improvement. The main idea here is to short circuit the output of the first convolutional layer directly into the fully connected layer. I saw a marked improvment in the convergence time with this method. The validation accuracy in every run was 0.97 in just 3 epochs. For the final submission, I let it run for 100 epochs. The final architecture is based on this and described below. The implementation is SermaNet2() in my submission. My journey to submission was long and I spent a lot of time experimenting with hyperparameters: I started with the LeNet architecture and tried to study the rest of the design components like data augmentation, weight initialization, learning rate, dropout, batch normalization as described in next few bullets. I started with initial weight optimization study and quickly observed that covergence rate was heavily dependent on initialization hyperparameters of mean/standard deviation and also distribution (truncated gaussian v/s gaussian). I ended up using Xavier initialization after which I never had to worry about weight initialization. The second hyperparameter I played with was learning rate. I saw marginal improvement on using learning rate adaptation of reducing it by 0.1 every 20 epochs and continued using it for the rest of the project. I kept a flag to turn off adaptation every now and then to test its effectiveness. With the above steps, the model continued to overfit the training data with 94% accuracy on validation data. I introduced dropout into the model and the validation accuracy improved to 96%. I added batch normalization and it improved convergence rate. I kept a flag and experimented with turning it off and on. I wanted to further improve the accuracy and started looking at other architectures like GoogleNet, SermaNet, VGGNet, etc. I implemented SermaNet to the best of my understanding with much smaller feature sizes than in the paper. For example, the paper uses 108 108 filter depth and I used 12 32 filter depth in my submission. I implemented two different flavors of the concatenation layer one concatenating the second layer with the output of a third convolutional layer and another concatenating output of first and second convolution layer. The latter has lesser parameters and gives better performance and was used for the final submission. In the end I tried the VGGNet like architecture mentioned above, though it gave me slightly lower accuracy than the final submission. Test a Model on New Images 1. Choose five German traffic signs found on the web and provide them in the report. For each image, discuss what quality or qualities might be difficult to classify. Here are six German traffic signs that I found on the web: ! internet_images image7 Other than intentionally picking images that cover the majority of the height and width of the image, I tried to be impartial in selecting the image. The reason I did this was the training data set has images in which traffic sign occupies the majority of the pixel space. 
I resized the image to 32x32 to fit the modelling. Most images have watermark on them and some of them have varied backgrounds. 2. Discuss the model's predictions on these new traffic signs and compare the results to predicting on the test set. At a minimum, discuss what the predictions were, the accuracy on these new predictions, and compare the accuracy to the accuracy on the test set (OPTIONAL: Discuss the results in more detail as described in the Stand Out Suggestions part of the rubric). Here are the results of the prediction: Image Prediction : : : : Right of way at the next intersection Right of way at the next intersection Bumpy road Bumpy road Slippery road Slippery road Road work Road work Children crossing Children crossing Speed limit (60km/h) Speed limit (60km/h) The model was able to correctly guess 6 out of 6 traffic signs, which gives an accuracy of 100%. This compares favorably to the accuracy on the test set of 98% 3. Describe how certain the model is when predicting on each of the five new images by looking at the softmax probabilities for each prediction. Provide the top 5 softmax probabilities for each image along with the sign type of each probability. (OPTIONAL: as described in the Stand Out Suggestions part of the rubric, visualizations can also be provided such as bar charts) The code for making predictions on my final model is located in the 40th cell of the Ipython notebook. For most of the images, the softmax probabilities of the correct labels are high, except for the road work sign, which has almost equal softmax probability as the right of way at next intersection. This is probably because of the tree being in the background of this particular road sign that confuses the Conv Net. Probability Prediction : : : : 1 Right of way at the next intersection 0.996 Bumpy road 0.942 Slippery road 0.412 Road work 0.478 Children crossing 0.426 Speed limit (60km/h) ! softmax1 image8 ! softmax2 image9 ! softmax3 image10 ! softmax4 image11 ! softmax5 image12 ! softmax6 image13 1 . 2 . 3 . 4 . 5 . 6 . 7 .",Image Classification,Image Classification 2649,Computer Vision,Computer Vision,Computer Vision,Implementation of Developed with Google colab,Image Classification,Image Classification 2650,Computer Vision,Computer Vision,Computer Vision,"Wide Residual Network with optional Fixup initialization The code presents the implementation of Fixup as an option for standard Wide ResNet. When BatchNorm and Fixup are enabled simultaneously, Fixup initialization and the standard structure of the residual block are used. Usage example: sh python train.py layers 40 widen factor 10 batchnorm False fixup True Acknowledgment Wide Residual Network by Sergey Zagoruyko and Nikos Komodakis Fixup Initialization: Residual Learning Without Normalization by Hongyi Zhang, Yann N. Dauphin, Tengyu Ma Fixup implementation was originally introduced by Andy Brock WRN code by xternalz",Image Classification,Image Classification 2652,Computer Vision,Computer Vision,Computer Vision,"BehavioralCloning Tutorial for building a deep learning model that generates steering angle based on image input. 1. Toolchain Simulator in training mode as data gethering/ data source Offline Anaconda Python Environment for data pre processing. Cloud based ML environment on AWS for Model Training. Download model to offline system and run the simulator in autonomous mode Getting started 1. Offline installation Install ! Anaconda , use ! environment.yml 2. Cloud Environment This is a ! 
medium post written to help you with the cloud based environment setup. 3. Clone this repository to both the cloud and the local environment! Behavioral Cloning Repo . 4. Simulator Repo ! Link 2. Getting around the repo Model_Preprocessing.ipynb is the scratchpad that was used to try and experiment with building the model.It helps us extract and preprocess information and combine it. Model.py is the Keras Model that contains Model Image Aug Generators It obtains data from complete_data.csv which contains the sum total of all images , steering angles and their paths. The data is obatined by running the simulator / stock data given by udacity itself. Data/File Structure The data is of the format Drive_Log.csv which contains the path information where the images are sampled from the video and the actual images are stored in IMG . Each image is timestamped. 3. Workflow Simulator Generate Data in Training Mode Analyze, Augment and PreProcess Data offline Get more Data if required. Upload Data to Cloud AWS Machine Learning system . Run the Model Training. Download Model and run simulator in autonomous mode. Repeat process 4. Summary of Steps Exploratory Visualization of Dataset Pandas + SeaBorn + MatPlotLib to create , load and append dataset from dataframe Visualization to understand the distribution and quality of data. Distribution Plot to see the spread and quantity of data Time Series Plot to understand the quality of data. (To see noise to determine if filters are required) Data Collection based on Shortcomings Visualizing collected data from driving the simulator shows that the dataset looks entirely different for Keyboard and Mouse So we pass the data through a Savitzky Golay filter that averages out the samples but maintains the Area (Steering_angle x Time) this effectively filters out the noise without destroying the signal. Based on the histogram distribution plots collecting data by using certain driving styles. Data from Track 1 Clock wise and anticlockwise ( MAC & Windows ) Data from Track 2 Clock wise and anticlockwise (MAC & Windows) Smooth Turn from both Tracks (MAC & Windows) Recovery driving from both Tracks (MAC & Windows) Problem Areas in both tracks (MAC & Windows) Keyboard and Mouse After initial model save and testing driving and training in problem areas to improve model on subset of data. Name Values Data Sources Mac, Windows, Linux Sim Input Sources Keyboard, Touch Pad, Mouse, Joystick Tracks Track 1, Track 2 Direction Clockwise and Anti Clockwise Special Cases Recovery Driving, Smooth Turns only, Stock Udacity Data Size of Dataset 37,000 Data Augmentation Augmentation using Flipping , Translation from left and right camera images Reduce the time spent on data gathering through data augmentation techniques Data Perturbation to Increase Model Robustness Brightness Perturbation : Random perturbation of brightness of the image. Gaussian Noise : Blur filter with a random normal distribution across the image. Adaptive Histogram Equalization : Can greatly help in the model learning the features quickly Colospace inversion : RBG to BGR colorspace change Sampling and Image Generators These steps increase the challenge and generalization capability by creating harder images for the model to train on. Below is an example of augmented and perturbed image batch that is linked with the image generator that generates images during model training on the go. Seen below are the distribution of data and images of one random sample generated by the generator. 
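As a rough illustration of the perturbations listed above (the helper name and the parameter ranges here are assumptions for illustration, not code from this repo), a minimal OpenCV sketch could look like:

```python
import cv2
import numpy as np

def perturb_image(img_bgr):
    """Randomly perturb one training image (hypothetical helper, not from this repo)."""
    out = img_bgr.copy()
    # Brightness perturbation: scale the V channel in HSV by a random factor.
    hsv = cv2.cvtColor(out, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] * np.random.uniform(0.6, 1.4), 0, 255)
    out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Mild blur to mimic the Gaussian noise / motion blur perturbation.
    if np.random.rand() < 0.5:
        out = cv2.GaussianBlur(out, (3, 3), 0)
    # Colorspace inversion: swap the channel order (RGB <-> BGR).
    if np.random.rand() < 0.5:
        out = out[..., ::-1]
    return out
```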
Since the distribution is clearly tri modal (peaks around 0 for Center Camera , + 0.25 and 0.25 for left and right cameras respectievely ) it is an unbalanced dataset. Although significant efforts have been taken gather more data around turns , there is just simply more data around 0 and +/ 0.25 Best possible option is to do Balancing through DownSampling The method used to do downsampling is Weighted Random Normal Sampling . Why we choose this is because , the dominant characteristic of the system is to stay around 0/.25 so we make sure we don't mess with that. The steering angles are Discretized i.e made to fall under categories/ Bins Counts are taken for each group and the weights are given as 1/ Counts in that group. These weights are then Normalized When summed up they need to be equal to 1 Then the batch size is used to sample this out of the data frame using the Sampling with weights Define model architecture Data Pre processing steps Normalization through feature scaling Cropping region of interest Resize image to increase model performance Salient Features of Model Batch Normalization before every activation Overfitting prevention Dropouts and batch norm Dropouts are implemented before the flatten layer and before the output layer with a 50% probability Tried by adding multiple dropouts but it did need seem to have an effect on improving validation losses. Switched from ReLU to ELU for activations after reading this paper NVIDIA End to End Model architecture and train from scratch Layer Name Size Number of Parameters cropping2d_1 (Cropping2D) (None, 90, 320, 3) 0 lambda_1 (Lambda) (None, 66, 200, 3) 0 lambda_2 (Lambda) (None, 66, 200, 3) 0 conv2d_1 (Conv2D) (None, 31, 98, 24) 1824 batch_normalization_1 (Batch (None, 31, 98, 24) 96 elu_1 (ELU) (None, 31, 98, 24) 0 conv2d_2 (Conv2D) (None, 14, 47, 36) 21636 batch_normalization_2 (Batch (None, 14, 47, 36) 144 elu_2 (ELU) (None, 14, 47, 36) 0 conv2d_3 (Conv2D) (None, 5, 22, 48) 43248 batch_normalization_3 (Batch (None, 5, 22, 48) 192 elu_3 (ELU) (None, 5, 22, 48) 0 conv2d_4 (Conv2D) (None, 3, 20, 64) 27712 batch_normalization_4 (Batch (None, 3, 20, 64) 256 elu_4 (ELU) (None, 3, 20, 64) 0 conv2d_5 (Conv2D) (None, 1, 18, 64) 36928 batch_normalization_5 (Batch (None, 1, 18, 64) 256 elu_5 (ELU) (None, 1, 18, 64) 0 flatten_1 (Flatten) (None, 1152) 0 dropout_1 (Dropout) (None, 1152) 0 dense_1 (Dense) (None, 1164) 1342092 batch_normalization_6 (Batch (None, 1164) 4656 elu_6 (ELU) (None, 1164) 0 dense_2 (Dense) (None, 100) 116500 batch_normalization_7 (Batch (None, 100) 400 elu_7 (ELU) (None, 100) 0 dense_3 (Dense) (None, 50) 5050 batch_normalization_8 (Batch (None, 50) 200 elu_8 (ELU) (None, 50) 0 dense_4 (Dense) (None, 10) 510 batch_normalization_9 (Batch (None, 10) 40 elu_9 (ELU) (None, 10) 0 dropout_2 (Dropout) (None, 10) 0 dense_5 (Dense) (None, 1) 11 Setup Model Training Pipeline Hyperparameters : Epochs , Steps per Epoch and Learning Rate decided based on search epochs on subset of data Greedy best save and checkpoint implementation. Metrics is a purely loss based. Since the label(Steering angle) here is numeric and non categorical , RMS Loss is used as the loss type. 
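Before the hyperparameter table below, here is a minimal Keras sketch of the compile/checkpoint setup just described (the placeholder network, input shape, and file names are assumptions for illustration, not the repo's actual code):

```python
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint

# Placeholder network: the real model is the NVIDIA-style architecture tabulated above.
model = Sequential([Flatten(input_shape=(160, 320, 3)), Dense(1)])

# Steering angle is a continuous, non-categorical target, so a squared-error loss is used
# together with Adam at a reduced learning rate of 1e-4.
model.compile(optimizer=Adam(lr=1e-4), loss='mse')

# Greedy best-save: keep only the checkpoint with the lowest validation loss so far.
checkpoint = ModelCheckpoint('model_best.h5', monitor='val_loss',
                             save_best_only=True, verbose=1)
# model.fit_generator(train_gen, steps_per_epoch=..., epochs=10,
#                     validation_data=valid_gen, callbacks=[checkpoint])
```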
Hyperparameter Name Value Comments Epochs 10 Additional Epochs for special problem areas Learning Rate 1e 4 Default Learning rate of 1e 2 unsuitable results Batch Size 32 Chosen due to best trade off between CPU & GPU performance Metric Loss Accuracy is unsuitable as exact steering angle prediction is not what matters Loss Type Root Mean Squared Error As loss is non categorical closeness to predicted angle is what matters Optimizer Type Adam Chosen from Save and Deploy Model Save using json , hdf5 model. 5. Capture Video 6. Observation and Learning Pandas Dataframe Sample is extremely handy in picking a weighted random sample. Too many dropouts can sometimes be counter productive. Wasted a lot of time in trying to figure out why the model wasn't performing well and this was due to the gaussian blur perturbation that converted images to float. The keyboard data though noisy didnt need filtration. Filtering the steering values lead to slower responses in sharp turns. Image data generators were a life saver when it came to handling data in batches, sharing load between GPU and CPU. Recovery driving training is only partial as we don't control the direction, throttle or speed just the steering. However that is a topic for the future. To plug in the steering and throttle based closed loop PI based speed controller. 7. Further aspirations Optimize the learning process Use the model driving the vehicle to iteratively generate data for further model training. Increase the challenge in driving Single lane driving based on direction, adversarial agents, pedestrians and traffic to understand model performance.",Image Classification,Image Classification 2681,Computer Vision,Computer Vision,Computer Vision,OctaveConv.pytorch A Pytorch Implementation for the paper Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution . ! (figures/octave_conv.png) Usage python from models import octave_resnet50 model octave_resnet50(num_classes 10) Reference Inspired by the MXNet implementation here .,Image Classification,Image Classification 2690,Computer Vision,Computer Vision,Computer Vision,"FaceRec A simple working facial recognition program. Installation: 1. Install the dependencies 2. Download the pretrained models here: Then extract those files into models 3. Run main.py Requirements: Python3 (3.5 ++ is recommended) Dependencies: opencv3 numpy tensorflow ( 1.1.0 rc or 1.2.0 is recommended ) Howto: python3 main.py to run the program python3 main.py mode input to add new user. Start turning left, right, up, down after inputting the new name. Turn slowly to avoid blurred images To achieve best accuracy, please try to mimick what I did here in this gif while inputting new subject: ! GIF Demo Flags: mode input to add new user into the data set General Information: Project: Facial Recogition This is a simple minified version of a bigger project I was working on this summer. Info on the models I used: Facial Recognition Architecture: Facenet Inception Resnet V1 _Pretrained model is provided in Davidsandberg repo_ More information on the model: Face detection method: MTCNN More info on MTCNN Face Detection: Both of these models are run simultaneouslyx Framework and Libs: Tensorflow: The infamous Google's Deep Learning Framework OpenCV: Image processing (VideoCapture, resizing,..) 
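A minimal sketch of the kind of OpenCV capture-and-resize loop such a pipeline is built around (the detection and recognition calls are placeholders, not this repo's actual functions):

```python
import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Downscale before detection to keep the loop close to real time.
    small = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)
    # faces = detect_faces(small)   # placeholder for the MTCNN detection step
    # names = recognize(faces)      # placeholder for the FaceNet embedding + classifier
    cv2.imshow('FaceRec', small)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```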
Suggestions for Improvement: To keep this repo as simple as possible, I will probably have this plug in in a seperate repo: Given the constrain of the facenet model's accuracy, there are many ways you can improve accuracy in real world application. One of my suggestion would be to create a tracker for each detected face on screen, then run recognition on each of them in real time. Then, decide who is in each tracker after some number of frames (3 10 frames, depending on how fast your machine is). Keep doing the same thing until the tracker disappears or loses track. Your result can look somewhat like this: { Unknown :3, PersonA : 1, PersonB : 20} > This tracker is tracking PersonB This will definitely improve your program liability, because the result will most likely be leaning toward the right subject in the picture after some number of frames, instead of just deciding right away after 1 frame like you normally would. One benefit of this approach is that the longer the person stays in front of the camera, the more accurate and confident the result is, as confidence points get incremented over time. Also, you can do some multi threading/processing tricks to improve performance. Demos: ! GIF Demo Live demo: @Author: David Vu Credits: Pretrained models from:",Image Classification,Image Classification 2705,Computer Vision,Computer Vision,Computer Vision,"Bioimaging Collective effort (manual plant dataset) Plant dataset that was acquired manually from SERNEC and other plant sources such as FLAS. Plant reproductive part cutting for positive images is still a work in progress as this is done manually. Having to do this will decrease as Mr. Powell is trying to recover (through more official means) reproductive and non reproductive plants from SERNEC. We will also be able to have Biology students create data for us. William (image recog) Adapted from Using this example to automate detection of herbarium specimens that were classified incorrectly. Currently using VGG19 from Keras 2.1.2 1. Download dataset from 2. Modify config.json model name to desired model (Resnet50/VGG19/InceptionV3) 3. Run extract_features.py 4. Run train.py 5. Run test.py to test on images in dataset/test folder Dax (googlenet scratch) Inception V3 using Keras Notes: 1. Currently WIP and is incomplete. Using architecture based on 2 with features based on 1 . 2. Is missing localized response normalization used by 2 . Information about LRN can be found in 3 . However, the effectiveness of LRN is disputed, and in case studies of convolution networks such as VGGNet, LRN was omitted from the network 4 . 3. Keras implementation of Inception v3 is trained on ImageNet, automatically making this implementation transfer learning. Papers: 1 2 3 4 Known Issues Image Recog Accuracy is currently 100%. I am not sure if this is an error due to lack of data.",Image Classification,Image Classification 2722,Computer Vision,Computer Vision,Computer Vision,"A Simple Baseline for Bayesian Deep Learning This repository contains a PyTorch implementation of Stochastic Weight Averaging Gaussian (SWAG) from the paper A Simple Baseline for Bayesian Uncertainty in Deep Learning by Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, and Andrew Gordon Wilson Introduction SWA Gaussian (SWAG) is a convenient method for uncertainty representation and calibration in Bayesian deep learning. 
The key idea of SWAG is that the SGD iterates, with a modified learning rate schedule, act like samples from a Gaussian distribution; SWAG fits this Gaussian distribution by capturing the SWA mean and a covariance matrix, representing the first two moments of SGD iterates. We use this Gaussian distribution as a posterior over neural network weights, and then perform a Bayesian model average, for uncertainty representation and calibration. In this repo, we implement SWAG for image classification with several different architectures on both CIFAR datasets and ImageNet. We also implement SWAG for semantic segmentation on CamVid using our implementation of a FCDenseNet67. We additionally include several other experiments on exploring the covariance of the gradients of the SGD iterates, the eigenvalues of the Hessian, and width/PCA decompositions of the SWAG approximate posterior. CIFAR10 > STL10 CIFAR100 : : : : ! (plots/stl_wrn.jpg) ! (plots/c100_resnet110.jpg) Please cite our work if you find it useful: @article{maddoxfast, title {A Simple Baseline for Bayesian Uncertainty in Deep Learning}, author {Maddox, Wesley and Garipov, Timur and Izmailov, Pavel and Vetrov, Dmitry and Wilson, Andrew Gordon}, journal {arXiv preprint arXiv:1902.02476}, year {2019} } Installation: bash python setup.py develop See requirements.txt file for requirements that came from our setup. We use Pytorch 1.0.0 in our experiments. Unless otherwise described, all experiments were run on a single GPU. Note that if you are using CUDA 10 you may need to manually install Pytorch with the correct CUDA toolkit. File Structure . + swag/ + posteriors/ + swag.py (class definition for SWA, SWAG and SWAG Diag) + laplace.py (class definition for KFAC Laplace) + models/ (Folder with all model definitions) + utils.py (utility functions) + experiments/ + train/ (folder containing standard training scripts for non ImageNet data) + imagenet/ (folder containing ImageNet training scripts) + grad_cov/ (gradient covariance and optimal learning rate experiments) + hessian_eigs/ (folder for eigenvalues of hessian) + segmentation/ (folder containing training scripts for segmentation experiments) + uncertainty/ (folder containing scripts and methods for all uncertainty experiments) + width/ (folder containing scripts for PCA and SVD of SGD trajectories) + tests/ (folder containing tests for SWAG sampling and SWAG log likelihood calculation.) Example Commands See experiments/ for particular READMEs Image Classification (experiments/train/README.md) Segmentation (experiments/segmentation/README.md) Uncertainty (experiments/uncertainty/README.md) Some other commands are listed here: Hessian eigenvalues cd experiments/hessian_eigs; python run_hess_eigs.py dataset CIFAR100 data_path data_path model PreResNet110 use_test file ckpt save_path output.npz Gradient covariances cd experiments/grad_cov; python run_grad_cov.py dataset CIFAR100 data_path data_path model VGG16 use_test epochs 300 lr_init 0.05 wd 5e 4 swa swa_start 161 swa_lr 0.01 grad_cov_start 251 dir dir Note that this will output the gradient covariances onto the console, so you ought to write these into a log file and retrieve them afterwards. References for Code Base Stochastic weight averaging: Pytorch repo ; most of the base methods and model definitions are built off of this repo. Model implementations: VGG: PreResNet: WideResNet: FCDensenet67: Hessian eigenvalue computation: PyTorch repo , but we ultimately ended up using GPyTorch as it allows calculation of more eigenvalues. 
Segmentation evaluation metrics: Lasagne repo",Image Classification,Image Classification 2734,Computer Vision,Computer Vision,Computer Vision,Paper download link Other notes So far the network structure has only been checked in TensorBoard and matches the paper; training has not been run yet and the training results will be added later. My machine is weak, so experiments can only be run on the CIFAR dataset; the training and evaluation code follows the resnet directory. Of the three block structures, only the first (network in neuron) is implemented so far; the other two, concatenation and group convolution, will be added when time permits.,Image Classification,Image Classification 2742,Computer Vision,Computer Vision,Computer Vision,"Layer sequential unit variance (LSUV) initialization for Keras This is sample code for LSUV and initializations, implemented as a Python script within the Keras framework. Usage: from lsuv_init import LSUVinit ... batch_size = 32; model = LSUVinit(model, train_imgs[:batch_size,:,:,:]) LSUV initialization is described in: Mishkin, D. and Matas, J. (2015). All you need is a good init. ICLR 2016 arXiv:1511.06422 . Original Caffe implementation Torch re implementation PyTorch implementation New! Thinc re implementation LSUV thinc",Image Classification,Image Classification 2759,Computer Vision,Computer Vision,Computer Vision,"Behavioral Cloning Project Udacity Self Driving Car NanoDegree Further reading: 1. A solution using NVIDIA's model, including data augmentation: 2. A discussion of batch_size: 3. Another solution, including brightness augmentation: 4. NVIDIA: End to End Deep Learning for Self Driving Cars 5. How to use generators: 6. The ELU activation function: 7. Other Keras activation functions: 8. Efficient backpropagation 9. Batch Normalization: 10. Dropout: 11. On various optimizers: Project overview: collect images in the simulator, then train a deep neural network with Keras so the vehicle can drive itself. Overview This repository contains starting files for the Behavioral Cloning Project. In this project, you will use what you've learned about deep neural networks and convolutional neural networks to clone driving behavior. You will train, validate and test a model using Keras. The model will output a steering angle to an autonomous vehicle. We have provided a simulator where you can steer a car around a track for data collection. You'll use image data and steering angles to train a neural network and then use this model to drive the car autonomously around the track. We also want you to create a detailed writeup of the project. Check out the writeup template for this project and use it as a starting point for creating your own writeup. The writeup can be either a markdown file or a pdf document. To meet specifications, the project will require submitting five files: model.py (script used to create and train the model) drive.py (script to drive the car feel free to modify this file) model.h5 (a trained Keras model) a report writeup file (either markdown or pdf) video.mp4 (a video recording of your vehicle driving autonomously around the track for at least one full lap) This README file describes how to output the video in the Details About Files In This Directory section. Creating a Great Writeup A great writeup should include the rubric points as well as your description of how you addressed each point. You should include a detailed description of the code used (with line number references and code snippets where necessary), and links to other supporting documents or external references. You should include images in your writeup to demonstrate how your code works with examples. All that said, please be concise! We're not looking for you to write a book here, just a brief description of how you passed each rubric point, and references to the relevant code :). You're not required to use markdown for your writeup. If you use another method please just submit a pdf of your writeup.
The Project The goals / steps of this project are the following: Use the simulator to collect data of good driving behavior Design, train and validate a model that predicts a steering angle from image data Use the model to drive the vehicle autonomously around the first track in the simulator. The vehicle should remain on the road for an entire loop around the track. Summarize the results with a written report Dependencies This lab requires: CarND Term1 Starter Kit The lab enviroment can be created with CarND Term1 Starter Kit. Click here for the details. The following resources can be found in this github repository: drive.py video.py writeup_template.md The simulator can be downloaded from the classroom. In the classroom, we have also provided sample data that you can optionally use to help train your model. Details About Files In This Directory drive.py Usage of drive.py requires you have saved the trained model as an h5 file, i.e. model.h5 . See the Keras documentation for how to create this file using the following command: sh model.save(filepath) Once the model has been saved, it can be used with drive.py using this command: sh python drive.py model.h5 The above command will load the trained model and use the model to make predictions on individual images in real time and send the predicted angle back to the server via a websocket connection. Note: There is known local system's setting issue with replacing , with . when using drive.py. When this happens it can make predicted steering values clipped to max/min values. If this occurs, a known fix for this is to add export LANG en_US.utf8 to the bashrc file. Saving a video of the autonomous agent sh python drive.py model.h5 run1 The fourth argument, run1 , is the directory in which to save the images seen by the agent. If the directory already exists, it'll be overwritten. sh ls run1 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_424.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_451.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_477.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_528.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_573.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_618.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_697.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_723.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_749.jpg 2017 01 09 16:10:23 EST 12KiB 2017_01_09_21_10_23_817.jpg ... The image file name is a timestamp of when the image was seen. This information is used by video.py to create a chronological video of the agent driving. video.py sh python video.py run1 Creates a video based on images found in the run1 directory. The name of the video will be the name of the directory followed by '.mp4' , so, in this case the video will be run1.mp4 . Optionally, one can specify the FPS (frames per second) of the video: sh python video.py run1 fps 48 Will run the video at 48 FPS. The default FPS is 60. Why create a video 1. It's been noted the simulator might perform differently based on the hardware. So if your model drives succesfully on your machine it might not on another machine (your reviewer). Saving a video is a solid backup in case this happens. 2. You could slightly alter the code in drive.py and/or video.py to create a video of what your model sees after the image is processed (may be helpful for debugging). Tips Please keep in mind that training images are loaded in BGR colorspace using cv2 while drive.py load images in RGB to predict the steering angles. 
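One simple way to keep the two colorspaces consistent is to convert at load time; a small sketch (the image path is illustrative):

```python
import cv2

# cv2.imread returns BGR, but drive.py feeds the network RGB frames,
# so convert at load time to keep training and inference consistent.
bgr = cv2.imread('IMG/center_example.jpg')  # illustrative path
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # equivalently: bgr[:, :, ::-1]
```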
How to write a README A well written README file can enhance your project and portfolio. Develop your abilities to create professional README files by completing this free course .",Image Classification,Image Classification 2769,Computer Vision,Computer Vision,Computer Vision,Wide Residual Networks (WideResNets) in PyTorch WideResNets for CIFAR10/100 implemented in PyTorch. This implementation requires less GPU memory than what is required by the official Torch implementation: Example: python train.py dataset cifar100 layers 40 widen factor 4 Acknowledgement densenet pytorch Wide Residual Networks (BMVC 2016) by Sergey Zagoruyko and Nikos Komodakis.,Image Classification,Image Classification 2770,Computer Vision,Computer Vision,Computer Vision,"Invariant Information Clustering for Unsupervised Image Classification and Segmentation This repository contains PyTorch code for the IIC paper . IIC is an unsupervised clustering objective that trains neural networks into image classifiers and segmenters without labels, with state of the art semantic accuracy. We set 9 new state of the art records on unsupervised STL10 (unsupervised variant of ImageNet), CIFAR10, CIFAR20, MNIST, COCO Stuff 3, COCO Stuff, Potsdam 3, Potsdam, and supervised/semisupervised STL. For example: Commands used to train the models in the paper here . There you can also find the flag to turn on prediction drawing for MNIST: How to download all our trained models here . How to set up the segmentation datasets here . Package dependencies Listed here . You may want to use e.g. virtualenv to isolate the environment. It's an easy way to install package versions specific to the repository that won't affect the rest of the system. Running on your own dataset You can either plug our loss (paper fig. 4, here and here ) into your own code, or change scripts in this codebase. Auxiliary overclustering makes a large difference (paper table 2) and is easy to implement, so it's strongly recommend even if you are using your own code; the others settings are less important. To edit existing scripts to use different datasets see here . Forks There are various forks of the main repository. In general I have not verified the code or performance, but check them out as someone may be working with versions of interest to you. For example: (Tensorflow) (Python 3, Pytorch 1.0)",Image Classification,Image Classification 2771,Computer Vision,Computer Vision,Computer Vision,octconv chainer Implementation of OctConv in Chainer port from,Image Classification,Image Classification 2780,Computer Vision,Computer Vision,Computer Vision,"iDeepE: Inferring RNA protein binding sites and motifs using local and global convolutional neural network Computational algorithms for identifying RNAs that bind to specific RBPs are urgently needed, and they can complement high cost experimental methods. Previous methods all focus on using entire sequences for model training, and local sequence information is completely ignored. On the other hand, local sequences provide genomic context recognized by RBPs. In this study, we develop a convolutional neural network (CNN) based method called iDeepE to predict RBP binding sites and motifs using local and global sequences. For global CNNs, one of their drawback is their poor scalability with increasing sequence length. 
However, local CNNs break the entire sequence into fixed size subsequences, which can handle sequences of any length. Dependency: python 2.7 PyTorch 0.1.11 Sklearn Data Download the training and testing data from and decompress it in the current dir. It has 24 experiments of 21 RBPs, and we need to train one model per experiment. Another dataset from 47 RBPs with over 2000 binding sites is also used in this study. Supported models Now it supports GPUs and 4 types of models, including CNN, CNN LSTM, DenseNet and ResNet. Each model can be trained using local CNNs and global CNNs, and also ensembling of local and global CNNs. The code supports GPUs and CPUs; it automatically checks whether your server has a GPU installed, and it will prioritize using the GPUs if they exist. In addition, iDeepE can also be adapted to protein binding sites on DNAs and identify the DNA binding specificity of proteins. Usage: python ideepe.py h posi nega model_type MODEL_TYPE out_file OUT_FILE motif MOTIF train TRAIN model_file MODEL_FILE predict PREDICT motif_dir MOTIF_DIR testfile TESTFILE maxsize MAXSIZE channel CHANNEL window_size WINDOW_SIZE local LOCAL glob GLOB ensemble ENSEMBLE batch_size BATCH_SIZE num_filters NUM_FILTERS n_epochs N_EPOCHS It supports model training and testing with different model structures; MODEL_TYPE can be CNN, CNN LSTM, ResNet or DenseNet. Use case: Take ALKBH5 as an example. Suppose you want to predict the binding sites for RBP ALKBH5 using ensembling of local and global CNNs, which is the default model. You first need to train the model for RBP ALKBH5; the trained model is then used to predict the binding probability of this RBP for your sequences. The following CLI will train an ensembling model using local and global CNNs, which are trained using positives and negatives derived from CLIP seq. step 1: 1. python ideepe.py posi GraphProt_CLIP_sequences/ALKBH5_Baltz2012.train.positives.fa nega GraphProt_CLIP_sequences/ALKBH5_Baltz2012.train.negatives.fa model_type CNN model_file model.pkl train True For ensembling models, it will save 'model.pkl.local' and 'model.pkl.global' for local and global CNNs, respectively. step 2: 2. python ideepe.py testfile GraphProt_CLIP_sequences/ALKBH5_Baltz2012.ls.positives.fa model_type CNN model_file model.pkl predict True testfile is your input fasta sequence file, and the predicted outputs for all sequences will be saved by default in prediction.txt . The value in each line corresponds to the probability of being an RBP binding site for the corresponding sequence in the fasta file. NOTE: if you have positive and negative sequences, please put them in the same sequence file, which is fed into the model for prediction. DO NOT predict probabilities for positive and negative sequences separately in two fasta files and then combine the predictions. Identify motifs: You need to install WebLogo and TOMTOM in the MEME Suite to search identified motifs against known motifs of RBPs. You also need positive and negative sequences when using the motif option. step 3: 3. python ideepe.py posi GraphProt_CLIP_sequences/ALKBH5_Baltz2012.train.positives.fa nega GraphProt_CLIP_sequences/ALKBH5_Baltz2012.train.negatives.fa model_type CNN model_file model.pkl motif True motif_dir motifs The identified motifs (PWMs and WebLogo) are saved to the default dir motifs (you can also use motif_dir to configure your own dir for motifs), together with the report from TOMTOM. Contact Xiaoyong Pan: xypan172436atgmail.com Reference Xiaoyong Pan^ , Hong Bin Shen^. 
Predicting RNA protein binding sites and motifs through combining local and global deep convolutional neural networks . Bioinformatics. In press. Updates: 7/27/2017: add support network for DenseNet and fix the bug when generating binding motifs, and update the identified motifs for RBPs in GraphProt dataset.",Image Classification,Image Classification 2785,Computer Vision,Computer Vision,Computer Vision,"Deep Association Learning Tensorflow Implementation of the paper Chen et al. Deep Association Learning for Unsupervised Video Person Re identification. BMVC2018 . You may refer to our poster for a quick overview. Getting Started Prerequisites: Datasets: PRID2011 3 , iLIDS VIDS 4 , MARS 5 . Python 2.7. Tensorflow version > 1.4.0. (For model training) Matlab. (For model evaluation) Data preparation: 1. Download ImageNet pretrained models: mobilenet_v1 1 , resnet_v1_50 2 . 2. Convert image data to tfrecords. (Need to supply your paths in the following .sh file. Check the TODO comments in the .sh file.) bash scripts/tf_convert_data.sh Running Experiments Training: Train models and extract features. (Need to supply your paths in the following .sh file. Check the TODO comments in the .sh file.) Model implementation include the following .py files: train_dal.py : build and run the training graph. association.py : build the anchor learning graph and compute the association losses. network.py : define the network. utils.py : data preparation. For example, to train the DAL model using mobilenet_v1 on MARS, run the the following scripts. bash scripts/train_MARS.sh Note that you may modify the type of deep model by changing the flag model_name (eg. model_name resnet_v1_50 ). You can also modify the number of gpus by changing the flag num_gpus . (eg. num_gpus 2 ). Testing: Test model performance in matlab. Evaluation codes are placed under the directory evaluation . For examples, to test the DAL model performance trained on MARS in matlab, run the following command. clear; model_name 'mobilenet_b64_dal'; CMC_mAP_MARS Citation Please refer to the following if this repository is useful for your research. Bibtex: @inproceedings{chen2018bmvc, title {Deep Association Learning for Unsupervised Video Person Re identification}, author {Chen, Yanbei and Zhu, Xiatian and Gong, Shaogang}, booktitle {Proceedings of the British Machine Vision Conference (BMVC)}, year {2018} } License This project is licensed under the MIT License see the LICENSE.md (LICENSE.md) file for details. References 1 Howard et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017. 2 He et al. Deep Residual Learning for Image Recognition. CVPR 2016. 3 Hirzer et al. Person Re Identification by Descriptive and Discriminative Classification. SCIA 2011. 4 Wang et al. Person Re Identification by Video Ranking. ECCV 2014. 5 Zheng et al. MARS: A Video Benchmark for Large Scale Person Re identification. ECCV 2016. Acknowledgements This repository is partially built upon the tensorflow/models repository. The evaluation code (cmc & mAP) is partially borrowed from the MARS evaluation repository.",Image Classification,Image Classification 2812,Computer Vision,Computer Vision,Computer Vision,"Res2Net Maybe a simple implementation of Res2Net via Pytorch Res2Net: A New Multi scale Backbone Architecture(CVPR2019) Shang Hua Gao, Ming Ming Cheng, Kai Zhao, Xin Yu Zhang, Ming Hsuan Yang, Philip Torr Link: This code is implemented by the description and structure diagram in the paper. The structure of Res2Net ! 
image",Image Classification,Image Classification 2816,Computer Vision,Computer Vision,Computer Vision,"Model Details 14 observations (labels): label_names 'No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices' 3 Class model(0: negative, 1: positive, 2: uncertain): 2 Class model(0: negative, 1: positive): Choose the best from U Zeros and U Ones U Zeros model (0: negative, 1: positive, merge uncertain into negative for training): U Ones model (0: negative, 1: positive, merge uncertain into positive for training): Wang, Xiaosong, Peng, Yifan, Lu, Le, Lu, Zhiyong, Bagheri, Mohammadhadi, and Summers, Ronald M. Chestx ray8: Hospital scale chest x ray database and benchmarks on weakly supervised classification and localization of common thorax diseases. arXiv preprint arXiv:1705.02315, 2017. Input: 224x224 image, convert to RGB, normalized based on the mean and standard deviation of training dataset of ImageNet CNN Model: densenet121 initialize parameters from the model pre trained on ImageNet: Bottleneck Features: 1x1024 —————————————————————————————— 3 Class Output: dense layer: 14x3, {p_0, p_1, p_2} on each label, without Softmax(), since we use the loss function CrossEntropyLoss() Loss Function (14 label, 3 class): for 3 classes on each label, we use CrossEntropyLoss(), which includes Softmax(), Log() and NLLLoss(), where Log() and NLLLoss() return cross entropy. Then we take the average over 14 labels. Final Output: apply Softmax() on only {p_0, p_1}, then use p_1 as the output of each label. —————————————————————————————— 2 Class Output dense layer: 14x1, only {p_1} on each label, without Sigmoid(), since we use the loss function BCEWithLogitsLoss() Loss Function (14 label, 2 class): we use BCEWithLogitsLoss(), which includes Sigmoid() and BCELoss(). Then we take the average over 14 labels. Final Output: apply Sigmoid() on {p_1} —————————————————————————————— Optimizer Adam: β1 0.9 and β2 0.999 as default Learning rate: 1E 4 Decayed Factor: 10 / 2 epoch Epoch Number: 6 or 4 Batch Batch Size (based on the size of memory) 32 for 224x224, 16 for 320x320 Training Time for 224x224: 0.6 hour / epoch for 320x320: 1.3 hour / epoch while Xception is some kind of slower than Densenet121, and Fractalnet is much slower. 
ROC and PR in Valid dataset use 2 class {p_0, p_1}, there is no uncertain, we output ROC and PR for 14 observations AUC(ROC) Comparison 224x224, where U Zeors/U Ones label the uncertain as negative/positive: Type U0 U1 2 Class 3 Class Atelectasis 0.75 0.81 0.82 0.75 Cardiomegaly 0.84 0.79 0.82 0.85 Consolidation 0.86 0.86 0.88 0.87 Edema 0.93 0.93 0.94 0.93 Pleural Effusion 0.92 0.92 0.93 0.91 No Finding 0.91 0.90 0.91 0.91 Enlarged Cardiomediastinum 0.62 0.50 0.59 0.59 Lung Opacity 0.92 0.92 0.91 0.91 Lung Lesion 0.32 0.64 0.83 0.18 Pneumonia 0.73 0.70 0.80 0.70 Pneumothorax 0.91 0.89 0.91 0.92 Pleural Other 0.96 0.87 0.92 0.93 Fracture NaN NaN NaN NaN Support Devices 0.92 0.94 0.92 0.93 Uncertain Method Selection for 2 Class, based on comparison between U Zeros and U Ones Type 2 Class Atelectasis U Ones Cardiomegaly U Zeros Consolidation U Zeros Edema U Ones Pleural Effution U Ones No Finding U Zeros Enlarged Cardiomediastinum U Zeros Lung Opacity U Ones Lung Lesion U Ones Pneumonia U Zeros Pneumothorax U Zeros Pleural Other U Zeros Fracture U Ones Support Devices U Ones AUC(ROC) Comparison 320x320: Type CheXNet CheXNeXt CheXpert Densenet(2) Densenet(3) Xception(2) Xception(3) Atelectasis 0.8094 0.862(0.825–0.895) 0.858(0.806,0.910) 0.81 0.73 0.83 0.78 Cardiomegaly 0.9248 0.831(0.790–0.870) 0.854(0.800,0.909) 0.77 0.83 0.83 0.80 Consolidation 0.7901 0.893(0.859 0.924) 0.939(0.908,0.971) 0.91 0.85 0.94 0.92 Edema 0.8878 0.924(0.886 0.955) 0.941(0.903,0.980) 0.94 0.93 0.94 0.94 Pleural Effusion 0.8638 0.901(0.868 0.930) 0.936(0.904,0.967) 0.93 0.92 0.93 0.94 No Finding 0.89 0.90 0.89 0.90 Enlarged Cardiomediastinum 0.54 0.57 0.51 0.49 Lung Opacity 0.93 0.92 0.92 0.92 Lung Lesion 0.77 0.47 0.21 0.60 Pneumonia 0.7680 0.851(0.781 0.911) 0.72 0.77 0.72 0.74 Pneumothorax 0.8887 0.944(0.915 0.969) 0.92 0.90 0.86 0.94 Pleural Other 0.8062 0.798(0.744 0.849) 0.96 0.95 0.96 0.98 Fracture NaN NaN NaN NaN Support Devices 0.94 0.94 0.94 0.94 AUC(PR) Comparison: Type CheXpert U0 224 U1 224 Des(2)224 Des(3)224 Des(2)320 Des(3)320 Xception(2) Xception(3) Atelectasis 0.69 0.62 0.68 0.71 0.60 0.71 0.56 0.74 0.64 Cardiomegaly 0.81 0.77 0.70 0.75 0.76 0.69 0.72 0.74 0.72 Consolidation 0.44 0.52 0.44 0.53 0.51 0.63 0.51 0.69 0.68 Edema 0.66 0.75 0.77 0.82 0.78 0.81 0.74 0.82 0.80 Pleural Effution 0.91 0.86 0.86 0.87 0.85 0.87 0.84 0.87 0.87 No Finding 0.44 0.49 0.43 0.50 0.43 0.45 0.41 0.49 Enlarged Cardiomediastinum 0.65 0.55 0.62 0.60 0.55 0.59 0.55 0.53 Lung Opacity 0.94 0.94 0.94 0.93 0.95 0.94 0.95 0.94 Lung Lesion 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.01 Pneumonia 0.09 0.09 0.13 0.10 0.09 0.11 0.10 0.14 Pneumothorax 0.19 0.30 0.19 0.39 0.46 0.36 0.33 0.37 Pleural Other 0.06 0.02 0.03 0.04 0.06 0.05 0.06 0.10 Fracture NaN NaN NaN NaN NaN NaN NaN NaN Support Devices 0.91 0.94 0.90 0.90 0.93 0.94 0.91 0.94 Challenges we met 1. We tried to use random crop and random horizontal flip in the data preprocessing, however the performance was worse than do nothing. Since random cropping might drop some important part of the images, and views from the front or from the back are actually different. While center cropping, padding zeros, and resizing to square got the similar performances, here we simply resized to square 2. To reproduce the paper, we supposed to use 320x320 resolution in the begining. However, the limitation of computing resource made us decrease the resolution to 224x224 first. We spend most of the time on raising the performance under 224x224. After that, we modified our input to 320x320. 
One of the reasons we had to move up to 320x320 is that the input size of the Xception model must be larger than 299x299 (if we hard-code the input size to 224x224 in Xception, the pretrained model performs badly, since it was pre-trained on ImageNet at 299x299).
3. A mistake we made was predicting on each image individually, which made the ROC performance much worse than in the paper. After switching to prediction per study (taking the maximum when more than one image is provided), the performance became similar to the paper's.
4. In the valid dataset there are only 0 or 1 positive cases for Lung Lesion, Pleural Other, and Fracture. The uncertain-method choices for these three labels are based on models trained on a re-split dataset; the re-split should be done by patients, not by studies.
5. We did not tune Fractalnet much, since it is very deep and takes a long time to train, but the results we obtained show that Xception and Densenet121 perform much better than Fractalnet.

Implementation

Step 1, in ./ :
conda env create -f environment.yml
conda activate chexpert

Step 2, in ./data/ : unzip CheXpert v1.0 small.zip, then ./data/ should look like this:
./data/
  train/
  valid/
  train.csv
  valid.csv
Modify the data paths in datasplit.py and train.py if you need to. To check the performance of the trained model, we only need valid/ and valid.csv.

Step 3:
To train a new model: 1. empty output/ ; 2. in ./code2class/, run train.py; the model will be saved in ../output/ .
To check the performance of a trained model: 1. if ./output/ is empty, move a model file ?.pth into output/ (currently, the MyCNN.pth in ./output/ is a 320x320 2-class Xception model); 2. in ./code2class/, run roc.py; graphs (ROC, PR) will be saved in ../output/ . (Make sure the transforms in roc.py are consistent with the trained model. The 3-class Xception was not uploaded here, so you may not be able to run roc.py in code3class.)",Image Classification,Image Classification
2826,Computer Vision,Computer Vision,Computer Vision,"Character Aware Neural Language Models Code for the paper Character Aware Neural Language Models (AAAI 2016). A neural language model (NLM) built on character inputs only. Predictions are still made at the word level. The model employs a convolutional neural network (CNN) over characters, whose output is used as the input to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). The CNN output can also optionally be passed through a Highway Network, which improves performance. Much of the base code is from Andrej Karpathy's excellent character RNN implementation.

Requirements
Code is written in Lua and requires Torch. It also requires the nngraph and luautf8 packages, which can be installed via:
luarocks install nngraph
luarocks install luautf8
GPU usage will additionally require the cutorch and cunn packages:
luarocks install cutorch
luarocks install cunn
cudnn will result in a good (8x-10x) speed-up for convolutions, so it is highly recommended. This will make the training time of a character-level model somewhat competitive with a word-level model (1500 tokens/sec vs 3000 tokens/sec for the large character/word-level models described below).
git clone
cd cudnn.torch
luarocks make cudnn-scm-1.rockspec

Data
Data should be put into the data/ directory, split into train.txt , valid.txt , and test.txt . Each line of the .txt file should be a sentence. The English Penn Treebank (PTB) data (Tomas Mikolov's pre-processed version with vocab size equal to 10K, widely used by the language modeling community) is given as the default.
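The project itself is written in Lua/Torch, but purely as an illustration of the expected layout (one sentence per line in train.txt, valid.txt and test.txt), a short Python sketch that counts the word and character types in such a split might look like this. The data/ptb path and the build_vocabs name are assumptions, and the optional EOS token mirrors the EOS flag discussed below.

```python
from collections import Counter
import os


def build_vocabs(data_dir, eos=None):
    """Count word and character types in a train/valid/test split with one sentence per line."""
    words, chars = Counter(), Counter()
    for split in ("train.txt", "valid.txt", "test.txt"):
        with open(os.path.join(data_dir, split), encoding="utf-8") as f:
            for line in f:
                tokens = line.split() + ([eos] if eos else [])  # optionally append an EOS token
                words.update(tokens)
                for tok in tokens:
                    chars.update(tok)                            # a token is a sequence of characters
    return words, chars


words, chars = build_vocabs("data/ptb", eos="+")
print(len(words), "word types,", len(chars), "character types")
```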
The paper also runs the models on non English data (Czech, French, German, Russian, and Spanish), from the ICML 2014 paper Compositional Morphology for Word Representations and Language Modelling by Jan Botha and Phil Blunsom. This can be downloaded from Jan's website . For ease of use, we provide a script to download the non English data ( get_data.sh ). The script also saves the downloaded data into the relevant folders. Note on PTB The PTB data above does not have end of sentence tokens for each sentence, and hence these must be manually appended. This can be done by adding EOS '+' to the script (obviously you can use other characters than + to represent an end of sentence token we recommend a single unused character). The non English data already have end of sentence tokens for each line so, you want to add EOS '' to the command line. Unicode in Lua Lua is unicode agnostic (each string is just a sequence of bytes) so we use the luautf8 package to deal with languages where a character can be more than one byte (e.g. Russian). Many thanks to vseledkin for alerting us to the fact that previous version of the code did not take this account! Model Here are some example scripts. Add gpuid 0 to each line to use a GPU (which is required to get any reasonable speed with the CNN), and cudnn 1 to use the cudnn package. Scripts to reproduce the results of the paper can be found under run_models.sh Character level models Large character level model (LSTM CharCNN Large in the paper). This is the default: should get 82 on valid and 79 on test. Takes 5 hours with cudnn . th main.lua savefile char large EOS '+' Small character level model (LSTM CharCNN Small in the paper). This should get 96 on valid and 93 on test. Takes 2 hours with cudnn . th main.lua savefile char small rnn_size 300 highway_layers 1 kernels '{1,2,3,4,5,6}' feature_maps '{25,50,75,100,125,150}' EOS '+' Word level models Large word level model (LSTM Word Large in the paper). This should get 89 on valid and 85 on test. th main.lua savefile word large word_vec_size 650 highway_layers 0 use_chars 0 use_words 1 EOS '+' Small word level model (LSTM Word Small in the paper). This should get 101 on valid and 98 on test. th main.lua savefile word small word_vec_size 200 highway_layers 0 use_chars 0 use_words 1 rnn_size 200 EOS '+' Combining both Note that if use_chars and use_words are both set to 1, the model will concatenate the output from the CNN with the word embedding. We've found this model to underperform a purely character level model, though. Evaluation By default main.lua will evaluate the model on test data after training, but this will use the last epoch's model, and also will be slow due to the way the data is set up. Evaluation on test can be performed via the following script: th evaluate.lua model model_file.t7 data_dir data/ptb savefile model_results.t7 Where model_file.t7 is the path to the best performing (on validation) model. This will also save some basic statistics (e.g. perplexity by token) in model_results.t7 . Hierarchical Softmax Training on a larger vocabulary (e.g. 100K+) will require hierarchical softmax (HSM) to train at a reasonable speed. You can use the hsm option to do this. For example hsm 500 will randomly split the vocabulary into 500 clusters of (approximately) equal size. hsm 0 is the default and will not use HSM. hsm 1 will automatically choose the number of clusters for you, by choosing the integer closest to sqrt( V ). 
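The clustering itself happens inside the Lua training code, but the behaviour described above is easy to picture. Here is a rough Python illustration (hsm_clusters is a made-up name; the only thing being demonstrated is the random, roughly equal-sized split and the sqrt-of-vocabulary default):

```python
import math
import random


def hsm_clusters(vocab, n_clusters=-1):
    """-1 picks the integer closest to sqrt(|V|); otherwise split the shuffled
    vocabulary into n_clusters groups of (approximately) equal size."""
    if n_clusters == -1:
        n_clusters = int(round(math.sqrt(len(vocab))))
    vocab = list(vocab)
    random.shuffle(vocab)
    size = math.ceil(len(vocab) / n_clusters)
    return [vocab[i:i + size] for i in range(0, len(vocab), size)]


clusters = hsm_clusters(["the", "cat", "sat", "on", "a", "mat", "dog", "ran", "off"])
print(len(clusters), [len(c) for c in clusters])   # 3 clusters of 3 for a 9-word vocabulary
```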
Batch Size If training on bigger datasets you should probably use a larger batch size (e.g. batch_size 100 ). Licence MIT",Image Classification,Image Classification 2835,Computer Vision,Computer Vision,Computer Vision,"Deep Networks with Stochastic Depth This repository hosts the Torch 7 code for the paper _Deep Networks with Stochastic Depth_ available at For now, the code reproduces the results in Figure 3 for CIFAR 10 and CIFAR 100, and Figure 4 left for SVHN. The code for the 1202 layer network is easily modified from the repo fb.resnet.torch using our provided module for stochastic depth. Table of Contents Updates ( updates) Prerequisites ( prerequisites) Getting Started on CIFAR 10 ( getting started on cifar 10) Usage Details ( usage details) Known Problems ( known problems) Contact ( contact) Updates Please see the latest implementation of stochastic depth and other cool models (DenseNet etc.) in PyTorch, by Felix Wu and Danlu Chen. Their code is much more memory efficient, more user friendly and better maintained. The 1202 layer architecture on CIFAR 10 can be trained on one TITAN X (amazingly!) under our standard settings. Prerequisites Torch 7 and CUDA with the basic packages (nn, optim, image, cutorch, cunn). cudnn and torch bindings . nninit ; luarocks install nninit should do the trick. CIFAR 10 and CIFAR 100 datasets in Torch format; this script should very conveniently handle it for you. SVHN dataset in Torch format, available here . Please note that running on SVHN requires roughly 28GB of RAM for dataset loading. Getting Started on CIFAR 10 bash git clone cd Stochastic_Depth git clone cd cifar.torch th Cifar10BinToTensor.lua cd .. mkdir results th main.lua dataRoot cifar.torch/ resultFolder results/ deathRate 0.5 Usage Details th main.lua dataRoot path_to_data resultFolder path_to_save deathRate 0.5 This command runs the 110 layer ResNet on CIFAR 10 with stochastic depth, using _linear decay_ survival probabilities ending in 0.5. The device flag allows you to specify which GPU to run on. On our machine with a TITAN X, each epoch takes about 60 seconds, and the program ends with a test error (selected by best validation error) of __5.25%__. The default deathRate is set to 0. This is equivalent to a constant depth network, so to run our baseline, enter: th main.lua dataRoot path_to_data resultFolder path_to_save On our machine with a TITAN X, each epoch takes about 75 seconds, and this baseline program ends with a test error (selected by best validation error) of 6.41% (see Figure 3 in the paper). You can run on CIFAR 100 by adding the flag dataset cifar100 . Our program provides other options, for example, your network depth ( N ), data augmentation ( augmentation ), batch size ( batchSize ) etc. You can change the optimization hyperparameters in the sgdState variable, and learning rate schedule in the the main function. The program saves a file every epoch to resultFolder /errors\_ N \_ dataset \_ deathMode \_ deathRate , which has a table of tuples containing your test and validation errors until that epoch. The architecture and number of epochs for SVHN used in our paper are slightly different from the code's default, please use the following command if you would like to replicate our result of 1.75% on SVHN: th main.lua dataRoot path_to_data resultFolder path_to_save dataset svhn N 25 maxEpochs 50 deathRate 0.5 Known Problems It is normal to get a +/ 0.2% difference from our reported results on CIFAR 10, and analogously for the other datasets. 
Networks are initialized differently, and most importantly, the validation set is chosen at random (determined by your seed). If you train on SVHN and the model doesn't converge for the first 1600 or so iterations, that's ok, just wait for a little longer. Xavier reported that the model is able to converge for him on CIFAR 10 only after he uses the following initalization for Batch Normalization model:add(cudnn.SpatialBatchNormalization(_dim_):init('weight', nninit.normal, 1.0, 0.002):init('bias', nninit.constant, 0)) . We could not replicate the non convergence and thus won't put this initialization into our code, but recognize that machines (or the versions of Torch installed) might be different. Contact My email is ys646 at cornell.edu. I'm happy to answer any of your questions, and I'd very much appreciate your suggestions. My academic website is at http://yueatsprograms.github.io.",Image Classification,Image Classification 2847,Computer Vision,Computer Vision,Computer Vision,todo,Image Classification,Image Classification 2868,Computer Vision,Computer Vision,Computer Vision,"unpool Implement unpool operation in tensorflow. Use tf.nn.max_pool_with_argmax to pool to get argmax. NOTE: 1.This code is UNTESTED, may have some BUGs! 2.Because of using tf.scatter_update, this operation will cause tensorflow can not be able to automatically compute the gradient.(see",Image Classification,Image Classification 2902,Computer Vision,Computer Vision,Computer Vision,tensorflow_octConv Paper:《Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution》. Implementation of OctaveConv in Tensorflow NOTE:The results are coming. Code modification based on terrychenism ! Thanks!,Image Classification,Image Classification 210,Unknown,Unknown,Unknown,fuzzbunch_wrapper Fuzzbunch Cli >Wine >fb.py wrapper Dependencies: regular fuzzbunch dependencies installed in wine install fuzzbunch this way: Set Wine's PATH environment variable to c:\ fuzzbunch path \windows\lib\x86 Windows Usage: ./fbcli.py module list ./fbcli.py eternalblue targetip 1.1.1.1 ./fbcli.py doublepulsar help ./fbcli.py doublepulsar targetip 1.1.1.1 function ping,Unknown,Unknown 211,Unknown,Unknown,Unknown,"Should DSL: Improve readability for should style expectations The goal of Should DSL is to write should expectations in Python as clear and readable as possible, using almost natural language (limited sometimes by the Python language constraints). In order to use this DSL, you need to import should and should_not objects from should_dsl module. For example:: >>> from should_dsl import should >>> 1 should equal_to(1) >>> 'should' should include('oul') >>> 3 should be_into( 0, 1, 2 ) Traceback (most recent call last): ... ShouldNotSatisfied: 3 is not into 0, 1, 2 The equal_to matcher verifies object equality. If you want to ensure identity, you must use be as matcher:: >>> 2 should be(2) A nice example of exceptions would be:: >>> def raise_zerodivisionerror(): ... return 1/0 >>> raise_zerodivisionerror should throw(ZeroDivisionError) should has a negative version: should_not :: >>> from should_dsl import should_not >>> 2 should_not be_into( 1, 3, 5 ) >>> 'should' should_not include('oul') Traceback (most recent call last): ... ShouldNotSatisfied: 'should' does include 'oul'",Unknown,Unknown 212,Unknown,Unknown,Unknown,"Django Microformat Application v0.1 (alpha) (c) 2009 Nicholas H.Tollervey See the file LICENSE.txt for the licensing terms and conditions. 
This Django application makes it easier to integrate and use Microformats in your web application. Microformats are a means of adding semantic information that is both human and machine readable to a web site. In order to work with Microformats you need to use a toolkit such as Oomph (included as a javascript plugin with the unit tests see or the Operator Add on for Firefox (that supports more types of microformat) see This application attempts to help in two ways: 1) You get models: so you can store data relating to the supported microformats (you don't have to use these models see below for more information) 2) You get markup: there are some example templates for the supported microformats in the /microformats/templates directory and I've written some template filters that wrap around these templates so you get a convenient shortcut. Currently the supported microformats are: hCard for representing people or organizations geo for representing a geolocation adr for representing an address hCalendar for representing an event hListing for representing an advertisement hReview for representing an opinion XFN for representing friends and relationships hAtom for syndicated content hNews for online journalism In the code, you get the following: Models relating to the geo, hCard, adr, hCalendar, hListing, hReview, hAtom and XFN microformats (models.py). hCard has two models: 1) hCard a flat model containing only the most common fields 2) hCardComplete a full implementation of the vCard specification (and related tables) Simplified forms for the geo, hCard, adr, org, email, tel and hCalendar, hListing, hReview, hFeed, hEntry and hNews microformats and fragments (forms.py). Some useful admin functionality (admin.py). Template filters for the geo, hCard, adr, hCalendar, hListing, hReview and XFN microformats (templatetags/microformat_extras.py). Some example templates for rendering the microformats (templates/ .html) To use the template filters you need to register the application and add: {% load microformat_extras %} to the top of the template you're using in your application. If you have an instance of a microformat model in your context you can use the appropriate template filter to display it: {{hCardInstance hcard}} will result in: Mr Joe Arthur Blogs PhD Vice President Acme Corp. joe.blogs@acme.com work joeblogs2000@home isp.com home 5445 N. 27th Street Milwaukee WI 53209 United States +44(0)1234 567890 work +44(0)1324 234123 home (This markup is based upon that produced by the hCard creator found at In addition you can pass individual fields thus: {{hCardInstance.role hcard:'role'}} Which will result in the following markup: Vice President (An example of the class design pattern: The template filters are clever enough to deal with different types of field. For example, if you pass a datetime value like this: {{datetimeInstance hcal:'dtstart'}} You'll get this: Sat 11 Apr 2009 1:30 p.m. (An example of the datetime design pattern: You can even do this: {{datetimeInstance hcal:'dtstart %B %d %Y }} To get this: June 06 1944 (Notice the passing of arguments for strftime.) If you pass a valid email address or URI then the span element will be replaced with an anchor with the appropriate href attribute. For example, if you do something like this: {{hReview.url hreview:'url'}} You'll get this: You don't even have to pass instances of the microformat models for the template filters to work. 
The templates the filters wrap around simply assume the same field names as found in the microformat specifications (where ' ' is replaced with the more Pythonic '_' so 'given name' becomes 'given_name'). For example, you could create a dictionary thus: hc dict() hc 'honorific_prefix' 'Mr' hc 'given_name' 'Joe' hc 'additional_name' 'Arthur' hc 'family_name' 'Blogs' hc 'honorific_suffix' 'PhD' hc 'url' ' hc 'email_work' 'joe.blogs@acme.com' hc 'email_home' 'joe.blogs@home isp.com' hc 'tel_work' '+44(0)1234 567876' hc 'tel_home' '+44(0)1543 234345' hc 'street_address' '5445 N. 27th Street' hc 'extended_address' '' hc 'locality' 'Milwaukee' hc 'region' 'WI' hc 'country_name' 'US' hc 'postal_code' '53209' hc 'title' 'Vice President' hc 'org' 'Acme Corp.' And pass it to the 'hcard' template filter to get similar markup to that shown above. Finally, you don't even have to use the supplied microformat templates for the filters. You can use your own by adding a reference to the appropriate template in the following constants in the settings.py file of your project: GEO_MICROFORMAT_TEMPLATE HCARD_MICROFORMAT_TEMPLATE HCAL_MICROFORMAT_TEMPLATE HLISTING_MICROFORMAT_TEMPLATE HREVIEW_MICROFORMAT_TEMPLATE ADR_MICROFORMAT_TEMPLATE HFEED_MICROFORMAT_TEMPLATE HENTRY_MICROFORMAT_TEMPLATE HNEWS_MICROFROMAT_TEMPLATE For more examples check out the end of the following test file: microformats/unit_tests/test_templatetags.py and take a look at: microformats/templates/test.html Running the unit tests (./manage.py test microformats) will result in an example file demonstrating the HTML markup produced by the template filters: microformats/unit_tests/html_test/microformat_test.html I've included the Oomph javascript library so you can play with the microformats. A more fully featured library is the Operator add on for Firefox. IE8 will support Microformats natively. Feedback is most welcome by sending email to the contact details found here:",Unknown,Unknown 213,Unknown,Unknown,Unknown,"PyInstanceVars A function decorator that automatically creates instance variables from function arguments. Arguments can be omitted by adding them to the 'omit' list argument of the decorator. Names are retained on a one to one basis (i.e '_arg' > 'self._arg'). Passing arguments as raw literals, using a keyword, or as defaults all work. If args and/or kwargs are used by the decorated function, they are not processed and must be handled explicitly. Basic Usage The simplest way to explain how to use it is with a quick code example: python >>> from instancevars import >>> class SimpleDemo(object): ... @instancevars ... def __init__(self, arg1, arg2, arg3): ... pass ... >>> simple SimpleDemo(1, 2, 3) >>> simple.arg1 1 >>> simple.arg2 2 >>> simple.arg3 3 This example shows how you can optionally skip arguments by adding them to the omit list. You can still manually do whatever you need with them in the function body. python >>> from instancevars import >>> class TestMe(object): ... @instancevars(omit 'arg2_' ) ... def __init__(self, _arg1, arg2_, arg3 'test'): ... self.arg2 arg2_ + 1 ... >>> testme TestMe(1, 2) >>> testme._arg1 1 >>> testme.arg2_ Traceback (most recent call last): File , line 1, in AttributeError: 'TestMe' object has no attribute 'arg2_' >>> testme.arg2 3 >>> testme.arg3 'test' >>> Why? Because Python initializer functions can get lengthy doing nothing more than one to one variable assignment. Languages such as Scala already have the ability to convert arguments to instance variables using 'val' or 'var'. 
Given Python's reputation of being succinct and terse it seems like this should be builtin. Some may think this is 'unpythonic', but I would respectfully disagree. We've listed a decorator and an omit list denoting our intent, so I think we've been plenty 'explicit'. In my opinion, the terseness gained by the decorator aids readability. Just my opinion, decide for yourself. Requirements It has been tested under CPython 2.7/3.3, PyPy 1.9, and Jython 2.5/2.7. There are no library dependencies other than the standard library. Performance Thanks to a contributed rewrite, performance is now only 30 40% worse than explicit initialization under CPython. This is likely to be an acceptable amount of degradation for nearly all scenarios. Credits The code was originally based on a great comment and snippet from StackOverflow .",Unknown,Unknown 214,Unknown,Unknown,Unknown,"The Fuck Version version badge version link Build Status travis badge travis link Windows Build Status appveyor badge appveyor link Coverage coverage badge coverage link MIT License license badge (LICENSE.md) The Fuck is a magnificent app, inspired by a @liamosaur tweet , that corrects errors in previous console commands. Is The Fuck too slow? Try the experimental instant mode! ( experimental instant mode) gif with examples examples link examples link More examples: bash ➜ apt get install vim E: Could not open lock file /var/lib/dpkg/lock open (13: Permission denied) E: Unable to lock the administration directory (/var/lib/dpkg/), are you root? ➜ fuck sudo apt get install vim enter/↑/↓/ctrl+c sudo password for nvbn: Reading package lists... Done ... bash ➜ git push fatal: The current branch master has no upstream branch. To push the current branch and set the remote as upstream, use git push set upstream origin master ➜ fuck git push set upstream origin master enter/↑/↓/ctrl+c Counting objects: 9, done. ... bash ➜ puthon No command 'puthon' found, did you mean: Command 'python' from package 'python minimal' (main) Command 'python' from package 'python3' (main) zsh: command not found: puthon ➜ fuck python enter/↑/↓/ctrl+c Python 3.4.2 (default, Oct 8 2014, 13:08:17) ... bash ➜ git brnch git: 'brnch' is not a git command. See 'git help'. Did you mean this? branch ➜ fuck git branch enter/↑/↓/ctrl+c master bash ➜ lein rpl 'rpl' is not a task. See 'lein help'. Did you mean this? repl ➜ fuck lein repl enter/↑/↓/ctrl+c nREPL server started on port 54848 on host 127.0.0.1 nrepl://127.0.0.1:54848 REPL y 0.3.1 ... If you're not afraid of blindly running corrected commands, the require_confirmation settings ( settings) option can be disabled: bash ➜ apt get install vim E: Could not open lock file /var/lib/dpkg/lock open (13: Permission denied) E: Unable to lock the administration directory (/var/lib/dpkg/), are you root? ➜ fuck sudo apt get install vim sudo password for nvbn: Reading package lists... Done ... 
Requirements python (3.4+) pip python dev Installation On OS X, you can install The Fuck via Homebrew homebrew (or via Linuxbrew linuxbrew on Linux): bash brew install thefuck On Ubuntu / Mint, install The Fuck with the following commands: bash sudo apt update sudo apt install python3 dev python3 pip python3 setuptools sudo pip3 install thefuck On FreeBSD, install The Fuck with the following commands: bash pkg install thefuck On ChromeOS, install The Fuck using chromebrew with the following command: bash crew install thefuck On other systems, install The Fuck by using pip : bash pip install thefuck Alternatively, you may use an OS package manager (OS X, Ubuntu, Arch). It is recommended that you place this command in your .bash_profile , .bashrc , .zshrc or other startup script: bash eval $(thefuck alias) You can use whatever you want as an alias, like for Mondays: eval $(thefuck alias FUCK) Or in your shell config (Bash, Zsh, Fish, Powershell, tcsh). Changes are only available in a new shell session. To make changes immediately available, run source /.bashrc (or your shell config file like .zshrc ). To run fixed commands without confirmation, use the yeah option (or just y for short): bash fuck yeah To fix commands recursively until succeeding, use the r option: bash fuck r Updating bash pip3 install thefuck upgrade Note: Alias functionality was changed in v1.34 of The Fuck How it works The Fuck attempts to match the previous command with a rule. If a match is found, a new command is created using the matched rule and executed. The following rules are enabled by default: adb_unknown_command – fixes misspelled commands like adb logcta ; ag_literal – adds Q to ag when suggested; aws_cli – fixes misspelled commands like aws dynamdb scan ; az_cli – fixes misspelled commands like az providers ; cargo – runs cargo build instead of cargo ; cargo_no_command – fixes wrongs commands like cargo buid ; cat_dir – replaces cat with ls when you try to cat a directory; cd_correction – spellchecks and correct failed cd commands; cd_mkdir – creates directories before cd'ing into them; cd_parent – changes cd.. to cd .. ; chmod_x – add execution bit; composer_not_command – fixes composer command name; cp_omitting_directory – adds a when you cp directory; cpp11 – adds missing std c++11 to g++ or clang++ ; dirty_untar – fixes tar x command that untarred in the current directory; dirty_unzip – fixes unzip command that unzipped in the current directory; django_south_ghost – adds delete ghost migrations to failed because ghosts django south migration; django_south_merge – adds merge to inconsistent django south migration; docker_login – executes a docker login and repeats the previous command; docker_not_command – fixes wrong docker commands like docker tags ; dry – fixes repetitions like git git push ; fab_command_not_found – fix misspelled fabric commands; fix_alt_space – replaces Alt+Space with Space character; fix_file – opens a file with an error in your $EDITOR ; gem_unknown_command – fixes wrong gem commands; git_add – fixes pathspec 'foo' did not match any file(s) known to git. ; git_add_force – adds force to git add ... when paths are .gitignore'd; git_bisect_usage – fixes git bisect strt , git bisect goood , git bisect rset , etc. 
when bisecting; git_branch_delete – changes git branch d to git branch D ; git_branch_exists – offers git branch d foo , git branch D foo or git checkout foo when creating a branch that already exists; git_branch_list – catches git branch list in place of git branch and removes created branch; git_checkout – fixes branch name or creates new branch; git_commit_amend – offers git commit amend after previous commit; git_commit_reset – offers git reset HEAD after previous commit; git_diff_no_index – adds no index to previous git diff on untracked files; git_diff_staged – adds staged to previous git diff with unexpected output; git_fix_stash – fixes git stash commands (misspelled subcommand and missing save ); git_flag_after_filename – fixes fatal: bad flag '...' after filename git_help_aliased – fixes git help commands replacing with the aliased command; git_merge – adds remote to branch names; git_merge_unrelated – adds allow unrelated histories when required git_not_command – fixes wrong git commands like git brnch ; git_pull – sets upstream before executing previous git pull ; git_pull_clone – clones instead of pulling when the repo does not exist; git_pull_uncommitted_changes – stashes changes before pulling and pops them afterwards; git_push – adds set upstream origin $branch to previous failed git push ; git_push_different_branch_names – fixes pushes when local brach name does not match remote branch name; git_push_pull – runs git pull when push was rejected; git_push_without_commits – Creates an initial commit if you forget and only git add . , when setting up a new project; git_rebase_no_changes – runs git rebase skip instead of git rebase continue when there are no changes; git_remote_delete – replaces git remote delete remote_name with git remote remove remote_name ; git_rm_local_modifications – adds f or cached when you try to rm a locally modified file; git_rm_recursive – adds r when you try to rm a directory; git_rm_staged – adds f or cached when you try to rm a file with staged changes git_rebase_merge_dir – offers git rebase ( continue abort skip) or removing the .git/rebase merge dir when a rebase is in progress; git_remote_seturl_add – runs git remote add when git remote set_url on nonexistant remote; git_stash – stashes your local modifications before rebasing or switching branch; git_stash_pop – adds your local modifications before popping stash, then resets; git_tag_force – adds force to git tag when the tag already exists; git_two_dashes – adds a missing dash to commands like git commit amend or git rebase continue ; go_run – appends .go extension when compiling/running Go programs; gradle_no_task – fixes not found or ambiguous gradle task; gradle_wrapper – replaces gradle with ./gradlew ; grep_arguments_order – fixes grep arguments order for situations like grep lir . 
test ; grep_recursive – adds r when you try to grep directory; grunt_task_not_found – fixes misspelled grunt commands; gulp_not_task – fixes misspelled gulp tasks; has_exists_script – prepends ./ when script/binary exists; heroku_multiple_apps – add app to heroku commands like heroku pg ; heroku_not_command – fixes wrong heroku commands like heroku log ; history – tries to replace command with most similar command from history; hostscli – tries to fix hostscli usage; ifconfig_device_not_found – fixes wrong device names like wlan0 to wlp2s0 ; java – removes .java extension when running Java programs; javac – appends missing .java when compiling Java files; lein_not_task – fixes wrong lein tasks like lein rpl ; long_form_help – changes h to help when the short form version is not supported ln_no_hard_link – catches hard link creation on directories, suggest symbolic link; ln_s_order – fixes ln s arguments order; ls_all – adds A to ls when output is empty; ls_lah – adds lah to ls ; man – changes manual section; man_no_space – fixes man commands without spaces, for example mandiff ; mercurial – fixes wrong hg commands; missing_space_before_subcommand – fixes command with missing space like npminstall ; mkdir_p – adds p when you try to create a directory without parent; mvn_no_command – adds clean package to mvn ; mvn_unknown_lifecycle_phase – fixes misspelled lifecycle phases with mvn ; npm_missing_script – fixes npm custom script name in npm run script ; npm_run_script – adds missing run script for custom npm scripts; npm_wrong_command – fixes wrong npm commands like npm urgrade ; no_command – fixes wrong console commands, for example vom/vim ; no_such_file – creates missing directories with mv and cp commands; open – either prepends to address passed to open or create a new file or directory and passes it to open ; pip_install – fixes permission issues with pip install commands by adding user or prepending sudo if necessary; pip_unknown_command – fixes wrong pip commands, for example pip instatl/pip install ; php_s – replaces s by S when trying to run a local php server; port_already_in_use – kills process that bound port; prove_recursively – adds r when called with directory; pyenv_no_such_command – fixes wrong pyenv commands like pyenv isntall or pyenv list ; python_command – prepends python when you try to run non executable/without ./ python script; python_execute – appends missing .py when executing Python files; quotation_marks – fixes uneven usage of ' and when containing args'; path_from_history – replaces not found path with similar absolute path from history; react_native_command_unrecognized – fixes unrecognized react native commands; remove_trailing_cedilla – remove trailling cedillas ç , a common typo for european keyboard layouts; rm_dir – adds rf when you try to remove a directory; scm_correction – corrects wrong scm like hg log to git log ; sed_unterminated_s – adds missing '/' to sed 's s commands; sl_ls – changes sl to ls ; ssh_known_hosts – removes host from known_hosts on warning; sudo – prepends sudo to previous command if it failed because of permissions; sudo_command_from_user_path – runs commands from users $PATH with sudo ; switch_lang – switches command from your local layout to en; systemctl – correctly orders parameters of confusing systemctl ; test.py – runs py.test instead of test.py ; touch – creates missing directories before touching ; tsuru_login – runs tsuru login if not authenticated or session expired; tsuru_not_command – fixes wrong tsuru commands like 
tsuru shell ; tmux – fixes tmux commands; unknown_command – fixes hadoop hdfs style unknown command , for example adds missing ' ' to the command on hdfs dfs ls ; unsudo – removes sudo from previous command if a process refuses to run on super user privilege. vagrant_up – starts up the vagrant instance; whois – fixes whois command; workon_doesnt_exists – fixes virtualenvwrapper env name os suggests to create new. yarn_alias – fixes aliased yarn commands like yarn ls ; yarn_command_not_found – fixes misspelled yarn commands; yarn_command_replaced – fixes replaced yarn commands; yarn_help – makes it easier to open yarn documentation; The following rules are enabled by default on specific platforms only: apt_get – installs app from apt if it not installed (requires python commandnotfound / python3 commandnotfound ); apt_get_search – changes trying to search using apt get with searching using apt cache ; apt_invalid_operation – fixes invalid apt and apt get calls, like apt get isntall vim ; apt_list_upgradable – helps you run apt list upgradable after apt update ; apt_upgrade – helps you run apt upgrade after apt list upgradable ; brew_cask_dependency – installs cask dependencies; brew_install – fixes formula name for brew install ; brew_reinstall – turns brew install into brew reinstall ; brew_link – adds overwrite dry run if linking fails; brew_uninstall – adds force to brew uninstall if multiple versions were installed; brew_unknown_command – fixes wrong brew commands, for example brew docto/brew doctor ; brew_update_formula – turns brew update into brew upgrade ; dnf_no_such_command – fixes mistyped DNF commands; pacman – installs app with pacman if it is not installed (uses yay or yaourt if available); pacman_not_found – fixes package name with pacman , yay or yaourt . The following commands are bundled with The Fuck , but are not enabled by default: git_push_force – adds force with lease to a git push (may conflict with git_push_pull ); rm_root – adds no preserve root to rm rf / command. Creating your own rules To add your own rule, create a file named your rule name.py in /.config/thefuck/rules . The rule file must contain two functions: python match(command: Command) > bool get_new_command(command: Command) > str list str Additionally, rules can contain optional functions: python side_effect(old_command: Command, fixed_command: str) > None Rules can also contain the optional variables enabled_by_default , requires_output and priority . Command has three attributes: script , output and script_parts . Your rule should not change Command . Rules api changed in 3.0: To access a rule's settings, import it with from thefuck.conf import settings settings is a special object assembled from /.config/thefuck/settings.py , and values from env ( see more below ( settings)). A simple example rule for running a script with sudo : python def match(command): return ('permission denied' in command.output.lower() or 'EACCES' in command.output) def get_new_command(command): return 'sudo {}'.format(command.script) Optional: enabled_by_default True def side_effect(command, fixed_command): subprocess.call('chmod 777 .', shell True) priority 1000 Lower first, default is 1000 requires_output True More examples of rules , utility functions for rules , app/os specific helpers . 
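Written out as a complete rule file with the optional settings above restored to proper Python syntax, the sudo example reads roughly as follows (the file name sudo_me.py is just an illustration; everything else follows the rule API described above):

```python
# ~/.config/thefuck/rules/sudo_me.py  (file name chosen for this example)
import subprocess

enabled_by_default = True
requires_output = True
priority = 1000  # lower is matched first; 1000 is the default


def match(command):
    # command.output holds the output of the previous command
    return ('permission denied' in command.output.lower()
            or 'EACCES' in command.output)


def get_new_command(command):
    # command.script is the previous command line
    return 'sudo {}'.format(command.script)


def side_effect(old_command, fixed_command):
    # optional hook, mirroring the example above
    subprocess.call('chmod 777 .', shell=True)
```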
Settings Several The Fuck parameters can be changed in the file $XDG_CONFIG_HOME/thefuck/settings.py ( $XDG_CONFIG_HOME defaults to /.config ): rules – list of enabled rules, by default thefuck.conf.DEFAULT_RULES ; exclude_rules – list of disabled rules, by default ; require_confirmation – requires confirmation before running new command, by default True ; wait_command – max amount of time in seconds for getting previous command output; no_colors – disable colored output; priority – dict with rules priorities, rule with lower priority will be matched first; debug – enables debug output, by default False ; history_limit – numeric value of how many history commands will be scanned, like 2000 ; alter_history – push fixed command to history, by default True ; wait_slow_command – max amount of time in seconds for getting previous command output if it in slow_commands list; slow_commands – list of slow commands; num_close_matches – maximum number of close matches to suggest, by default 3 . An example of settings.py : python rules 'sudo', 'no_command' exclude_rules 'git_push' require_confirmation True wait_command 10 no_colors False priority {'sudo': 100, 'no_command': 9999} debug False history_limit 9999 wait_slow_command 20 slow_commands 'react native', 'gradle' num_close_matches 5 Or via environment variables: THEFUCK_RULES – list of enabled rules, like DEFAULT_RULES:rm_root or sudo:no_command ; THEFUCK_EXCLUDE_RULES – list of disabled rules, like git_pull:git_push ; THEFUCK_REQUIRE_CONFIRMATION – require confirmation before running new command, true/false ; THEFUCK_WAIT_COMMAND – max amount of time in seconds for getting previous command output; THEFUCK_NO_COLORS – disable colored output, true/false ; THEFUCK_PRIORITY – priority of the rules, like no_command 9999:apt_get 100 , rule with lower priority will be matched first; THEFUCK_DEBUG – enables debug output, true/false ; THEFUCK_HISTORY_LIMIT – how many history commands will be scanned, like 2000 ; THEFUCK_ALTER_HISTORY – push fixed command to history true/false ; THEFUCK_WAIT_SLOW_COMMAND – max amount of time in seconds for getting previous command output if it in slow_commands list; THEFUCK_SLOW_COMMANDS – list of slow commands, like lein:gradle ; THEFUCK_NUM_CLOSE_MATCHES – maximum number of close matches to suggest, like 5 . For example: bash export THEFUCK_RULES 'sudo:no_command' export THEFUCK_EXCLUDE_RULES 'git_pull:git_push' export THEFUCK_REQUIRE_CONFIRMATION 'true' export THEFUCK_WAIT_COMMAND 10 export THEFUCK_NO_COLORS 'false' export THEFUCK_PRIORITY 'no_command 9999:apt_get 100' export THEFUCK_HISTORY_LIMIT '2000' export THEFUCK_NUM_CLOSE_MATCHES '5' Third party packages with rules If you'd like to make a specific set of non public rules, but would still like to share them with others, create a package named thefuck_contrib_ with the following structure: thefuck_contrib_foo thefuck_contrib_foo rules __init__.py third party rules __init__.py third party utils setup.py The Fuck will find rules located in the rules module. Experimental instant mode The default behavior of The Fuck requires time to re run previous commands. When in instant mode, The Fuck saves time by logging output with script ), then reading the log. gif with instant mode instant mode gif link instant mode gif link Currently, instant mode only supports Python 3 with bash or zsh. zsh's autocorrect function also needs to be disabled in order for thefuck to work properly. 
To enable instant mode, add enable experimental instant mode to the alias initialization in .bashrc , .bash_profile or .zshrc . For example: bash eval $(thefuck alias enable experimental instant mode) Developing See CONTRIBUTING.md (CONTRIBUTING.md) License MIT Project License can be found here (LICENSE.md). version badge : version link : travis badge : travis link : appveyor badge : appveyor link : coverage badge : coverage link : license badge : examples link : instant mode gif link : homebrew : linuxbrew :",Unknown,Unknown 215,Unknown,Unknown,Unknown,python fu: Useful shell scripts for Python devs Build status Create dir structures for your modules easily: $ mkmodule foo.bar.qux $ tree foo foo ├── __init__.py └── bar ├── __init__.py └── qux.py Easily promote module files: $ promote foo.bar.qux $ tree foo foo ├── __init__.py └── bar ├── __init__.py └── qux └── __init__.py Easily demote modules files (if safe): $ demote foo.bar.qux $ tree foo foo ├── __init__.py └── bar ├── __init__.py └── qux.py Safety first These commands will never cause any data loss. Installation You can use pip to install python fu: console $ pip install python fu,Unknown,Unknown 216,Unknown,Unknown,Unknown,"Where is Who is hiring? hiring? Overview This repo contains: 1. The code for the site under app ; and 2. The code to scrape Hacker News' _Who is hiring?_ posts and create the database that powers the website, under database . You can find a list of FAQ about the project here: Building the database You can either use both Sqlite and MySQL; for the latter, you'll have to stick your password in both config.py , create_everything.sh and update_single_item.sh . Sqlite should just work. (yay!) Create the database from scratch Clone the repo and install the requirements; then: $ cd database $ ./create_everything.sh Update the database with a single HN post $ cd database $ ./upadte_single_item.sh Run the tests $ cd database $ python tests.py Running the web app $ python run.py The webapp is trivial enough that no tests are necessary for now. Contributing PRs are more than welcome! :) There's a ton of work that could be done, from tidying up the code to implementing new features. I have few ideas but I lack time. If you want to contribute, drop me a line or go ahead and open a PR. Mandatory boring disclaimer: this stuff is not affiliated with YCombinator, I don't make a single cent for running it etc etc. Authors manlio License Beerware",Unknown,Unknown 217,Unknown,Unknown,Unknown,"Status: Archive (code is provided as is, no updates expected) Multi Agent Particle Environment A simple multi agent particle world with a continuous observation and discrete action space, along with some basic simulated physics. Used in the paper Multi Agent Actor Critic for Mixed Cooperative Competitive Environments . Getting started: To install, cd into the root directory and type pip install e . To interactively view moving to landmark scenario (see others in ./scenarios/): bin/interactive.py scenario simple.py Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), numpy (1.14.5) To use the environments, look at the code for importing them in make_env.py . Code structure make_env.py : contains code for importing a multiagent environment as an OpenAI Gym like object. ./multiagent/environment.py : contains code for environment simulation (interaction physics, _step() function, etc.) ./multiagent/core.py : contains classes for various objects (Entities, Landmarks, Agents, etc.) that are used throughout the code. 
./multiagent/rendering.py : used for displaying agent behaviors on the screen. ./multiagent/policy.py : contains code for interactive policy based on keyboard input. ./multiagent/scenario.py : contains base scenario object that is extended for all scenarios. ./multiagent/scenarios/ : folder where various scenarios/ environments are stored. scenario code consists of several functions: 1) make_world() : creates all of the entities that inhabit the world (landmarks, agents, etc.), assigns their capabilities (whether they can communicate, or move, or both). called once at the beginning of each training session 2) reset_world() : resets the world by assigning properties (position, color, etc.) to all entities in the world called before every episode (including after make_world() before the first episode) 3) reward() : defines the reward function for a given agent 4) observation() : defines the observation space of a given agent 5) (optional) benchmark_data() : provides diagnostic data for policies trained on the environment (e.g. evaluation metrics) Creating new environments You can create new scenarios by implementing the first 4 functions above ( make_world() , reset_world() , reward() , and observation() ). List of environments Env name in code (name in paper) Communication? Competitive? Notes simple.py N N Single agent sees landmark position, rewarded based on how close it gets to landmark. Not a multiagent environment used for debugging policies. simple_adversary.py (Physical deception) N Y 1 adversary (red), N good agents (green), N landmarks (usually N 2). All agents observe position of landmarks and other agents. One landmark is the ‘target landmark’ (colored green). Good agents rewarded based on how close one of them is to the target landmark, but negatively rewarded if the adversary is close to target landmark. Adversary is rewarded based on how close it is to the target, but it doesn’t know which landmark is the target landmark. So good agents have to learn to ‘split up’ and cover all landmarks to deceive the adversary. simple_crypto.py (Covert communication) Y Y Two good agents (alice and bob), one adversary (eve). Alice must sent a private message to bob over a public channel. Alice and bob are rewarded based on how well bob reconstructs the message, but negatively rewarded if eve can reconstruct the message. Alice and bob have a private key (randomly generated at beginning of each episode), which they must learn to use to encrypt the message. simple_push.py (Keep away) N Y 1 agent, 1 adversary, 1 landmark. Agent is rewarded based on distance to landmark. Adversary is rewarded if it is close to the landmark, and if the agent is far from the landmark. So the adversary learns to push agent away from the landmark. simple_reference.py Y N 2 agents, 3 landmarks of different colors. Each agent wants to get to their target landmark, which is known only by other agent. Reward is collective. So agents have to learn to communicate the goal of the other agent, and navigate to their landmark. This is the same as the simple_speaker_listener scenario where both agents are simultaneous speakers and listeners. simple_speaker_listener.py (Cooperative communication) Y N Same as simple_reference, except one agent is the ‘speaker’ (gray) that does not move (observes goal of other agent), and other agent is the listener (cannot speak, but must navigate to correct landmark). simple_spread.py (Cooperative navigation) N N N agents, N landmarks. 
Agents are rewarded based on how far any agent is from each landmark. Agents are penalized if they collide with other agents. So, agents have to learn to cover all the landmarks while avoiding collisions. simple_tag.py (Predator prey) N Y Predator prey environment. Good agents (green) are faster and want to avoid being hit by adversaries (red). Adversaries are slower and want to hit good agents. Obstacles (large black circles) block the way. simple_world_comm.py Y Y Environment seen in the video accompanying the paper. Same as simple_tag, except (1) there is food (small blue balls) that the good agents are rewarded for being near, (2) we now have ‘forests’ that hide agents inside from being seen from outside; (3) there is a ‘leader adversary” that can see the agents at all times, and can communicate with the other adversaries to help coordinate the chase. Paper citation If you used this environment for your experiments or found it helpful, consider citing the following papers: Environments in this repo: @article{lowe2017multi, title {Multi Agent Actor Critic for Mixed Cooperative Competitive Environments}, author {Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor}, journal {Neural Information Processing Systems (NIPS)}, year {2017} } Original particle world environment: @article{mordatch2017emergence, title {Emergence of Grounded Compositional Language in Multi Agent Populations}, author {Mordatch, Igor and Abbeel, Pieter}, journal {arXiv preprint arXiv:1703.04908}, year {2017} }",Unknown,Unknown 218,Unknown,Unknown,Unknown,opendc An entire datacenters worth of infrastructure running atop Kubernetes. Website Documentation Presentation,Unknown,Unknown 219,Unknown,Unknown,Unknown,NationalParks backend application (Python) This application is a backend that provides geolocation information about NationalParks. The information is stored in a mongodb Installation Assuming you're using the project : oc new project roadshow oc create f ./ose3/application template.json oc new app nationalparks py There's some options that can be parameterized: APPLICATION_NAME: Name of the application APPLICATION_HOSTNAME: Hostname/route to access your application Example: oc new app nationalparks py p APPLICATION_HOSTNAME nationalparks roadshow.127.0.0.1.xip.io,Unknown,Unknown 220,Unknown,Unknown,Unknown,"SpeedPerception: Perceived UX of Web Apps in the Wild SpeedPerception / Latest Updates : SpeedPerception Phase2 is live now. URL for the challenge: SpeedPerception Phase 2 is looking at IR 500 + Alexa 1000 / Chrome / ChromeMobile / Desktop / Mobile SpeedPerception Phase 1 looked at IR 500 / Chrome / Desktop Phase 1 overview and a few key insights : ( Winner of ACM SigComm Internet QoE best paper award !! ) What is SpeedPerception? : SpeedPerception is a large scale web performance crowdsourcing study focused on the perceived loading performance of above the fold content. Clearly, no one likes slow loading webpages. SpeedPerception is a study trying to understand what “slow” and “fast” mean to the human end user and how these perceptions are affected by the structure of the web applications. Traditional web performance metrics defined in W3C standards focus on timing each process along the content delivery pipeline, such as Time to First Byte (TTFB) and Page Load Time. We want to tackle the web performance measurement challenge by looking at it from a different angle: one which puts user experience into focus. 
Since people primarily consume the web visually, we are focusing on the visual perception of the webpage loading process. SpeedPerception / Goal : Our goal is to create free, open source, benchmark dataset(s) to advance the systematic study of how human end users perceive the webpage loading process: the above the fold rendering in particular. Our belief (and hope) is that such benchmarks can provide a quantitative basis to compare different algorithms and spur computer scientists to make progress on helping quantify perceived webpage performance. SpeedPerception / Team : Qingzhu (Clark) Gao Data Scientist @ Instart Logic Parvez Ahammad Head of Data Science & Machine Learning @ Instart Logic Prasenjit Dey Software Engineer @ Instart Logic SpeedPerception / Collaborators : Pat Meenan Staff Engineer @ Google / Creator of Estelle Weyl Open Web Evangelist @ Instart Logic SpeedPerception / Phase 1 : Phase 1 results: Phase 1 Web app code / Experimental Design Criteria: Phase 1 crowd sourcing challenge was hosted at Phase 1 crowd sourcing duration: 28th July 2016 to 30th September 2016.",Unknown,Unknown 221,Unknown,Unknown,Unknown,"Flask Flask is a lightweight WSGI _ web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug _ and Jinja _ and has become one of the most popular Python web application frameworks. Flask offers suggestions, but doesn't enforce any dependencies or project layout. It is up to the developer to choose the tools and libraries they want to use. There are many extensions provided by the community that make adding new functionality easy. Installing Install and update using pip _: .. code block:: text pip install U Flask A Simple Example .. code block:: python from flask import Flask app Flask(__name__) @app.route('/') def hello(): return 'Hello, World!' .. code block:: text $ env FLASK_APP hello.py flask run Serving Flask app hello Running on (Press CTRL+C to quit) Contributing For guidance on setting up a development environment and how to make a contribution to Flask, see the contributing guidelines _. .. _contributing guidelines: Donate The Pallets organization develops and supports Flask and the libraries it uses. In order to grow the community of contributors and users, and allow the maintainers to devote more time to the projects, please donate today _. .. _please donate today: Links Website: Documentation: License: BSD _ Releases: Code: Issue tracker: Test status: Test coverage: Official chat: .. _WSGI: .. _Werkzeug: .. _Jinja: .. _pip:",Unknown,Unknown 222,Unknown,Unknown,Unknown,"findi A port of a port of sosumi. Find your iPhone through Apple's API. Source from: comfuture's recordmylatitude That was inspired by: tyler hall's sosumi I will attempt to keep up with Apple's changing API, as it breaks horribly when they update the app and modify version numbers. Fair warning. Use Initialize findmyiphone with your Apple ID. python from findi import FindMyIPhone findi FindMyIPhone('email@example.com', 'password') List your devices. python print findi.devices Get the first devices location: python iphone findi.devices 0 print iphone.latitude, iphone.longitude Recording Your Location There is an example Heroku application called findi heroku example with instructions for setting up a service that stores your location over a period of time. LICENSE The MIT License Majority of source Copyright (c) 2011 comfuture . 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software ), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",Unknown,Unknown 223,Unknown,Unknown,Unknown,"gorun (c) Peter Bengtsson, mail@peterbe.com, 2009 2012 License: Python Using (py)inotify to run commands when files change Tired of switching console, arrow up, Enter, switch console back for every little change you make when you're writing code that has tests? Running with gorun.py enables you to just save in your editor and the tests are run automatically and immediately. gorun.py does not use a slow pulling process which keeps taps on files modification time. Instead it uses the inotify_ which is a Linux kernel subsystem that provides file system event notification . .. _inotify: Installation This will only work on Linux which has the inotify module enabled in the kernel. (Most modern kernels do) :: pip install gorun This will install pyinotify_. .. _pyinotify: Then, create a settings file, which is just a python file that is expected to define a variable called DIRECTORIES . Here's an example: :: DIRECTORIES ( ('some/place/', './myframework test dir some/place'), ('some/place/unitests.py', './myframework test dir some/place testclass Unittests'), ('/var/log/torrentsdownload.log', 'growl downloads logfile /var/log/torrentsdownload.log'), ) Save that file as, for example, gorun_settings.py and then start it like this: :: $ gorun.py gorun_settings Configuration Once you've set gorun to monitor a directory it will kick off on any file that changes in that directory. By default things like autosave files from certain editors are automatically created (e.g. foo.py or foo.py ) and these are ignored. If there are other file extensions you want gorun to ignore add this to your settings file: :: IGNORE_EXTENSIONS ('log',) This will add to the list of already ignored file extensions such as .pyc . Similarly, if there are certain directories that you don't want the inotify to notice, you can list them like this: :: IGNORE_DIRECTORIES ('xapian_index', '.autosavefiles') Disclaimer This code hasn't been extensively tested and relies on importing python modules so don't let untrusted morons fiddle with your dev environment. Todo When doing Django development I often run on single test method over and over and over again till I get rid of all errors. When doing this I have to change the settings so it just runs one single test and when I'm done I go back to set it up so that it runs all tests when adjacent code works. This is a nuisance and I might try to solve that one day. 
If you have any tips please let me know.",Unknown,Unknown 224,Unknown,Unknown,Unknown,"Django Spectator .. image:: :target: .. image:: :target: Two Django apps: One to track book and periodical reading, including start and end dates and authors. One to track events attended (movies, plays, gigs, exhibitions, comedy, dance, classical), including date, venue, and people/organisations involved. For Django 1.11 to Django 2.2, running on Python 3.5 to 3.7. It has URLs, views and templates to create a site displaying all the data, and Django admin screens to add and edit them. The templates use Bootstrap v4.3 _. There are also template tags for displaying data in your own templates (see below). This is used on my personal website (with custom templates): reading _ and events _. Installation Install with pip:: pip install django spectator Add the apps to your project's INSTALLED_APPS in settings.py :: INSTALLED_APPS ... 'spectator.core', 'spectator.events', 'spectator.reading', While spectator.core is required, you can omit either spectator.events or spectator.reading if you only want to use one of them. Run migrations:: ./manage.py migrate Add to your project's urls.py :: urlpatterns ... url(r'^spectator/', include('spectator.core.urls')), You can change the initial path ( r'^spectator/' ) to whatever suits you. e.g. use r'^' to have Spectator's home page be the front page of your site. Then, go to Django Admin to add your data. Settings There are a few optional settings that can be used in your project's settings.py file. This is the full list, with their defaults. Descriptions of each are below:: SPECTATOR_GOOGLE_MAPS_API_KEY '' SPECTATOR_SLUG_ALPHABET 'abcdefghijkmnopqrstuvwxyz23456789' SPECTATOR_SLUG_SALT 'Django Spectator' SPECTATOR_DATE_FORMAT '% d %b %Y' If you get a Google Maps JavaScript API key _ and add it to the settings, it will enable using a map in the Django Admin to set the location of Venues, and the displaying of Venues' maps in the public templates:: SPECTATOR_GOOGLE_MAPS_API_KEY 'YOUR API KEY' URLs for all objects include automatically generated slugs, which are based on Hashids of the object's ID. You can change which characters are used in these slugs with this setting. e.g.:: SPECTATOR_SLUG_ALPHABET 'ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' You can also change the salt value used to encode the slugs. While the slugs don't provide complete security (i.e. it's not impossible to determine the ID on which a slug is based), using your own salt value can't hurt. e.g.:: SPECTATOR_SLUG_SALT 'My special salt value is here' You can change the format used for the dates of Events and for the titles of some sidebar cards in templates, using strftime _ formatting:: SPECTATOR_DATE_FORMAT '%Y %m %d' Overview There are two main parts to Spectator: Reading and Events (movies, gigs, etc). They both share Creators. Creators Creators are the authors of books, directors of movies, actors in plays, groups who perform at gigs, etc. A Creator has a name and a kind , of either individual (e.g. Anthony Sher ) or group (e.g. Royal Shakespeare Company ). A Creator is associated with books, movies, events, etc. through roles, which include an optional role_name such as Author , Illustrator , Director , Playwright , Company , etc. The roles can be given an order so that the creators of a thing will be listed in the appropriate order (such as the director before a movie's actors). See spectator/models/core.py for these models.
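For illustration, a Creator might be created from the Django shell like this; it is only a sketch based on the description above, and the exact stored kind strings and use of the default manager are assumptions rather than documented API::

    from spectator.core.models import Creator

    # "individual" and "group" are the two kinds described above; the exact
    # stored values are an assumption.
    sher = Creator.objects.create(name="Anthony Sher", kind="individual")
    rsc = Creator.objects.create(name="Royal Shakespeare Company", kind="group")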
Reading A Publication is a thing that's been read, and has a kind of either book or periodical . A Publication can optionally be part of a PublicationSeries. e.g. a Publication Vol. 3 No. 7 September 2005 could be part of the The Believer PublicationSeries. A Publication can have zero or more Readings. A Reading can have a start_date and end_date . If the start_date is set but the end_date isn't, the Publication is currently being read. When a Reading has been completed, and an end_date added, it can be marked as is_finished or not. If not, it's because you gave up on the Publication before getting to the end. Both start_date and end_date indicate a specific day by default. If you don't know the day, or the month, a granularity can be specified indicating whether the reading started/ended sometime during the month or year. See spectator/models/reading.py for these models. Events An Event specifies a date on which you saw a thing at a particular Venue. A Venue has a name and, optionally, location details. Each Event can have zero or more Creators associated directly with it. e.g. the performers at a gig, the comedians at a comedy event. These can be in a specific order, and each with an optional role. e.g.: The Wedding Present Role: Headliner Order: 1 Buffalo Tom Role: Support Order: 2 Events can be of different kinds, e.g. gig , cinema , theatre . This is only used for categorising Events into different lists; it doesn't restrict the kinds of Works that can be associated with it. You could have a cinema Event that has a movie, play and dance piece associated with it. Each Event can have zero or more Works associated with it: movies, plays, classical works or dance pieces. Each Work can have zero or more Creators, each with optional roles, associated directly with it. e.g. Wolfgang Amadeus Mozart (Composer) , William Shakespeare (Playwright) or Steven Spielberg (Director) : Events can be given an optional title (e.g. Glastonbury Festival ). If a title isn't specified one is created automatically when needed, based on any Works associated with it, or else any Creators associated with it. Template tags Each app, core , events and reading , has some template tags. Core template tags To use any of these in a template, first:: {% load spectator_core %} Most Read Creators To get a QuerySet of Creators with the most Readings associated with them:: {% most_read_creators num 10 %} Each Creator will have a num_readings attribute. It will only include Creators whose role on a publication was Author or was left blank. i.e. Creators who were Illustrator or Translator would not be counted. To display this as a chart in a Bootstrap card:: {% most_read_creators_card num 10 %} This will exclude any Creators with only 1 Reading. Most Visited Venues To get a QuerySet of Venues with the most Events associated with them:: {% most_visited_venues num 10 %} Each Venue will have a num_visits attribute. To display this as a chart in a Bootstrap card:: {% most_visited_venues_card num 10 %} This will exclude any Venues with only 1 Event.
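If you need the same data outside a template, the tag above can presumably also be called as a plain Python function; a hedged sketch, assuming the module path spectator.core.templatetags.spectator_core implied by the {% load spectator_core %} line::

    # Assumed import path; calling the template tag as an ordinary function
    # is a sketch, not documented usage.
    from spectator.core.templatetags.spectator_core import most_read_creators

    # Each returned Creator carries the num_readings attribute described above.
    for creator in most_read_creators(num=5):
        print(creator.name, creator.num_readings)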
Reading template tags To use any of these in a template, first:: {% load spectator_reading %} In progress Publications To get a QuerySet of Publications currently being read use in_progress_publications :: {% in_progress_publications as publications %} {% for pub in publications %} {{ pub }} {% for role in pub.roles.all %} {{ role.creator.name }} {% if role.role_name %}({{ role.role_name }}){% endif %} {% endfor %} {% endfor %} Or to display as a Bootstrap card:: {% in_progress_publications_card %} Publications being read on a day To get a QuerySet of Publications that were being read on a particular day use day_publications . If my_date is a python date object:: {% day_publications date my_date as publications %} And display the results as in the above example. Or to display as a Bootstrap card:: {% day_publications_card date my_date %} Years of reading To get a QuerySet of the years in which Publications were being read:: {% reading_years as years %} {% for year in years %} {{ year date: Y }} {% endfor %} Or to display as a Bootstrap card, with each year linking to the ReadingYearArchiveView :: {% reading_years_card current_year year %} Here, year is a date object indicating a year which shouldn't be linked. Annual reading counts For more detail than the reading_years tag, use this to get the number of Books and Periodicals (and the total) finished per year:: {% annual_reading_counts as years %} {% for year_data in years %} {{ year_data.year }}: {{ year_data.book }} book(s), {{ year_data.periodical }} periodical(s), {{ year_data.total }} total. {% endfor %} Or to display as a Bootstrap card, with each year linking to ReadingYearArchiveView :: {% annual_reading_counts_card current_year year kind 'all' %} Here, year is a date object indicating a year which shouldn't be linked. And kind can be one of all (default), book or periodical . If it's all , then the result is rendered as a table, with a column each for year, book count, periodical count and total count. Otherwise it's a list of years with the book/periodical counts in parentheses. Events template tags To use any of these in a template, first:: {% load spectator_events %} Recent Events To get a QuerySet of Events that happened recently:: {% recent_events num 3 as events %} {% for event in events %} {{ event }} {{ event.venue.name }} {% endfor %} If num is not specified, 10 are returned by default. Or to display as a Bootstrap card:: {% recent_events_card num 3 %} Events on a day To get a QuerySet of Events that happened on a particular day, use day_events . If my_date is a python date object:: {% day_events date my_date as events %} And display the results as in the above example. Or to display as a Bootstrap card:: {% day_events_card date my_date %} Years of Events To get a QuerySet of the years in which Events happened:: {% events_years as years %} {% for year in years %} {{ year date: Y }} {% endfor %} Or to display as a Bootstrap card, with each year linking to the EventYearArchiveView :: {% events_years_card current_year year %} Here, year is a date object indicating a year which shouldn't be linked.
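Several of the tags above (day_publications, day_events, reading_years_card, events_years_card) expect real Python date objects in the template context; a minimal sketch of a view supplying them, where the view and template names are illustrative only::

    import datetime

    from django.shortcuts import render

    def spectator_day(request):
        # my_date and year are the date objects consumed by the tags above;
        # "spectator_day.html" is a hypothetical template name.
        context = {
            "my_date": datetime.date(2019, 6, 1),
            "year": datetime.date(2019, 1, 1),
        }
        return render(request, "spectator_day.html", context)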
Annual Event Counts To include counts of Events per year:: {% annual_event_counts as years %} {% for year_data in years %} {{ year_data.year date: Y }}: {{ year_data.total }} event(s) {% endfor %} Restrict to one kind of Event:: {% annual_event_counts kind 'cinema' as years %} Or to display as a Bootstrap card, with each year linking to EventYearArchiveView :: {% annual_event_counts_card current_year year kind 'all' %} Here, year is a date object indicating a year which shouldn't be linked. Most Seen Creators To get a QuerySet of Creators involved with the most Events:: {% most_seen_creators num 10 event_kind 'gig' %} Each Creator will have a num_events attribute. event_kind can be omitted, or be None to include all kinds of Event. To display this as a chart in a Bootstrap card:: {% most_seen_creators_card num 10 event_kind 'gig' %} This will exclude any Creators with only 1 Event. Creators With Most Works To get a QuerySet of Creators that have the most Works (e.g, movies, plays, etc):: {% most_seen_creators_by_works num 10 work_kind 'movie', role_name 'Director' %} Each Creator will have a num_works attribute. work_kind can be omitted and all kinds of Work will be counted. role_name can be omitted and all roles will be counted. The above example would, for each Creator, only count movie Works on which their role was 'Director'. To display this as a chart in a Bootstrap card:: {% most_seen_creators_by_works_card num 10 work_kind 'movie', role_name 'Director' %} This will exclude any Creators with only 1 Work. Most Seen Works To get a QuerySet of Works involved with the most Events:: {% most_seen_works num 10 kind 'movie' %} Each Work will have a num_views attribute. kind can be omitted, or be None to include all kinds of Work. To display this as a chart in a Bootstrap card:: {% most_seen_works_card num 10 kind 'movie' %} This will exclude any Works with only 1 Event. Local development devproject/ is a basic Django project to use the app locally. Use it like this, installing requirements with pipenv:: $ cd devproject $ pipenv install $ pipenv run ./manage.py migrate $ pipenv run ./manage.py runserver Run tests with tox, from the top level directory (containing setup.py). Install it with:: $ pip install tox Run all tests in all environments like:: $ tox To run tests in only one environment, specify it. In this case, Python 3.6 and Django 2.0:: $ tox e py36 django20 To run a specific test, add its path after , eg:: $ tox e py36 django20 tests.core.test_models.CreatorTestCase.test_ordering Running the tests in all environments will generate coverage output. There will also be an htmlcov/ directory containing an HTML report. You can also generate these reports without running all the other tests:: $ tox e coverage Making a new release So I don't forget... 1. Put new changes on master . 2. Update the __version__ in spectator.__init__.py . 3. Update CHANGES.rst . 4. Do python setup.py tag . 5. Do python setup.py publish . Adding a new Event kind If it's simple (like, Gigs, Comedy, etc.) and doesn't require any specific kind of Works, then: In spectator.events.models.Event add it in KIND_CHOICES and KIND_SLUGS . Possibly add a special case for it in Event.get_kind_name_plural() . Add a simple factory for it in spectator.events.factories . In tests.events.test_models.EventTestCase : Add it to: test_get_kind() test_valid_kind_slugs() test_kind_slug() test_kind_name() test_kind_name_plural() test_get_kinds_data() Add a test_absolute_url_ () test for this kind. 
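As a rough sketch of the first step in the checklist above, the new kind gets an entry in the two structures on the Event model; the existing entries and the exact container types shown here are assumptions::

    # In spectator.events.models.Event (sketch only).
    KIND_CHOICES = [
        ("gig", "Gig"),
        ("comedy", "Comedy"),
        ("museum", "Museum"),  # the new kind being added
    ]

    KIND_SLUGS = {
        "museum": "museums",  # slug used in URLs for the new kind
        # ... existing kinds unchanged
    }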
Adding a new Work kind In spectator.events.models.Work add it in KIND_CHOICES and KIND_SLUGS . On the Event model add a new method similar to get_classical_works() for this new kind of Work . On the spectator.core.models.Creator model add a new method similar to get_classical_works() for this new kind of Work . Add a simple factory for it in spectator.events.factories . In spectator/events/templates/spectator_events/event_detail.html add an include to list the works. In spectator/core/templates/spectator_core/creator_detail.html add an include to list the works. In tests/ add equivalents of: core.test_models.CreatorTestCase.test.get_classical_works() events.test_models.EventTestCase.test_get_classical_works() events.test_models.WorkTestCase.test_absolute_url_classicalwork() events.test_models.WorkTestCase.test_get_list_url_classicalwork() Contact Phil Gyford phil@gyford.com @philgyford on Twitter",Unknown,Unknown 225,Unknown,Unknown,Unknown,"Pinax Table of Contents About Pinax ( about pinax) Overview ( overview) Dependencies ( dependencies) Supported Django and Python versions ( supported django and python versions) Documentation ( documentation) Installation ( installation) Usage ( usage) Change Log ( change log) Contribute ( contribute) Code of Conduct ( code of conduct) Connect with Pinax ( connect with pinax) License ( license) About Pinax Pinax is an open source platform built on the Django Web Framework. It is an ecosystem of reusable Django apps, themes, and starter project templates. This collection can be found at Overview This repository contains documentation for the Pinax project. Documentation Updating Online Documentation The docs live online at After a pull request is merged, checkout your local master branch: shell git checkout master Pull the update into your local master branch: shell git pull origin master Push the update live to gh pages: shell mkdocs gh deploy Publishing a Distribution projects.json distribution.json Change Log Contribute For an overview on how contributing to Pinax works read this blog post and watch the included video, or read our How to Contribute section. For concrete contribution ideas, please see our Ways to Contribute/What We Need Help With section. In case of any questions we recommend you join our Pinax Slack team and ping us there instead of creating an issue on GitHub. Creating issues on GitHub is of course also valid but we are usually able to help you faster if you ping us in Slack. We also highly recommend reading our blog post on Open Source and Self Care . Code of Conduct In order to foster a kind, inclusive, and harassment free community, the Pinax Project has a code of conduct . We ask you to treat everyone as a smart human programmer that shares an interest in Python, Django, and Pinax with you. Connect with Pinax For updates and news regarding the Pinax Project, please follow us on Twitter @pinaxproject and check out our Pinax Project blog . License Copyright (c) 2012 2018 James Tauber and contributors under the MIT license .",Unknown,Unknown 226,Unknown,Unknown,Unknown,"Build Status License: LGPL Pixie Join the chat at Intro Pixie is a lightweight lisp suitable for both general use as well as shell scripting. The language is still in a pre alpha phase and as such changes fairly quickly. The standard library is heavily inspired by Clojure as well as several other functional programming languages. It is written in RPython and as such supports a fairly fast GC and an amazingly fast tracing JIT. 
Features Some planned and implemented features: Immutable datastructures Protocols first implementation Transducers at the bottom (most primitives are based off of reduce) A good enough JIT (implemented, tuning still a WIP, but not bad performance today) Easy FFI Pattern matching (TODO) Dependencies python or pypy to build libffi dev libedit dev libuv dev Version 1.0 or higher libboost all dev ( brew install boost for Mac) Building make build_with_jit ./pixie vm Note: Mac OS X does not come with the build tools required by default. Install the XCode Command Line tools ( Apple Developer Site ) or install them independently. Running the tests ./pixie vm run tests.pxi Examples There are examples in the /examples directory. Try out Hello World with: ./examples/hello world.pxi Build Tool Pixie now comes with a build tool called dust . Try it and start making magic of your own. FAQ So this is written in Python? It's actually written in RPython, the same language PyPy is written in. make build_with_jit will compile Pixie using the PyPy toolchain. After some time, it will produce an executable called pixie vm . This executable is a full blown native interpreter with a JIT, GC, etc. So yes, the guts are written in RPython, just like the guts of most lisp interpreters are written in C. At runtime the only thing that is interpreted is the Pixie bytecode, that is until the JIT kicks in... What's this bit about magical powers ? First of all, the word magic is in quotes as it's partly a play on words, pixies are small, light and often considered to have magical powers. However there are a few features of pixie that although may not be uncommon, are perhaps unexpected from a lisp. Pixie implements its own virtual machine. It does not run on the JVM, CLR or Python VM. It implements its own bytecode, has its own GC and JIT. And it's small. Currently the interpreter, JIT, GC, and stdlib clock in at about 10.3MB once compiled down to an executable. The JIT makes some things fast. Very fast. Code like the following compiles down to a loop with 6 CPU instructions. While this may not be too impressive for any language that uses a tracing jit, it is fairly unique for a language as young as Pixie. clojure ;; This code adds up to 10000 from 0 via calling a function that takes a variable number of arguments. ;; That function then reduces over the argument list to add up all given arguments. (defn add fn & args (reduce add 0 args)) (loop x 0 (if (eq x 10000) x (recur (add fn x 1)))) Math system is fully polymorphic. Math primitives (+, , etc.) are built off of polymorphic functions that dispatch on the types of the first two arguments. This allows the math system to be extended to complex numbers, matrices, etc. The performance penalty of such a polymorphic call is completely removed by the RPython generated JIT. (Planned magical Features) Influencing the JIT from user code. (Still in research) Eventually it would be nice to allow Pixie to hint to the JIT that certain values are constants, that certain functions are pure, etc. This can all be done from inside RPython, and the plan is to expose parts of that to the user via hints in the Pixie language, to what extent this will be possible is not yet known. STM for parallelism. Once STM gets merged into the mainline branch of PyPy, we'll adopt it pretty quickly. CSP for concurrency. We already have stacklets, it's not that hard to use them for CSP style concurrency as well. Where do the devs hangout? Mostly on FreeNode at pixie lang stop by and say hello . 
Contributing We have a very open contribution process. If you have a feature you'd like to implement, submit a PR or file an issue and we'll see what we can do. Most PRs are either rejected (if there is a technical flaw) or accepted within a day, so send an improvement our way and see what happens. Implementation Notes Although parts of the language may be very close to Clojure (they are both lisps after all), language parity is not a design goal. We will take the features from Clojure or other languages that are suitable to our needs, and feel free to reject those that aren't. Therefore this should not be considered a Clojure Dialect , but instead a Clojure inspired lisp . Disclaimer This project is the personal work of Timothy Baldridge and contributors. It is not supported by any entity, including Timothy's employer, or any employers of any other contributors. Copying Free use of this software is granted under the terms of the GNU Lesser General Public License (LGPL). For details see the files COPYING and COPYING.LESSER included with the source distribution. All copyrights are owned by their respective authors.",Unknown,Unknown 227,Unknown,Unknown,Unknown,"Powerline :Author: Kim Silkebækken (kim.silkebaekken+vim@gmail.com) :Source: :Version: beta Powerline is a statusline plugin for vim, and provides statuslines and prompts for several other applications, including zsh, bash, fish, tmux, IPython, Awesome, i3 and Qtile. Support forum _ (powerline support@googlegroups.com) Development discussion _ (powerline dev@googlegroups.com) .. image:: :target: travis build status _ :alt: Build status .. _travis build status: .. _ Support forum : .. _ Development discussion : Features Extensible and feature rich, written in Python. Powerline was completely rewritten in Python to get rid of as much vimscript as possible. This has allowed much better extensibility, leaner and better config files, and a structured, object oriented codebase with no mandatory third party dependencies other than a Python interpreter. Stable and testable code base. Using Python has allowed unit testing of all the project code. The code is tested to work in Python 2.6+ and Python 3. Support for prompts and statuslines in many applications. Originally created exclusively for vim statuslines, the project has evolved to provide statuslines in tmux and several WMs, and prompts for shells like bash/zsh and other applications. It’s simple to write renderers for any other applications that Powerline doesn’t yet support. Configuration and colorschemes written in JSON. JSON is a standardized, simple and easy to use file format that allows for easy user configuration across all of Powerline’s supported applications. Fast and lightweight, with daemon support for even better performance. Although the code base spans a couple of thousand lines of code with no goal of “less than X lines of code”, the main focus is on good performance and as little code as possible while still providing a rich set of features. The new daemon also ensures that only one Python instance is launched for prompts and statuslines, which provides excellent performance. But I hate Python / I don’t need shell prompts / this is just too much hassle for me / what happened to the original vim powerline project / … You should check out some of the Powerline derivatives. The most lightweight and feature rich alternative is currently the vim airline _ project. Consult the documentation _ for more information and installation instructions. 
Check out powerline fonts _ for pre patched versions of popular, open source coding fonts. Screenshots Vim statusline ^^^^^^^^^^^^^^ Mode dependent highlighting .. image:: :alt: Normal mode .. image:: :alt: Insert mode .. image:: :alt: Visual mode .. image:: :alt: Replace mode Automatic truncation of segments in small windows .. image:: :alt: Truncation illustration .. image:: :alt: Truncation illustration .. image:: :alt: Truncation illustration The font in the screenshots is Pragmata Pro _ by Fabrizio Schiavi. .. _ Pragmata Pro :",Unknown,Unknown 228,Unknown,Unknown,Unknown,"Notify.io Notify.io is the open notification platform for the web. These notes are for people who are interested in contributing or learning about how Notify.io works. If you just want to use it to get notifications, sign up at Notify.io . Getting Started You need the Python App Engine SDK installed. To start the server dev_appserver.py p 8081 www If your shell can't find the dev_appserver.py command, you need to create a symlink to this command . Alternatively, if you don't want to use the shell you can use the App Engine Launcher. Note : When running in development mode, these outlets will not work: Desktop Notifier Email (unless you set it up ) any outlet that requires keys Running tests You'll need these packages: nose NoseGAE WebTest Run the test suite with: nosetests with gae or edit your .noserc accordingly. Make sure you're in the www directory.",Unknown,Unknown 229,Unknown,Unknown,Unknown,"Public APIs Build Status A collective list of free APIs for use in software and web development. Sponsor: A public API for this project can be found here thanks to DigitalOcean for helping us provide this service! For information on contributing to this project, please see the contributing guide (.github/CONTRIBUTING.md). Please note a passing build status indicates all listed APIs are available since the last update. A failing build status indicates that 1 or more services may be unavailable at the moment.
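Most entries below differ mainly in their Auth column; as a hedged illustration only, the URLs and the apikey parameter name here are placeholders rather than endpoints taken from the list, a no-auth call and an apiKey call might look like this:

    import requests

    # Placeholder endpoints: substitute a real API from the list below.
    NO_AUTH_URL = "https://api.example.com/facts"
    API_KEY_URL = "https://api.example.com/data"

    # Auth column "No": a plain GET is usually enough.
    facts = requests.get(NO_AUTH_URL, timeout=10).json()

    # Auth column "apiKey": the key is typically sent as a query parameter or
    # header; the exact name varies, so check each API's own documentation.
    data = requests.get(API_KEY_URL, params={"apikey": "YOUR_KEY"}, timeout=10).json()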
Index Animals ( animals) Anime ( anime) Anti Malware ( anti malware) Art & Design ( art design) Books ( books) Business ( business) Calendar ( calendar) Cloud Storage & File Sharing ( cloud storage file sharing) Continuous Integration ( continuous integration) Cryptocurrency ( cryptocurrency) Currency Exchange ( currency exchange) Data Validation ( data validation) Development ( development) Dictionaries ( dictionaries) Documents & Productivity ( documents productivity) Environment ( environment) Events ( events) Finance ( finance) Food & Drink ( food drink) Fraud Prevention ( fraud prevention) Games & Comics ( games comics) Geocoding ( geocoding) Government ( government) Health ( health) Jobs ( jobs) Machine Learning ( machine learning) Music ( music) News ( news) Open Data ( open data) Open Source Projects ( open source projects) Patent ( patent) Personality ( personality) Photography ( photography) Science & Math ( science math) Security ( security) Shopping ( shopping) Social ( social) Sports & Fitness ( sports fitness) Test Data ( test data) Text Analysis ( text analysis) Tracking ( tracking) Transportation ( transportation) URL Shorteners ( url shorteners) Vehicle ( vehicle) Video ( video) Weather ( weather) Animals API Description Auth HTTPS CORS Cat Facts Daily cat facts No Yes No Cats Pictures of cats from Tumblr apiKey Yes Unknown Dogs Based on the Stanford Dogs Dataset No Yes Yes HTTPCat Cat for every HTTP Status No Yes Unknown IUCN IUCN Red List of Threatened Species apiKey No Unknown Movebank Movement and Migration data of animals No Yes Unknown Petfinder Adoption apiKey Yes Unknown RandomCat Random pictures of cats No Yes Yes RandomDog Random pictures of dogs No Yes Yes RandomFox Random pictures of foxes No Yes No RescueGroups Adoption No Yes Unknown Shibe.Online Random pictures of Shibu Inu, cats or birds No No No Anime API Description Auth HTTPS CORS AniList Anime discovery & tracking OAuth Yes Unknown AnimeNewsNetwork Anime industry news No Yes Yes Jikan Unofficial MyAnimeList API No Yes Yes Kitsu Anime discovery platform OAuth Yes Unknown Studio Ghibli Resources from Studio Ghibli films No Yes Unknown Anti Malware API Description Auth HTTPS CORS AlienVault Open Threat Exchange (OTX) IP/domain/URL reputation apiKey Yes Unknown Google Safe Browsing Google Link/Domain Flagging apiKey Yes Unknown Metacert Metacert Link Flagging apiKey Yes Unknown VirusTotal VirusTotal File/URL Analysis apiKey Yes Unknown Web Of Trust (WOT) Website reputation apiKey Yes Unknown Art & Design API Description Auth HTTPS CORS Behance Design apiKey Yes Unknown Cooper Hewitt Smithsonian Design Museum apiKey Yes Unknown Dribbble Design OAuth No Unknown Harvard Art Museums Art apiKey No Unknown Iconfinder Icons apiKey Yes Unknown Icons8 Icons OAuth Yes Unknown Noun Project Icons OAuth No Unknown Rijksmuseum Art apiKey Yes Unknown Books API Description Auth HTTPS CORS Bhagavad Gita Bhagavad Gita text OAuth Yes Yes BookNomads Books published in the Netherlands and Flanders (about 2.5 million), book covers and related data No Yes Unknown British National Bibliography Books No No Unknown Goodreads Books apiKey Yes Unknown Google Books Books OAuth Yes Unknown LibGen Library Genesis search engine No No Unknown Open Library Books, book covers and related data No Yes Unknown Penguin Publishing Books, book covers and related data No Yes Unknown Business API Description Auth HTTPS CORS Charity Search Non profit charity data apiKey No Unknown Clearbit Logo Search for company logos and embed them in your projects 
apiKey Yes Unknown Domainsdb.info Registered Domain Names Search No Yes Unknown Freelancer Hire freelancers to get work done OAuth Yes Unknown Gmail Flexible, RESTful access to the user's inbox OAuth Yes Unknown Google Analytics Collect, configure and analyze your data to reach the right audience OAuth Yes Unknown mailgun Email Service apiKey Yes Unknown markerapi Trademark Search No No Unknown Ticksel Friendly website analytics made for humans No Yes Unknown Trello Boards, lists and cards to help you organize and prioritize your projects OAuth Yes Unknown Calendar API Description Auth HTTPS CORS Calendar Index Worldwide Holidays and Working Days apiKey Yes Yes Church Calendar Catholic liturgical calendar No No Unknown Czech Namedays Calendar Lookup for a name and returns nameday date No No Unknown Google Calendar Display, create and modify Google calendar events OAuth Yes Unknown Hebrew Calendar Convert between Gregarian and Hebrew, fetch Shabbat and Holiday times, etc No No Unknown Holidays Historical data regarding holidays apiKey Yes Unknown LectServe Protestant liturgical calendar No No Unknown Nager.Date Public holidays for more than 90 countries No Yes No Namedays Calendar Provides namedays for multiple countries No Yes Yes Non Working Days Database of ICS files for non working days No Yes Unknown Russian Calendar Check if a date is a Russian holiday or not No Yes No Cloud Storage & File Sharing API Description Auth HTTPS CORS Box File Sharing and Storage OAuth Yes Unknown Dropbox File Sharing and Storage OAuth Yes Unknown Google Drive File Sharing and Storage OAuth Yes Unknown OneDrive File Sharing and Storage OAuth Yes Unknown Pastebin Plain Text Storage apiKey Yes Unknown WeTransfer File Sharing apiKey Yes Yes Continuous Integration API Description Auth HTTPS CORS CircleCI Automate the software development process using continuous integration and continuous delivery apiKey Yes Unknown Codeship Codeship is a Continuous Integration Platform in the cloud apiKey Yes Unknown Travis CI Sync your GitHub projects with Travis CI to test your code in minutes apiKey Yes Unknown Cryptocurrency API Description Auth HTTPS CORS Binance Exchange for Trading Cryptocurrencies based in China apiKey Yes Unknown BitcoinAverage Digital Asset Price Data for the blockchain industry apiKey Yes Unknown BitcoinCharts Financial and Technical Data related to the Bitcoin Network No Yes Unknown Bitfinex Cryptocurrency Trading Platform apiKey Yes Unknown Bitmex Real Time Cryptocurrency derivatives trading platform based in Hong Kong apiKey Yes Unknown Bittrex Next Generation Crypto Trading Platform apiKey Yes Unknown Block Bitcoin Payment, Wallet & Transaction Data apiKey Yes Unknown Blockchain Bitcoin Payment, Wallet & Transaction Data No Yes Unknown CoinAPI All Currency Exchanges integrate under a single api apiKey Yes No Coinbase Bitcoin, Bitcoin Cash, Litecoin and Ethereum Prices apiKey Yes Unknown Coinbase Pro Cryptocurrency Trading Platform apiKey Yes Unknown CoinDesk Bitcoin Price Index No No Unknown CoinGecko Cryptocurrency Price, Market, and Developer/Social Data No Yes Yes Coinigy Interacting with Coinigy Accounts and Exchange Directly apiKey Yes Unknown CoinLayer Real time Crypto Currency Exchange Rates apiKey Yes Unknown Coinlib Crypto Currency Prices apiKey Yes Unknown Coinlore Cryptocurrencies prices, volume and more No Yes Unknown CoinMarketCap Cryptocurrencies Prices apiKey Yes Unknown Coinpaprika Cryptocurrencies prices, volume and more No Yes Yes CoinRanking Live Cryptocurrency data No Yes 
Unknown CryptoCompare Cryptocurrencies Comparison No Yes Unknown Cryptonator Cryptocurrencies Exchange Rates No Yes Unknown Gemini Cryptocurrencies Exchange No Yes Unknown ICObench Various information on listing, ratings, stats, and more apiKey Yes Unknown Livecoin Cryptocurrency Exchange No Yes Unknown MercadoBitcoin Brazilian Cryptocurrency Information No Yes Unknown Nexchange Automated cryptocurrency exchange service No No Yes NiceHash Largest Crypto Mining Marketplace apiKey Yes Unknown Poloniex US based digital asset exchange apiKey Yes Unknown WorldCoinIndex Cryptocurrencies Prices apiKey Yes Unknown Zloader Due diligence data platform apiKey Yes Unknown Currency Exchange API Description Auth HTTPS CORS 1Forge Forex currency market data apiKey Yes Unknown CryptoStandardizer Standardize crypto coin symbols (e.g. BTC, XBT) across 100+ exchanges apiKey Yes Unknown Currencylayer Exchange rates and currency conversion apiKey Yes Unknown Czech National Bank A collection of exchange rates No Yes Unknown Exchangeratesapi.io Exchange rates with currency conversion No Yes Yes Fixer.io Exchange rates and currency conversion apiKey Yes Unknown Data Validation API Description Auth HTTPS CORS Cloudmersive Validate Validate email addresses, phone numbers, VAT numbers and domain names apiKey Yes Yes languagelayer Language detection No Yes Unknown Lob.com US Address Verification apiKey Yes Unknown mailboxlayer Email address validation No Yes Unknown NumValidate Open Source phone number validation No Yes Unknown numverify Phone number validation No Yes Unknown PurgoMalum Content validator against profanity & obscenity No No Unknown vatlayer VAT number validation No Yes Unknown Development API Description Auth HTTPS CORS 24 Pull Requests Project to promote open source collaboration during December No Yes Yes ApiFlash Chrome based screenshot API for developers apiKey Yes Unknown Apility.io IP, Domains and Emails anti abuse API blocklist No Yes Yes APIs.guru Wikipedia for Web APIs, OpenAPI/Swagger specs for public APIs No Yes Unknown BetterMeta Return a site's meta tags in JSON format X Mashape Key Yes Unknown Bitbucket Pull public information for a Bitbucket account No Yes Unknown Bored Find random activities to fight boredom No Yes Unknown Browshot Easily make screenshots of web pages in any screen size, as any device apiKey Yes Unknown CDNJS Library info on CDNJS No Yes Unknown Changelogs.md Structured changelog metadata from open source projects No Yes Unknown CountAPI Free and simple counting service. 
You can use it to track page hits and specific events No Yes Yes DigitalOcean Status Status of all DigitalOcean services No Yes Unknown DomainDb Info Domain name search to find all domains containing particular words/phrases/etc No Yes Unknown Faceplusplus A tool to detect face OAuth Yes Unknown Genderize.io Determines a gender from a first name No Yes Unknown GitHub Make use of GitHub repositories, code and user info programmatically OAuth Yes Yes Gitlab Automate GitLab interaction programmatically OAuth Yes Unknown Gitter Chat for GitHub OAuth Yes Unknown HTTP2.Pro Test endpoints for client and server HTTP/2 protocol support No Yes Unknown IBM Text to Speech Convert text to speech apiKey Yes Yes import.io Retrieve structured data from a website or RSS feed apiKey Yes Unknown IPify A simple IP Address API No Yes Unknown IPinfo Another simple IP Address API No Yes Unknown JSON 2 JSONP Convert JSON to JSONP (on the fly) for easy cross domain data requests using client side JavaScript No Yes Unknown JSONbin.io Free JSON storage service. Ideal for small scale Web apps, Websites and Mobile apps apiKey Yes Yes Judge0 Compile and run source code No Yes Unknown Let's Validate Uncovers the technologies used on websites and URL to thumbnail No Yes Unknown License API Unofficial REST API for choosealicense.com No Yes No LiveEdu Live Coding Streaming OAuth Yes Unknown MAC address vendor lookup Retrieve vendor details and other information regarding a given MAC address or an OUI apiKey Yes Yes Myjson A simple JSON store for your web or mobile app No No Unknown OOPSpam Multiple spam filtering service No Yes Yes Plino Spam filtering system No Yes Unknown Postman Tool for testing APIs apiKey Yes Unknown ProxyCrawl Scraping and crawling anticaptcha service apiKey Yes Unknown Public APIs A collective list of free JSON APIs for use in web development No Yes Unknown Pusher Beams Push notifications for Android & iOS apiKey Yes Unknown QR code Create an easy to read QR code and URL shortener No Yes Yes QR code Generate and decode / read QR code graphics No Yes Unknown QuickChart Generate chart and graph images No Yes Yes ReqRes A hosted REST API ready to respond to your AJAX requests No Yes Unknown Scrape Website Email Grabs email addresses from a URL X Mashape Key Yes Unknown ScraperApi Easily build scalable web scrapers apiKey Yes Unknown SHOUTCLOUD ALL CAPS AS A SERVICE No No Unknown StackExchange Q&A forum for developers OAuth Yes Unknown Verse Check what's the latest version of your favorite open source project No Yes Unknown XML to JSON Integration developer utility APIs No Yes Unknown Dictionaries API Description Auth HTTPS CORS Merriam Webster Dictionary and Thesaurus Data apiKey Yes Unknown Oxford Dictionary Data apiKey Yes No Wordnik Dictionary Data apiKey No Unknown Words Definitions and synonyms for more than 150,000 words apiKey Yes Unknown Documents & Productivity API Description Auth HTTPS CORS Cloudmersive Document and Data Conversion HTML/URL to PDF/PNG, Office documents to PDF, image conversion apiKey Yes Yes File.io File Sharing No Yes Unknown Mercury Web parser apiKey Yes Unknown pdflayer HTML/URL to PDF apiKey Yes Unknown Pocket Bookmarking service OAuth Yes Unknown PrexView Data from XML or JSON to PDF, HTML or Image apiKey Yes Unknown Restpack Provides screenshot, HTML to PDF and content extraction APIs apiKey Yes Unknown Todoist Todo Lists OAuth Yes Unknown Vector Express Free vector file converting API No No Yes WakaTime Automated time tracking leaderboards for programmers No Yes 
Unknown Wunderlist Todo Lists OAuth Yes Unknown Environment API Description Auth HTTPS CORS AirVisual Air quality and weather data apiKey Yes Unknown OpenAQ Open air quality data apiKey Yes Unknown PM2.5.in Air quality of China apiKey No Unknown PVWatts Energy production photovoltaic (PV) energy systems apiKey Yes Unknown UK Carbon Intensity The Official Carbon Intensity API for Great Britain developed by National Grid No Yes Unknown Events API Description Auth HTTPS CORS Eventbrite Find events OAuth Yes Unknown Picatic Sell tickets anywhere apiKey Yes Unknown Ticketmaster Search events, attractions, or venues apiKey Yes Unknown Finance API Description Auth HTTPS CORS Alpha Vantage Realtime and historical stock data apiKey Yes Unknown Barchart OnDemand Stock, Futures and Forex Market Data apiKey Yes Unknown Consumer Financial Protection Bureau Financial services consumer complaint data apiKey Yes Unknown Financial Modeling Prep Stock information and data No Yes Unknown IEX Stocks and Market Data No Yes Unknown IG Spreadbetting and CFD Market Data apiKey Yes Unknown Plaid Connect with users’ bank accounts and access transaction data apiKey Yes Unknown Razorpay IFSC Indian Financial Systems Code (Bank Branch Codes) No Yes Unknown RoutingNumbers.info ACH/NACHA Bank Routing Numbers No Yes Unknown Tradier US equity/option market data (delayed, intraday, historical) OAuth Yes Yes VAT Rates A collection of all VAT rates for EU countries No Yes Unknown YNAB Budgeting & Planning OAuth Yes Yes Food & Drink API Description Auth HTTPS CORS Edamam Recipe Search apiKey Yes Unknown Food2Fork Recipe Search apiKey No Unknown LCBO Alcohol apiKey Yes Unknown Open Brewery DB Breweries, Cideries and Craft Beer Bottle Shops No Yes Yes Open Food Facts Food Products Database No Yes Unknown PunkAPI Brewdog Beer Recipes No Yes Unknown Recipe Puppy Food No No Unknown TacoFancy Community driven taco database No No Unknown The Report of the Week Food & Drink Reviews No Yes Unknown TheCocktailDB Cocktail Recipes apiKey Yes Yes TheMealDB Meal Recipes apiKey Yes Yes What's on the menu? 
NYPL human transcribed historical menu collection apiKey No Unknown Zomato Discover restaurants apiKey Yes Unknown Fraud Prevention API Description Auth HTTPS CORS Whitepages Pro Global identity verification with phone, address, email and IP apiKey Yes Unknown Whitepages Pro Phone reputation to detect spammy phones apiKey Yes Unknown Whitepages Pro Get an owner’s name, address, demographics based on the phone number apiKey Yes Unknown Whitepages Pro Phone number validation, line_type, carrier append apiKey Yes Unknown Whitepages Pro Get normalized physical address, residents, address type and validity apiKey Yes Unknown Games & Comics API Description Auth HTTPS CORS Age of Empires II Get information about Age of Empires II resources No Yes Unknown AmiiboAPI Amiibo Information No No Yes Battle.net Blizzard Entertainment apiKey Yes Unknown Battlefield 4 Battlefield 4 Information No Yes Unknown Chuck Norris Database Jokes No No Unknown Clash of Clans Clash of Clans Game Information apiKey Yes Unknown Clash Royale Clash Royale Game Information apiKey Yes Unknown Comic Vine Comics No Yes Unknown Deck of Cards Deck of Cards No No Unknown Destiny The Game Bungie Platform API apiKey Yes Unknown Dota 2 Provides information about Player stats , Match stats, Rankings for Dota 2 No Yes Unknown Eve Online Third Party Developer Documentation OAuth Yes Unknown Fortnite Fortnite Stats & Cosmetics No Yes Yes Fortnite Fortnite Stats apiKey Yes Unknown Giant Bomb Video Games No Yes Unknown Guild Wars 2 Guild Wars 2 Game Information apiKey Yes Unknown Halo Halo 5 and Halo Wars 2 Information apiKey Yes Unknown Hearthstone Hearthstone Cards Information X Mashape Key Yes Unknown Hypixel Hypixel player stats apiKey Yes Unknown IGDB.com Video Game Database apiKey Yes Unknown Jokes Programming and general jokes No Yes Unknown Jservice Jeopardy Question Database No No Unknown Magic The Gathering Magic The Gathering Game Information No No Unknown Marvel Marvel Comics apiKey No Unknown mod.io Cross Platform Mod API apiKey Yes Unknown Open Trivia Trivia Questions No Yes Unknown PandaScore E sports games and results apiKey Yes Unknown PlayerUnknown's Battlegrounds PUBG Stats apiKey Yes Unknown Pokéapi Pokémon Information No Yes Unknown Pokémon TCG Pokémon TCG Information No Yes Unknown Rick and Morty All the Rick and Morty information, including images No Yes Yes Riot Games League of Legends Game Information apiKey Yes Unknown Steam Steam Client Interaction OAuth Yes Unknown Vainglory Vainglory Players, Matches and Telemetry apiKey Yes Yes Wargaming.net Wargaming.net info and stats apiKey Yes No xkcd Retrieve xkcd comics as JSON No Yes Yes Geocoding API Description Auth HTTPS CORS adresse.data.gouv.fr Address database of France, geocoding and reverse No Yes Unknown Battuta A (country/region/city) in cascade location API apiKey Yes Unknown Bing Maps Create/customize digital maps based on Bing Maps data apiKey Yes Unknown City Context Crime, school and transportation data for US cities apiKey Yes Unknown CitySDK Open APIs for select European cities No Yes Unknown Daum Maps Daum Maps provide multiple APIs for Korean maps apiKey No Unknown GeoApi French geographical data No Yes Unknown Geocod.io Address geocoding / reverse geocoding in bulk apiKey Yes Unknown Geocode.xyz Provides worldwide forward/reverse geocoding, batch geocoding and geoparsing No Yes Unknown GeoJS IP geolocation with ChatOps integration No Yes Yes GeoNames Place names and other geographical data No No Unknown geoPlugin IP geolocation and currency 
conversion No Yes Yes Google Earth Engine A cloud based platform for planetary scale environmental data analysis apiKey Yes Unknown Google Maps Create/customize digital maps based on Google Maps data apiKey Yes Unknown GraphLoc Free GraphQL IP Geolocation API No Yes Unknown HelloSalut Get hello translation following user language No Yes Unknown HERE Maps Create/customize digital maps based on HERE Maps data apiKey Yes Unknown Indian Cities Get all Indian cities in a clean JSON Format No Yes Yes IP 2 Country Map an IP to a country No Yes Unknown IP Address Details Find geolocation with ip address No Yes Unknown IP Location Find IP address location information No Yes Unknown IP Sidekick Geolocation API that returns extra information about an IP address apiKey Yes Unknown IP Vigilante Free IP Geolocation API No Yes Unknown IPGeolocationAPI.com Locate your visitors by IP with country details No Yes Yes IPInfoDB Free Geolocation tools and APIs for country, region, city and time zone lookup by IP address apiKey Yes Unknown ipstack Locate and identify website visitors by IP address apiKey Yes Unknown LocationIQ Provides forward/reverse geocoding and batch geocoding apiKey Yes Yes Mapbox Create/customize beautiful digital maps apiKey Yes Unknown Mexico Mexico RESTful zip codes API No Yes Unknown One Map, Singapore Singapore Land Authority REST API services for Singapore addresses apiKey Yes Unknown OnWater Determine if a lat/lon is on water or land No Yes Unknown OpenCage Forward and reverse geocoding using open data apiKey Yes Yes OpenStreetMap Navigation, geolocation and geographical data OAuth No Unknown PostcodeData.nl Provide geolocation data based on postcode for Dutch addresses No No Unknown Postcodes.io Postcode lookup & Geolocation for the UK No Yes Yes REST Countries Get information about countries via a RESTful API No Yes Unknown Uebermaps Discover and share maps with friends apiKey Yes Unknown Utah AGRC Utah Web API for geocoding Utah addresses apiKey Yes Unknown ViaCep Brazil RESTful zip codes API No Yes Unknown Zipcodeapi Find out possible zip codes for a city, distance between zip codes etc apiKey Yes Unknown Zippopotam Get information about place such as country, city, state, etc No No Unknown Government API Description Auth HTTPS CORS BCLaws Access to the laws of British Columbia No No Unknown BusinessUSA Authoritative information on U.S. programs, events, services and more apiKey Yes Unknown Census.gov The US Census Bureau provides various APIs and data sets on demographics and businesses No Yes Unknown Colorado Data Engine Formatted and geolocated Colorado public data No Yes Unknown Colorado Information Marketplace Colorado State Government Open Data No Yes Unknown Data USA US Public Data No Yes Unknown Data.gov US Government Data apiKey Yes Unknown Data.parliament.uk Contains live datasets including information about petitions, bills, MP votes, attendence and more No No Unknown District of Columbia Open Data Contains D.C. 
government public datasets, including crime, GIS, financial data, and so on No Yes Unknown EPA Web services and data sets from the US Environmental Protection Agency No Yes Unknown FEC Information on campaign donations in federal elections apiKey Yes Unknown Federal Register The Daily Journal of the United States Government No Yes Unknown Food Standards Agency UK food hygiene rating data API No No Unknown Open Government, Australia Australian Government Open Data No Yes Unknown Open Government, Belgium Belgium Government Open Data No Yes Unknown Open Government, Canada Canadian Government Open Data No No Unknown Open Government, France French Government Open Data apiKey Yes Unknown Open Government, India Indian Government Open Data apiKey Yes Unknown Open Government, Italy Italy Government Open Data No Yes Unknown Open Government, New Zealand New Zealand Government Open Data No Yes Unknown Open Government, Taiwan Taiwan Government Open Data No Yes Unknown Open Government, USA United States Government Open Data No Yes Unknown Prague Opendata Prague City Open Data No No Unknown Regulations.gov Federal regulatory materials to increase understanding of the Federal rule making process apiKey Yes Unknown Represent by Open North Find Canadian Government Representatives No Yes Unknown USAspending.gov US federal spending data No Yes Unknown Health API Description Auth HTTPS CORS BetterDoctor Detailed information about doctors in your area apiKey Yes Unknown Diabetes Logging and retrieving diabetes information No No Unknown Flutrack Influenza like symptoms with geotracking No No Unknown Healthcare.gov Educational content about the US Health Insurance Marketplace No Yes Unknown Lexigram NLP that extracts mentions of clinical concepts from text, gives access to clinical ontology apiKey Yes Unknown Makeup Makeup Information No No Unknown Medicare Access to the data from the CMS medicare.gov No Yes Unknown NPPES National Plan & Provider Enumeration System, info on healthcare providers registered in US No Yes Unknown Nutritionix Worlds largest verified nutrition database apiKey Yes Unknown openFDA Public FDA data about drugs, devices and foods No Yes Unknown USDA Nutrients National Nutrient Database for Standard Reference No Yes Unknown Jobs API Description Auth HTTPS CORS Adzuna Job board aggregator apiKey Yes Unknown Authentic Jobs Job board for designers, hackers and creative pros apiKey Yes Unknown Careerjet Job search engine apiKey No Unknown Github Jobs Jobs for software developers No Yes Unknown Indeed Job board aggregator apiKey Yes Unknown Jobs2Careers Job aggregator apiKey Yes Unknown Jooble Job search engine apiKey Yes Unknown Juju Job search engine apiKey No Unknown Open Skills Job titles, skills and related jobs data No No Unknown Reed Job board aggregator apiKey Yes Unknown Search.gov Jobs Tap into a list of current jobs openings with the United States government No Yes Unknown The Muse Job board and company profiles apiKey Yes Unknown Upwork Freelance job board and management system OAuth Yes Unknown USAJOBS US government job board apiKey Yes Unknown ZipRecruiter Job search app and website apiKey Yes Unknown Machine Learning API Description Auth HTTPS CORS Clarifai Computer Vision OAuth Yes Unknown Cloudmersive Image captioning, face recognition, NSFW classification apiKey Yes Yes Dialogflow Natural Language Processing apiKey Yes Unknown Keen IO Data Analytics apiKey Yes Unknown Unplugg Forecasting API for timeseries data apiKey Yes Unknown Wit.ai Natural Language Processing OAuth Yes 
Unknown Music API Description Auth HTTPS CORS AI Mastering Automated Music Mastering apiKey Yes Yes Bandsintown Music Events No Yes Unknown Deezer Music OAuth Yes Unknown Discogs Music OAuth Yes Unknown Genius Crowdsourced lyrics and music knowledge OAuth Yes Unknown Genrenator Music genre generator No Yes Unknown iTunes Search Software products No Yes Unknown Jamendo Music OAuth Yes Unknown LastFm Music apiKey Yes Unknown Lyrics.ovh Simple API to retrieve the lyrics of a song No Yes Unknown Mixcloud Music OAuth Yes Yes MusicBrainz Music No Yes Unknown Musikki Music apiKey Yes Unknown Musixmatch Music apiKey Yes Unknown Openwhyd Download curated playlists of streaming tracks (YouTube, SoundCloud, etc...) No Yes No Songkick Music Events OAuth Yes Unknown Songsterr Provides guitar, bass and drums tabs and chords No Yes Unknown SoundCloud Allow users to upload and share sounds OAuth Yes Unknown Spotify View Spotify music catalog, manage users' libraries, get recommendations and more OAuth Yes Unknown TasteDive Similar artist API (also works for movies and TV shows) apiKey Yes Unknown TheAudioDB Music apiKey No Unknown Vagalume Crowdsourced lyrics and music knowledge apiKey Yes Unknown News API Description Auth HTTPS CORS Chronicling America Provides access to millions of pages of historic US newspapers from the Library of Congress No No Unknown Currents Latest news published in various news sources, blogs and forums apiKey Yes Yes Feedbin RSS reader OAuth Yes Unknown Feedster Searchable and categorized collections of RSS feeds apiKey Yes Unknown New York Times Provides news apiKey Yes Unknown News Headlines currently published on a range of news sources and blogs apiKey Yes Unknown NPR One Personalized news listening experience from NPR OAuth Yes Unknown The Guardian Access all the content the Guardian creates, categorised by tags and section apiKey Yes Unknown The Old Reader RSS reader apiKey Yes Unknown Open Data API Description Auth HTTPS CORS 18F Unofficial US Federal Government API Development No No Unknown Abbreviation Get abbreviations and meanings X Mashape Key Yes Unknown Archive.org The Internet Archive No Yes Unknown Callook.info United States ham radio callsigns No Yes Unknown CARTO Location Information Prediction apiKey Yes Unknown Celebinfo Celebrity information X Mashape Key Yes Unknown CivicFeed News articles and public datasets apiKey Yes Unknown Datakick The open product database apiKey Yes Unknown Enigma Public Broadest collection of public data apiKey Yes Yes fonoApi Mobile Device Description No Yes Unknown French Address Search Address search via the French Government No Yes Unknown LinkPreview Get JSON formatted summary with title, description and preview image for any requested URL apiKey Yes Yes Marijuana Strains Marijuana strains, races, flavors and effects apiKey No Unknown Microlink.io Extract structured data from any website No Yes Yes Quandl Stock Market Data No Yes Unknown Recreation Information Database Recreational areas, federal lands, historic sites, museums, and other attractions/resources(US) apiKey Yes Unknown Scoop.it Content Curation Service apiKey No Unknown Teleport Quality of Life Data No Yes Unknown Universities List University names, countries and domains No Yes Unknown University of Oslo Courses, lecture videos, detailed information for courses etc. 
for the University of Oslo (Norway) No Yes Unknown UPC database More than 1.5 million barcode numbers from all around the world apiKey Yes Unknown Wikidata Collaboratively edited knowledge base operated by the Wikimedia Foundation OAuth Yes Unknown Wikipedia Mediawiki Encyclopedia No Yes Unknown Yelp Find Local Business OAuth Yes Unknown Open Source Projects API Description Auth HTTPS CORS Countly Countly web analytics No No Unknown Drupal.org Drupal.org No Yes Unknown Evil Insult Generator Evil Insults No Yes Yes Libraries.io Open source software libraries apiKey Yes Unknown Patent API Description Auth HTTPS CORS EPO European patent search system api OAuth Yes Unknown TIPO Taiwan patent search system api apiKey Yes Unknown USPTO USA patent api services No Yes Unknown Personality API Description Auth HTTPS CORS Advice Slip Generate random advice slips No Yes Unknown chucknorris.io JSON API for hand curated Chuck Norris jokes No Yes Unknown FavQs.com FavQs allows you to collect, discover and share your favorite quotes apiKey Yes Unknown Forismatic Inspirational Quotes No No Unknown icanhazdadjoke The largest selection of dad jokes on the internet No Yes Unknown kanye.rest REST API for random Kanye West quotes No Yes Yes Medium Community of readers and writers offering unique perspectives on ideas OAuth Yes Unknown Quotes on Design Inspirational Quotes No Yes Unknown Traitify Assess, collect and analyze Personality No Yes Unknown tronalddump.io Api & web archive for the things Donald Trump has said No Yes Unknown Photography API Description Auth HTTPS CORS Flickr Flickr Services OAuth Yes Unknown Getty Images Build applications using the world's most powerful imagery OAuth Yes Unknown Gfycat Jiffier GIFs OAuth Yes Unknown Giphy Get all your gifs apiKey Yes Unknown Gyazo Upload images apiKey Yes Unknown Imgur Images OAuth Yes Unknown Lorem Picsum Images from Unsplash No Yes Unknown Pixabay Photography apiKey Yes Unknown Pixhost Upload images, photos, galleries No Yes Unknown PlaceKitten Resizable kitten placeholder images No Yes Unknown ScreenShotLayer URL 2 Image No Yes Unknown Unsplash Photography OAuth Yes Unknown Science & Math API Description Auth HTTPS CORS arcsecond.io Multiple astronomy data sources No Yes Unknown CORE Access the world's Open Access research papers apiKey Yes Unknown inspirehep.net High Energy Physics info. 
system No Yes Unknown Launch Library Upcoming Space Launches No Yes Unknown Minor Planet Center Asterank.com Information No No Unknown NASA NASA data, including imagery No Yes Unknown Newton Symbolic and Arithmetic Math Calculator No Yes Unknown Numbers Facts about numbers No No Unknown Open Notify ISS astronauts, current location, etc No No Unknown Open Science Framework Repository and archive for study designs, research materials, data, manuscripts, etc No Yes Unknown SHARE A free, open, dataset about research and scholarly activities No Yes Unknown SpaceX Company, vehicle, launchpad and launch data No Yes Unknown Sunrise and Sunset Sunset and sunrise times for a given latitude and longitude No Yes Unknown USGS Earthquake Hazards Program Earthquakes data real time No Yes Unknown USGS Water Services Water quality and level info for rivers and lakes No Yes Unknown World Bank World Data No No Unknown Security API Description Auth HTTPS CORS AXFR Database AXFR public database No No Unknown FilterLists Lists of filters for adblockers and firewalls No Yes Unknown HaveIBeenPwned Passwords which have previously been exposed in data breaches No Yes Unknown National Vulnerability Database U.S. National Vulnerability Database No Yes Unknown SecurityTrails Domain and IP related information such as current and historical WHOIS and DNS records apiKey Yes Unknown Shodan Search engine for Internet connected devices apiKey Yes Unknown UK Police UK Police data No Yes Unknown Shopping API Description Auth HTTPS CORS Best Buy Products, Buying Options, Categories, Recommendations, Stores and Commerce apiKey Yes Unknown Bratabase Database of different types of Bra Sizes OAuth Yes Unknown eBay Sell and Buy on eBay OAuth Yes Unknown Wal Mart Item price and availability apiKey Yes Unknown Wegmans Wegmans Food Markets apiKey Yes Unknown Social API Description Auth HTTPS CORS Buffer Access to pending and sent updates in Buffer OAuth Yes Unknown Cisco Spark Team Collaboration Software OAuth Yes Unknown Discord Make bots for Discord, integrate Discord onto an external platform OAuth Yes Unknown Disqus Communicate with Disqus data OAuth Yes Unknown Facebook Facebook Login, Share on FB, Social Plugins, Analytics and more OAuth Yes Unknown Foursquare Interact with Foursquare users and places (geolocation based checkins, photos, tips, events, etc) OAuth Yes Unknown Fuck Off as a Service Asks someone to fuck off No Yes Unknown Full Contact Get Social Media profiles and contact Information OAuth Yes Unknown HackerNews Social news for CS and entrepreneurship No Yes Unknown Instagram Instagram Login, Share on Instagram, Social Plugins and more OAuth Yes Unknown LinkedIn The foundation of all digital integrations with LinkedIn OAuth Yes Unknown Meetup.com Data about Meetups from Meetup.com apiKey Yes Unknown MySocialApp Seamless Social Networking features, API, SDK to any app apiKey Yes Unknown Open Collective Get Open Collective data No Yes Unknown Pinterest The world's catalog of ideas OAuth Yes Unknown PWRTelegram bot Boosted version of the Telegram bot API OAuth Yes Unknown Reddit Homepage of the internet OAuth Yes Unknown SharedCount Social media like and share data for any URL apiKey Yes Unknown Slack Team Instant Messaging OAuth Yes Unknown Telegram Bot Simplified HTTP version of the MTProto API for bots OAuth Yes Unknown Telegram MTProto Read and write Telegram data OAuth Yes Unknown Trash Nothing A freecycling community with thousands of free items posted every day OAuth Yes Yes Tumblr Read and write Tumblr Data 
OAuth Yes Unknown Twitch Game Streaming API OAuth Yes Unknown Twitter Read and write Twitter data OAuth Yes No vk Read and write vk data OAuth Yes Unknown Sports & Fitness API Description Auth HTTPS CORS balldontlie Ballldontlie provides access to stats data from the NBA No Yes Yes BikeWise Bikewise is a place to learn about and report bike crashes, hazards and thefts No Yes Unknown Canadian Football League (CFL) Official JSON API providing real time league, team and player statistics about the CFL apiKey Yes No Cartola FC The Cartola FC API serves to check the partial points of your team No Yes Unknown City Bikes City Bikes around the world No No Unknown Cricket Live Scores Live cricket scores X Mashape Key Yes Unknown Ergast F1 F1 data from the beginning of the world championships in 1950 No Yes Unknown Fitbit Fitbit Information OAuth Yes Unknown Football Prediction Predictions for upcoming football matches, odds, results and stats X Mashape Key Yes Unknown Football Data.org Football Data No No Unknown JCDecaux Bike JCDecaux's self service bicycles apiKey Yes Unknown NBA Stats Current and historical NBA Statistics No Yes Unknown NFL Arrests NFL Arrest Data No No Unknown Pro Motocross The RESTful AMA Pro Motocross lap times for every racer on the start gate No No Unknown Strava Connect with athletes, activities and more OAuth Yes Unknown SuredBits Query sports data, including teams, players, games, scores and statistics No No No TheSportsDB Crowd Sourced Sports Data and Artwork apiKey Yes Yes Wger Workout manager data as exercises, muscles or equipment apiKey Yes Unknown Test Data API Description Auth HTTPS CORS Adorable Avatars Generate random cartoon avatars No Yes Unknown Bacon Ipsum A Meatier Lorem Ipsum Generator No Yes Unknown Dicebear Avatars Generate random pixel art avatars No Yes No FakeJSON Service to generate test and fake data apiKey Yes Yes FHIR Fast Healthcare Interoperability Resources test data No Yes Unknown Hipster Ipsum Generates Hipster Ipsum text No No Unknown JSONPlaceholder Fake data for testing and prototyping No No Unknown Lorem Text Generates Lorem Ipsum text X Mashape Key Yes Unknown LoremPicsum Generate placeholder pictures No No Unknown Loripsum The lorem ipsum generator that doesn't suck No No Unknown RandomUser Generates random user data No Yes Unknown RoboHash Generate random robot/alien avatars No Yes Unknown UI Names Generate random fake names No Yes Unknown Yes No Generate yes or no randomly No Yes Unknown Text Analysis API Description Auth HTTPS CORS Aylien Text Analysis A collection of information retrieval and natural language APIs apiKey Yes Unknown Cloudmersive Natural Language Processing Natural language processing and text analysis apiKey Yes Yes Detect Language Detects text language apiKey Yes Unknown Google Cloud Natural Natural language understanding technology, including sentiment, entity and syntax analysis apiKey Yes Unknown Semantira Text Analytics with sentiment analysis, categorization & named entity extraction OAuth Yes Unknown Watson Natural Language Understanding Natural language processing for advanced text analysis OAuth Yes Unknown Tracking API Description Auth HTTPS CORS Postmon An API to query Brazilian ZIP codes and orders easily, quickly and free No No Unknown Sweden Provides information about parcels in transport apiKey No Unknown UPS Shipment and Address information apiKey Yes Unknown WhatPulse Small application that measures your keyboard/mouse usage No Yes Unknown Transportation API Description Auth HTTPS CORS ADS B 
Exchange Access real time and historical data of any and all airborne aircraft No Yes Unknown AIS Hub Real time data of any marine and inland vessel equipped with AIS tracking system apiKey No Unknown AIS Web Aeronautical information in digital media produced by the Department of Airspace Control (DECEA) apiKey No Unknown Amadeus Travel Innovation Sandbox Travel Search Limited usage apiKey Yes Unknown Bay Area Rapid Transit Stations and predicted arrivals for BART apiKey No Unknown Community Transit Transitland API No Yes Unknown Goibibo API for travel search apiKey Yes Unknown GraphHopper A to B routing with turn by turn instructions apiKey Yes Unknown Icelandic APIs Open APIs that deliver services in or regarding Iceland No Yes Unknown Indian Railways Indian Railways Information apiKey No Unknown Izi Audio guide for travellers apiKey Yes Unknown Navitia The open API for building cool stuff with transport data apiKey Yes Unknown REFUGE Restrooms Provides safe restroom access for transgender, intersex and gender nonconforming individuals No Yes Unknown Schiphol Airport Schiphol apiKey Yes Unknown TransitLand Transit Aggregation No Yes Unknown Transport for Atlanta, US Marta No No Unknown Transport for Auckland, New Zealand Auckland Transport No Yes Unknown Transport for Belgium Belgian transport API No Yes Unknown Transport for Berlin, Germany Third party VBB API No Yes Unknown Transport for Boston, US MBTA API No No Unknown Transport for Budapest, Hungary Budapest public transport API No Yes Unknown Transport for Chicago, US CTA No No Unknown Transport for Czech Republic Czech transport API No Yes Unknown Transport for Denver, US RTD No No Unknown Transport for Finland Finnish transport API No Yes Unknown Transport for Germany Deutsche Bahn (DB) API apiKey No Unknown Transport for Grenoble, France Grenoble public transport No No No Transport for Honolulu, US Honolulu Transportation Information apiKey No Unknown Transport for India India Public Transport API apiKey Yes Unknown Transport for London, England TfL API No Yes Unknown Transport for Madrid, Spain Madrid BUS transport API apiKey No Unknown Transport for Manchester, England TfGM transport network data apiKey Yes No Transport for Minneapolis, US NexTrip API OAuth No Unknown Transport for New York City, US MTA apiKey No Unknown Transport for Norway Norwegian transport API No No Unknown Transport for Ottawa, Canada OC Transpo next bus arrival API No No Unknown Transport for Paris, France Live schedules made simple No No Unknown Transport for Paris, France RATP Open Data API No No Unknown Transport for Philadelphia, US SEPTA APIs No No Unknown Transport for Sao Paulo, Brazil SPTrans OAuth No Unknown Transport for Sweden Public Transport consumer OAuth Yes Unknown Transport for Switzerland Official Swiss Public Transport Open Data apiKey Yes Unknown Transport for Switzerland Swiss public transport API No Yes Unknown Transport for The Netherlands NS, only trains apiKey No Unknown Transport for The Netherlands OVAPI, country wide public transport No Yes Unknown Transport for Toronto, Canada TTC No Yes Unknown Transport for United States NextBus API No No Unknown Transport for Vancouver, Canada TransLink OAuth Yes Unknown Transport for Victoria, AU PTV API apiKey Yes Unknown Transport for Washington, US Washington Metro transport API OAuth Yes Unknown Uber Uber ride requests and price estimation OAuth Yes Yes WhereIsMyTransport Platform for public transport data in emerging cities OAuth Yes Unknown URL Shorteners API Description Auth HTTPS 
CORS Bitly URL shortener and link management OAuth Yes Unknown CleanURI URL shortener service No Yes Yes ClickMeter Monitor, compare and optimize your marketing links apiKey Yes Unknown Rebrandly Custom URL shortener for sharing branded links apiKey Yes Unknown Vehicle API Description Auth HTTPS CORS Brazilian Vehicles and Prices Vehicles information from Fundação Instituto de Pesquisas Econômicas Fipe No Yes Unknown Kelley Blue Book Vehicle info, pricing, configuration, plus much more apiKey Yes No Mercedes Benz Telematics data, remotely access vehicle functions, car configurator, locate service dealers apiKey Yes No NHTSA NHTSA Product Information Catalog and Vehicle Listing No Yes Unknown Video API Description Auth HTTPS CORS An API of Ice And Fire Game Of Thrones API No Yes Unknown Breaking Bad Quotes Some Breaking Bad quotes No Yes Unknown Czech Television TV programme of Czech TV No No Unknown Dailymotion Dailymotion Developer API OAuth Yes Unknown Open Movie Database Movie information apiKey Yes Unknown Ron Swanson Quotes Television No Yes Unknown STAPI Information on all things Star Trek No No No SWAPI Star Wars Information No Yes Unknown TMDb Community based movie data apiKey Yes Unknown TVDB Television data apiKey Yes Unknown TVMaze TV Show Data No No Unknown Utelly Check where a tv show or movie is available X Mashape Key Yes Unknown Vimeo Vimeo Developer API OAuth Yes Unknown YouTube Add YouTube functionality to your sites and apps OAuth Yes Unknown Weather API Description Auth HTTPS CORS APIXU Weather apiKey Yes Unknown Dark Sky Weather apiKey Yes No MetaWeather Weather No Yes No NOAA Climate Data Weather and climate data apiKey Yes Unknown ODWeather Weather and weather webcams No No Unknown OpenUV Real time UV Index Forecast apiKey Yes Unknown OpenWeatherMap Weather apiKey No Unknown Storm Glass Global marine weather from multiple sources apiKey Yes Yes Weatherbit Weather apiKey Yes Unknown Yahoo! Weather Weather apiKey Yes Unknown",Unknown,Unknown 230,Unknown,Unknown,Unknown,"PyBuilder PyBuilder Gitter Build Status Windows build status PyPI version Coverage Status Ready in backlog Open bugs PyBuilder is a software build tool written in 100% pure Python, mainly targeting Python applications. PyBuilder is based on the concept of dependency based programming, but it also comes with a powerful plugin mechanism, allowing the construction of build life cycles similar to those known from other famous (Java) build tools. PyBuilder is running on the following versions of Python: 2.7, 3.4, 3.5, 3.6, 3.7, PyPy 2.7, and PyPy 3.5. See the Travis Build for version specific output. Installing PyBuilder is available using pip: $ pip install pybuilder For development builds use: $ pip install pre pybuilder See the Cheeseshop page for more information. Getting started PyBuilder emphasizes simplicity. If you want to build a pure Python project and use the recommended directory layout, all you have to do is create a file build.py with the following content: python from pybuilder.core import use_plugin use_plugin( python.core ) use_plugin( python.unittest ) use_plugin( python.coverage ) use_plugin( python.distutils ) default_task publish See the PyBuilder homepage for more details and a list of plugins. Release Notes The release notes can be found here . There will also be a git tag with each release. Please note that we do not currently promote tags to GitHub releases . 
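For readers who want to go one step beyond the minimal build.py shown in the Getting started section above, here is a hedged sketch of a slightly fuller build descriptor. The use_plugin calls and the @init hook follow PyBuilder's documented conventions, but the project name, the dependency on requests/mock, and the coverage_threshold_warn value are illustrative assumptions, not part of this README.

```python
# build.py -- illustrative sketch only; adjust the plugin list and properties to your project.
from pybuilder.core import use_plugin, init

use_plugin("python.core")
use_plugin("python.unittest")
use_plugin("python.coverage")
use_plugin("python.distutils")

name = "my_sample_project"   # hypothetical project name
default_task = "publish"


@init
def initialize(project):
    # Runtime and build-time dependencies (hypothetical examples).
    project.depends_on("requests")
    project.build_depends_on("mock")
    # Tune the coverage plugin; 70 is an arbitrary illustrative threshold.
    project.set_property("coverage_threshold_warn", 70)
```

With a descriptor like this in place, running PyBuilder's pyb command in the project root executes the publish task and everything it depends on.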
Development See developing PyBuilder",Unknown,Unknown 231,Unknown,Unknown,Unknown,"Unmaintained Project This project is currently unmaintained. If you would like to see development continue, you have two options: 1. Hire the maintainers to work on this project. 2. Volunteer to take over the project yourself. To explore either option, please contact @pydanny. dj libcloud .. image:: :target: Adds easy python 3 and 2.7 support to Django for management of static assets. This is a wrapper around the excellent Apache Libcloud _ library. .. _ Apache Libcloud : Documentation The full documentation is at Quickstart Libcloud verifies server SSL certificates before it lets you do anything. It will search your system for the CA certificate, and if it doesn't find it then it will blow up. See Installing CA certificate bundle on Mac OS X:: Assuming you are using homebrew for Mac OS X dependency management. $ brew install curl ca bundle Install dj libcloud:: $ pip install dj libcloud Then use it in a project, e.g. for your static files:: settings.py STATIC_URL ' STATICFILES_STORAGE 'djlibcloud.storage.LibCloudStorage' LIBCLOUD_PROVIDERS { 'default': { 'type': 'libcloud.storage.types.Provider.S3', 'user': os.environ.get('AWS_ACCESS_KEY'), 'key': os.environ.get('AWS_SECRET_KEY'), 'bucket': 'my assets', 'secure': True, }, } Other LibCloud Providers If you want to use other libcloud providers (Rackspace, Openstack, other AWS centers, et al), please visit: The libcloud list of supported providers _ The dj libcloud cookbook _. .. _ libcloud list of supported providers : .. _ dj libcloud cookbook : Features Works for uploading media assets using Python 3.3 and Django 1.6. In theory supports all the backends that libcloud supports. FAQ Because you just had to ask. Why not use dj static or whitenoise? ++++++++++++++++++++++++++++++++++++++++++++++++++++++ Those are great libraries, but are not what you want when handling user uploaded media. Why not just update django storages? ++++++++++++++++++++++++++++++++++++++++++++++++++++++ libcloud is awesome and has a dedicated team devoted to it. We can have it do most of the heavy lifting. On the other hand, converting django storages to work with Python 3 looked like too much work. Sometimes you just have to start anew, right? What storage providers does dj libcloud support? +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ dj libcloud is a wrapper around libcloud, meaning it supports all the providers of that library. Check out the full list of supported providers _! .. _ full list of supported providers : How can I contribute? ++++++++++++++++++++++++++++++++++++ Please read What about compressors like django pipeline? ++++++++++++++++++++++++++++++++++++++++++++++++++++++ Working on it. Currently the PipelineCachedCloudStorage class breaks the second time you run it. See CREDIT Many thanks to Jannis Leidel (@jezdez) for giving me the code to get this started. He's a Django core developer, the master of Django static asset managment, and overall a great great guy.",Unknown,Unknown 232,Unknown,Unknown,Unknown,"Pipenv: Python Development Workflow for Humans image image image Azure Pipelines build status (Linux) ?branchName master&label Linux) Azure Pipelines build status (Windows) ?branchName master&label Windows) image image Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world. Windows is a first class citizen, in our world. 
It automatically creates and manages a virtualenv for your projects, as well as adds/removes packages from your Pipfile as you install/uninstall packages. It also generates the ever important Pipfile.lock , which is used to produce deterministic builds. ! image The problems that Pipenv seeks to solve are multi faceted: You no longer need to use pip and virtualenv separately. They work together. Managing a requirements.txt file can be problematic , so Pipenv uses the upcoming Pipfile and Pipfile.lock instead, which is superior for basic use cases. Hashes are used everywhere, always. Security. Automatically expose security vulnerabilities. Give you insight into your dependency graph (e.g. $ pipenv graph ). Streamline development workflow by loading .env files. You can quickly play with Pipenv right in your browser: Try in browser Installation If you\'re on MacOS, you can install Pipenv easily with Homebrew: $ brew install pipenv Or, if you\'re using Debian Buster+: $ sudo apt install pipenv Or, if you\'re using Fedora 28: $ sudo dnf install pipenv Or, if you\'re using FreeBSD: pkg install py36 pipenv Otherwise, refer to the documentation for instructions. ✨🍰✨ ☤ User Testimonials Jannis Leidel , former pip maintainer : Pipenv is the porcelain I always wanted to build for pip. It fits my brain and mostly replaces virtualenvwrapper and manual pip calls for me. Use it. David Gang : This package manager is really awesome. For the first time I know exactly what my dependencies are which I installed and what the transitive dependencies are. Combined with the fact that installs are deterministic, makes this package manager first class, like cargo . Justin Myles Holmes : Pipenv is finally an abstraction meant to engage the mind instead of merely the filesystem. ☤ Features Enables truly deterministic builds , while easily specifying only what you want . Generates and checks file hashes for locked dependencies. Automatically install required Pythons, if pyenv is available. Automatically finds your project home, recursively, by looking for a Pipfile . Automatically generates a Pipfile , if one doesn\'t exist. Automatically creates a virtualenv in a standard location. Automatically adds/removes packages to a Pipfile when they are un/installed. Automatically loads .env files, if they exist. The main commands are install , uninstall , and lock , which generates a Pipfile.lock . These are intended to replace $ pip install usage, as well as manual virtualenv management (to activate a virtualenv, run $ pipenv shell ). Basic Concepts A virtualenv will automatically be created, when one doesn\'t exist. When no parameters are passed to install , all packages packages specified will be installed. To initialize a Python 3 virtual environment, run $ pipenv three . To initialize a Python 2 virtual environment, run $ pipenv two . Otherwise, whatever virtualenv defaults to will be the default. Other Commands shell will spawn a shell with the virtualenv activated. run will run a given command from the virtualenv, with any arguments forwarded (e.g. $ pipenv run python ). check asserts that PEP 508 requirements are being met by the current environment. graph will print a pretty graph of all your installed dependencies. Shell Completion For example, with fish, put this in your /.config/fish/completions/pipenv.fish : eval (pipenv completion) Alternatively, with bash, put this in your .bashrc or .bash_profile : eval $(pipenv completion) Magic shell completions are now enabled! 
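To make the deterministic-builds and "hashes are used everywhere" points above a bit more concrete, the following is a minimal sketch that lists what Pipenv has pinned. It assumes the conventional Pipfile.lock layout, with "default" and "develop" sections whose entries carry "version" and "hashes" fields; the file name and field names come from that convention, not from this README.

```python
# inspect_lock.py -- a minimal sketch, assuming the usual Pipfile.lock JSON layout.
import json
from pathlib import Path

lockfile = json.loads(Path("Pipfile.lock").read_text())

for section in ("default", "develop"):
    for name, entry in lockfile.get(section, {}).items():
        version = entry.get("version", "<unpinned>")
        hashes = entry.get("hashes", [])
        # Every locked package should carry at least one artifact hash.
        print(f"{section}: {name} {version} ({len(hashes)} hash(es))")
```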
There is also a fish plugin , which will automatically activate your subshells for you! Fish is the best shell. You should use it. ☤ Usage $ pipenv Usage: pipenv OPTIONS COMMAND ARGS ... Options: where Output project home information. venv Output virtualenv information. py Output Python interpreter information. envs Output Environment Variable options. rm Remove the virtualenv. bare Minimal output. completion Output completion (to be eval'd). man Display manpage. three / two Use Python 3/2 when creating virtualenv. python TEXT Specify which version of Python virtualenv should use. site packages Enable site packages for the virtualenv. version Show the version and exit. h, help Show this message and exit. Usage Examples: Create a new project using Python 3.7, specifically: $ pipenv python 3.7 Remove project virtualenv (inferred from current directory): $ pipenv rm Install all dependencies for a project (including dev): $ pipenv install dev Create a lockfile containing pre releases: $ pipenv lock pre Show a graph of your installed dependencies: $ pipenv graph Check your installed dependencies for security vulnerabilities: $ pipenv check Install a local setup.py into your virtual environment/Pipfile: $ pipenv install e . Use a lower level pip command: $ pipenv run pip freeze Commands: check Checks for security vulnerabilities and against PEP 508 markers provided in Pipfile. clean Uninstalls all packages not specified in Pipfile.lock. graph Displays currently–installed dependency graph information. install Installs provided packages and adds them to Pipfile, or (if no packages are given), installs all packages from Pipfile. lock Generates Pipfile.lock. open View a given module in your editor. run Spawns a command installed into the virtualenv. shell Spawns a shell within the virtualenv. sync Installs all packages specified in Pipfile.lock. uninstall Un installs a provided package and removes it from Pipfile. Locate the project: $ pipenv where /Users/kennethreitz/Library/Mobile Documents/com apple CloudDocs/repos/kr/pipenv/test Locate the virtualenv: $ pipenv venv /Users/kennethreitz/.local/share/virtualenvs/test Skyy4vre Locate the Python interpreter: $ pipenv py /Users/kennethreitz/.local/share/virtualenvs/test Skyy4vre/bin/python Install packages: $ pipenv install Creating a virtualenv for this project... ... No package provided, installing all dependencies. Virtualenv location: /Users/kennethreitz/.local/share/virtualenvs/test EJkjoYts Installing dependencies from Pipfile.lock... ... To activate this project's virtualenv, run the following: $ pipenv shell Installing from git: You can install packages with pipenv from git and other version control systems using URLs formatted according to the following rule: + :// / / @ The only optional section is the @ section. When using git over SSH, you may use the shorthand vcs and scheme alias git+git@ : / @ . Note that this is translated to git+ssh://git@ when parsed. Valid values for include git , bzr , svn , and hg . Valid values for include ssh , and file . In specific cases you also have access to other schemes: svn may be combined with svn as a scheme, and bzr can be combined with sftp and lp . Note that it is strongly recommended that you install any version controlled dependencies in editable mode, using pipenv install e , in order to ensure that dependency resolution can be performed with an up to date copy of the repository each time it is performed, and that it includes all known dependencies. 
Below is an example usage which installs the git repository located at from tag v2.19.1 as package name requests : $ pipenv install e git+ Creating a Pipfile for this project... Installing e git+ ...snipped... Adding e git+ to Pipfile's packages ... ... You can read more about pip's implementation of vcs support here . Install a dev dependency: $ pipenv install pytest dev Installing pytest... ... Adding pytest to Pipfile's dev packages ... Show a dependency graph: $ pipenv graph requests 2.18.4 certifi required: > 2017.4.17, installed: 2017.7.27.1 chardet required: > 3.0.2, 2.5, 1.21.1, installed: 1.22 Generate a lockfile: $ pipenv lock Assuring all dependencies from Pipfile are installed... Locking dev packages dependencies... Locking packages dependencies... Note: your project now has only default packages installed. To install dev packages , run: $ pipenv install dev Install all dev dependencies: $ pipenv install dev Pipfile found at /Users/kennethreitz/repos/kr/pip2/test/Pipfile. Considering this to be the project home. Pipfile.lock out of date, updating... Assuring all dependencies from Pipfile are installed... Locking dev packages dependencies... Locking packages dependencies... Uninstall everything: $ pipenv uninstall all No package provided, un installing all dependencies. Found 25 installed package(s), purging... ... Environment now purged and fresh! Use the shell: $ pipenv shell Loading .env environment variables… Launching subshell in virtual environment. Type 'exit' or 'Ctrl+D' to return. $ ▯ ☤ Documentation Documentation resides over at pipenv.org .",Unknown,Unknown 233,Unknown,Unknown,Unknown,"Python based Simulations of Chemistry Framework Build Status 2019 03 15 Stable release 1.6.1 1.7 alpha Changelog (../master/CHANGELOG) Documentation Installation ( installation) Features (../master/FEATURES) Installation Prerequisites Cmake 2.8 or higher Python 2.6, 2.7, 3.4 or higher Numpy 1.8.0 or higher Scipy 0.10 or higher (0.12.0 or higher for python 3.4 3.6) h5py 2.3.0 or higher (requires HDF5 1.8.4 or higher) Compile core module cd pyscf/lib mkdir build; cd build cmake .. make Note during the compilation, external libraries (libcint, libxc, xcfun) will be downloaded and installed. If you want to disable the automatic downloading, this document shows how to manually build these packages and PySCF C libraries. To export PySCF to Python, you need to set environment variable PYTHONPATH . E.g. if PySCF is installed in /opt, your PYTHONPATH should be export PYTHONPATH /opt/pyscf:$PYTHONPATH Using Intel MKL as BLAS library. Enabling the cmake options DBLA_VENDOR Intel10_64lp_seq when executing cmake cmake DBLA_VENDOR Intel10_64lp_seq .. If cmake does not find MKL, you can define BLAS_LIBRARIES in CMakeLists.txt set(BLAS_LIBRARIES ${BLAS_LIBRARIES};/path/to/mkl/lib/intel64/libmkl_intel_lp64.so ) set(BLAS_LIBRARIES ${BLAS_LIBRARIES};/path/to/mkl/lib/intel64/libmkl_sequential.so ) set(BLAS_LIBRARIES ${BLAS_LIBRARIES};/path/to/mkl/lib/intel64/libmkl_core.so ) set(BLAS_LIBRARIES ${BLAS_LIBRARIES};/path/to/mkl/lib/intel64/libmkl_avx.so ) Using DMRG as the FCI solver for CASSCF. There are two DMRG solver interfaces available in pyscf. Block CheMPS2 After installing the DMRG solver, create a file dmrgscf/settings.py to store the path where the DMRG solver was installed. Using FCIQMC as the FCI solver for CASSCF. NECI After installing the NECI, create a file future/fciqmc/settings.py to store the path where the NECI was installed. Using optimized integral library on X86 platform. 
Qcint is a branch of libcint library. It is heavily optimized against X86_64 platforms. To replace the default libcint library with qcint library, edit the URL of the integral library in lib/CMakeLists.txt file ExternalProject_Add(libcint GIT_REPOSITORY ... Using pyberny as geometry optimizer. After downloading pyberny git clone /path/to/pyberny edit the environment variable to make pyberny a python module export PYTHONPATH /path/to/pyberny:$PYTHONPATH Tutorials A user guide written in Ipython notebook can be found in This repository documents the basic structure of PySCF input script and the use of regular methods which were routinely executed in most quantum chemistry packages. It also provides an implementation to drive PySCF program in a simple manner. Developer's tutorial can be found in the online documentation and the repository above Known problems mkl 2018.0.0 intel_3 from intelpython gives segfault update to mkl 2018.0.1 intel_4 or superior relaease conda update mkl Error message Library not loaded: libcint.3.0.dylib On OS X. libcint.dylib is installed in pyscf/lib/deps/lib by default. Add /path/to/pyscf/lib/deps/lib to DYLD_LIBRARY_PATH runtime error message OSError: ... mkl/lib/intel64/libmkl_avx.so: undefined symbol: ownLastTriangle_64fc or MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so. This is a MKL 11. bug when MKL is used with dlopen function. Preloading MKL libraries can solve this problem on most systems: export LD_PRELOAD $MKLROOT/lib/intel64/libmkl_def.so:$MKLROOT/lib/intel64/libmkl_sequential.so:$MKLROOT/lib/intel64/libmkl_core.so or export LD_PRELOAD $MKLROOT/lib/intel64/libmkl_avx.so:$MKLROOT/lib/intel64/libmkl_core.so h5py installation. If you got problems to install the latest h5py package, you can try the old releases: If you are using Intel compiler (version 16, 17), compilation may be stuck at 95% Building C object CMakeFiles/cint.dir/src/stg_roots.c.o This code is used by F12 integrals only. If you do not need F12 methods, the relevant compilation can be disabled, by searching DWITH_F12 in file lib/CMakeLists.txt and setting it to DWITH_F12 0 . Citing PySCF The following paper should be cited in publications utilizing the PySCF program package: PySCF: the Python based Simulations of Chemistry Framework, Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. McClain, E. R. Sayfutyarova, S. Sharma, S. Wouters, G. K. L. Chan (2018), PySCF: the Python‐based simulations of chemistry framework. WIREs Comput. Mol. Sci., 8: e1340. doi:10.1002/wcms.1340 Bug report Qiming Sun",Unknown,Unknown 234,Unknown,Unknown,Unknown,"! Black Logo The Uncompromising Code Formatter > “Any color you like.” Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters. Blackened code looks the same regardless of the project you're reading. Formatting becomes transparent after a while and you can focus on the content instead. Black makes code review faster by producing the smallest diffs possible. Try it out now using the Black Playground . Watch the PyCon 2019 talk to learn more. 
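Before the table of contents below, here is a quick, hedged taste of using the tool from another program: a sketch that pipes a snippet through Black's stdin/stdout mode (passing "-" as the file name, as described later under the command line options and editor integration sections). It assumes black is already installed and on PATH; the sample source string is made up for illustration.

```python
# pipe_through_black.py -- illustrative only; requires `black` on PATH (Python 3.7+ for capture_output).
import subprocess

UGLY = "def f( a,b ):\n    return {'x':a ,  'y' :b}\n"

# "-" tells Black to read from stdin and write the reformatted code to stdout.
result = subprocess.run(
    ["black", "-", "--quiet"],
    input=UGLY.encode("utf-8"),
    capture_output=True,
    check=True,
)
print(result.stdout.decode("utf-8"))
```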
Contents: Installation and usage ( installation and usage) Code style ( the black code style) pyproject.toml ( pyprojecttoml) Editor integration ( editor integration) blackd ( blackd) Version control integration ( version control integration) Ignoring unmodified files ( ignoring unmodified files) Used by ( used by) Testimonials ( testimonials) Show your style ( show your style) Contributing ( contributing to black) Change Log ( change log) Authors ( authors) Installation and usage Installation Black can be installed by running pip install black . It requires Python 3.6.0+ to run but you can reformat Python 2 code with it, too. Usage To get started right away with sensible defaults: black {source_file_or_directory} Command line options Black doesn't provide many options. You can list them by running black help : text black OPTIONS SRC ... Options: c, code TEXT Format the code passed in as a string. l, line length INTEGER How many characters per line to allow. default: 88 t, target version py27 py33 py34 py35 py36 py37 py38 Python versions that should be supported by Black's output. default: per file auto detection py36 Allow using Python 3.6 only syntax on all input files. This will put trailing commas in function signatures and calls also after args and kwargs. Deprecated; use target version instead. default: per file auto detection pyi Format all input files like typing stubs regardless of file extension (useful when piping source on standard input). S, skip string normalization Don't normalize string quotes or prefixes. check Don't write the files back, just return the status. Return code 0 means nothing would change. Return code 1 means some files would be reformatted. Return code 123 means there was an internal error. diff Don't write the files back, just output a diff for each file on stdout. fast / safe If fast given, skip temporary sanity checks. default: safe include TEXT A regular expression that matches files and directories that should be included on recursive searches. An empty value means all files are included regardless of the name. Use forward slashes for directories on all platforms (Windows, too). Exclusions are calculated first, inclusions later. default: \.pyi?$ exclude TEXT A regular expression that matches files and directories that should be excluded on recursive searches. An empty value means no paths are excluded. Use forward slashes for directories on all platforms (Windows, too). Exclusions are calculated first, inclusions later. default: /(\.eggs \.git \.hg \.mypy _cache \.nox \.tox \.venv _build buck out build dist)/ q, quiet Don't emit non error messages to stderr. Errors are still emitted, silence those with 2>/dev/null. v, verbose Also emit messages to stderr about files that were not changed or were ignored due to exclude . version Show the version and exit. config PATH Read configuration from PATH. h, help Show this message and exit. Black is a well behaved Unix style command line tool: it does nothing if no sources are passed to it; it will read from standard input and write to standard output if is used as the filename; it only outputs messages to users on standard error; exits with code 0 unless an internal error occurred (or check was used). NOTE: This is a beta product Black is already successfully used ( used by) by many projects, small and big. It also sports a decent test suite. However, it is still very new. Things will probably be wonky for a while. This is made explicit by the Beta trove classifier, as well as by the b in the version number. 
What this means for you is that until the formatter becomes stable, you should expect some formatting to change in the future . That being said, no drastic stylistic changes are planned, mostly responses to bug reports. Also, as a temporary safety measure, Black will check that the reformatted code still produces a valid AST that is equivalent to the original. This slows it down. If you're feeling confident, use fast . The Black code style Black reformats entire files in place. It is not configurable. It doesn't take previous formatting into account. It doesn't reformat blocks that start with fmt: off and end with fmt: on . fmt: on/off have to be on the same level of indentation. It also recognizes YAPF 's block comments to the same effect, as a courtesy for straddling code. How Black wraps lines Black ignores previous formatting and applies uniform horizontal and vertical whitespace to your code. The rules for horizontal whitespace can be summarized as: do whatever makes pycodestyle happy. The coding style used by Black can be viewed as a strict subset of PEP 8. As for vertical whitespace, Black tries to render one full expression or simple statement per line. If this fits the allotted line length, great. py3 in: l 1, 2, 3, out: l 1, 2, 3 If not, Black will look at the contents of the first outer matching brackets and put that in a separate indented line. py3 in: ImportantClass.important_method(exc, limit, lookup_lines, capture_locals, extra_argument) out: ImportantClass.important_method( exc, limit, lookup_lines, capture_locals, extra_argument ) If that still doesn't fit the bill, it will decompose the internal expression further using the same rule, indenting matching brackets every time. If the contents of the matching brackets pair are comma separated (like an argument list, or a dict literal, and so on) then Black will first try to keep them on the same line with the matching brackets. If that doesn't work, it will put all of them in separate lines. py3 in: def very_important_function(template: str, variables, file: os.PathLike, engine: str, header: bool True, debug: bool False): Applies variables to the template and writes to file . with open(file, 'w') as f: ... out: def very_important_function( template: str, variables, file: os.PathLike, engine: str, header: bool True, debug: bool False, ): Applies variables to the template and writes to file . with open(file, w ) as f: ... You might have noticed that closing brackets are always dedented and that a trailing comma is always added. Such formatting produces smaller diffs; when you add or remove an element, it's always just one line. Also, having the closing bracket dedented provides a clear delimiter between two distinct sections of the code that otherwise share the same indentation level (like the arguments list and the docstring in the example above). If a data structure literal (tuple, list, set, dict) or a line of from imports cannot fit in the allotted length, it's always split into one element per line. This minimizes diffs as well as enables readers of code to find which commit introduced a particular entry. This also makes Black compatible with isort with the following configuration. A compatible .isort.cfg settings multi_line_output 3 include_trailing_comma True force_grid_wrap 0 use_parentheses True line_length 88 The equivalent command line is: $ isort multi line 3 trailing comma force grid wrap 0 use parentheses line width 88 file.py Line length You probably noticed the peculiar default line length. 
Black defaults to 88 characters per line, which happens to be 10% over 80. This number was found to produce significantly shorter files than sticking with 80 (the most popular), or even 79 (used by the standard library). In general, 90 ish seems like the wise choice . If you're paid by the line of code you write, you can pass line length with a lower number. Black will try to respect that. However, sometimes it won't be able to without breaking other rules. In those rare cases, auto formatted code will exceed your allotted limit. You can also increase it, but remember that people with sight disabilities find it harder to work with line lengths exceeding 100 characters. It also adversely affects side by side diff review on typical screen resolutions. Long lines also make it harder to present code neatly in documentation or talk slides. If you're using Flake8, you can bump max line length to 88 and forget about it. Alternatively, use Bugbear 's B950 warning instead of E501 and keep the max line length at 80 which you are probably already using. You'd do it like this: ini flake8 max line length 80 ... select C,E,F,W,B,B950 ignore E501 You'll find Black 's own .flake8 config file is configured like this. If you're curious about the reasoning behind B950, Bugbear's documentation explains it. The tl;dr is it's like highway speed limits, we won't bother you if you overdo it by a few km/h . Empty lines Black avoids spurious vertical whitespace. This is in the spirit of PEP 8 which says that in function vertical whitespace should only be used sparingly. Black will allow single empty lines inside functions, and single and double empty lines on module level left by the original editors, except when they're within parenthesized expressions. Since such expressions are always reformatted to fit minimal space, this whitespace is lost. It will also insert proper spacing before and after function definitions. It's one line before and after inner functions and two lines before and after module level functions and classes. Black will not put empty lines between function/class definitions and standalone comments that immediately precede the given function/class. Black will enforce single empty lines between a class level docstring and the first following field or method. This conforms to PEP 257 . Black won't insert empty lines after function docstrings unless that empty line is required due to an inner function starting immediately after. Trailing commas Black will add trailing commas to expressions that are split by comma where each element is on its own line. This includes function signatures. Unnecessary trailing commas are removed if an expression fits in one line. This makes it 1% more likely that your line won't exceed the allotted line length limit. Moreover, in this scenario, if you added another argument to your call, you'd probably fit it in the same line anyway. That doesn't make diffs any larger. One exception to removing trailing commas is tuple expressions with just one element. In this case Black won't touch the single trailing comma as this would unexpectedly change the underlying data type. Note that this is also the case when commas are used while indexing. This is a tuple in disguise: numpy_array 3, . One exception to adding trailing commas is function signatures containing , args , or kwargs . In this case a trailing comma is only safe to use on Python 3.6. Black will detect if your file is already 3.6+ only and use trailing commas in this situation. 
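To make the trailing-comma behaviour described above concrete, here is an illustrative before/after pair constructed for this note (the function and variable names are placeholders, and the stub definitions exist only so the snippet runs on its own): once a comma-separated call no longer fits on one line, Black puts each element on its own line, dedents the closing bracket, and adds the trailing comma.

```python
def configure_application(*options):
    """Stand-in so the snippet is self-contained; Black only cares about the call sites below."""
    return options

database_connection_settings = {}
authentication_backend_options = {}
request_timeout_seconds = 30
enable_verbose_logging = True

# Hand-written call, well over the 88-character limit:
result = configure_application(database_connection_settings, authentication_backend_options, request_timeout_seconds, enable_verbose_logging)

# What Black produces: one element per line, dedented closing bracket, trailing comma added:
result = configure_application(
    database_connection_settings,
    authentication_backend_options,
    request_timeout_seconds,
    enable_verbose_logging,
)
```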
If you wonder how it knows, it looks for f strings and existing use of trailing commas in function signatures that have stars in them. In other words, if you'd like a trailing comma in this situation and Black didn't recognize it was safe to do so, put it there manually and Black will keep it. Strings Black prefers double quotes ( and ) over single quotes ( ' and ''' ). It will replace the latter with the former as long as it does not result in more backslash escapes than before. Black also standardizes string prefixes, making them always lowercase. On top of that, if your code is already Python 3.6+ only or it's using the unicode_literals future import, Black will remove u from the string prefix as it is meaningless in those scenarios. The main reason to standardize on a single form of quotes is aesthetics. Having one kind of quotes everywhere reduces reader distraction. It will also enable a future version of Black to merge consecutive string literals that ended up on the same line (see 26 for details). Why settle on double quotes? They anticipate apostrophes in English text. They match the docstring standard described in PEP 257 . An empty string in double quotes ( ) is impossible to confuse with a one double quote regardless of fonts and syntax highlighting used. On top of this, double quotes for strings are consistent with C which Python interacts a lot with. On certain keyboard layouts like US English, typing single quotes is a bit easier than double quotes. The latter requires use of the Shift key. My recommendation here is to keep using whatever is faster to type and let Black handle the transformation. If you are adopting Black in a large project with pre existing string conventions (like the popular single quotes for data, double quotes for human readable strings ), you can pass skip string normalization on the command line. This is meant as an adoption helper, avoid using this for new projects. Numeric literals Black standardizes most numeric literals to use lowercase letters for the syntactic parts and uppercase letters for the digits themselves: 0xAB instead of 0XAB and 1e10 instead of 1E10 . Python 2 long literals are styled as 2L instead of 2l to avoid confusion between l and 1 . Line breaks & binary operators Black will break a line before a binary operator when splitting a block of code over multiple lines. This is so that Black is compliant with the recent changes in the PEP 8 style guide, which emphasizes that this approach improves readability. This behaviour may raise W503 line break before binary operator warnings in style guide enforcement tools like Flake8. Since W503 is not PEP 8 compliant, you should tell Flake8 to ignore these warnings. Slices PEP 8 recommends to treat : in slices as a binary operator with the lowest priority, and to leave an equal amount of space on either side, except if a parameter is omitted (e.g. ham 1 + 1 : ). It also states that for extended slices, both : operators have to have the same amount of spacing, except if a parameter is omitted ( ham 1 + 1 :: ). Black enforces these rules consistently. This behaviour may raise E203 whitespace before ':' warnings in style guide enforcement tools like Flake8. Since E203 is not PEP 8 compliant, you should tell Flake8 to ignore these warnings. Parentheses Some parentheses are optional in the Python grammar. Any expression can be wrapped in a pair of parentheses to form an atom. There are a few interesting cases: if (...): while (...): for (...) in (...): assert (...), (...) from X import (...) 
assignments like: target (...) target: type (...) some, un, packing (...) augmented + (...) In those cases, parentheses are removed when the entire statement fits in one line, or if the inner expression doesn't have any delimiters to further split on. If there is only a single delimiter and the expression starts or ends with a bracket, the parenthesis can also be successfully omitted since the existing bracket pair will organize the expression neatly anyway. Otherwise, the parentheses are added. Please note that Black does not add or remove any additional nested parentheses that you might want to have for clarity or further code organization. For example those parentheses are not going to be removed: py3 return not (this or that) decision (maybe.this() and values > 0) or (maybe.that() and values Example pyproject.toml toml tool.black line length 88 target version 'py37' include '\.pyi?$' exclude ''' ( /( \.eggs exclude a few common directories in the \.git root of the project \.hg \.mypy_cache \.tox \.venv _build buck out build dist )/ foo.py also separately exclude a file named foo.py in the root of the project ) ''' Lookup hierarchy Command line options have defaults that you can see in help . A pyproject.toml can override those defaults. Finally, options provided by the user on the command line override both. Black will only ever use one pyproject.toml file during an entire run. It doesn't look for multiple files, and doesn't compose configuration from different levels of the file hierarchy. Editor integration Emacs Use proofit404/blacken or Elpy . PyCharm/IntelliJ IDEA 1. Install black . console $ pip install black 2. Locate your black installation folder. On macOS / Linux / BSD: console $ which black /usr/local/bin/black possible location On Windows: console $ where black %LocalAppData%\Programs\Python\Python36 32\Scripts\black.exe possible location 3. Open External tools in PyCharm/IntelliJ IDEA On macOS: PyCharm > Preferences > Tools > External Tools On Windows / Linux / BSD: File > Settings > Tools > External Tools 4. Click the + icon to add a new external tool with the following values: Name: Black Description: Black is the uncompromising Python code formatter. Program: Arguments: $FilePath$ 5. Format the currently opened file by selecting Tools > External Tools > black . Alternatively, you can set a keyboard shortcut by navigating to Preferences or Settings > Keymap > External Tools > External Tools Black . 6. Optionally, run Black on every file save: 1. Make sure you have the File Watcher plugin installed. 2. Go to Preferences or Settings > Tools > File Watchers and click + to add a new watcher: Name: Black File type: Python Scope: Project Files Program: Arguments: $FilePath$ Output paths to refresh: $FilePath$ Working directory: $ProjectFileDir$ Uncheck Auto save edited files to trigger the watcher Wing IDE Wing supports black via the OS Commands tool, as explained in the Wing documentation on pep8 formatting . The detailed procedure is: 1. Install black . console $ pip install black 2. Make sure it runs from the command line, e.g. console $ black help 3. In Wing IDE, activate the OS Commands panel and define the command black to execute black on the currently selected file: Use the Tools > OS Commands menu selection click on + in OS Commands > New: Command line.. Title: black Command Line: black %s I/O Encoding: Use Default Key Binding: F1 x Raise OS Commands when executed x Auto save files before execution x Line mode 4. 
Select a file in the editor and press F1 , or whatever key binding you selected in step 3, to reformat the file. Vim Commands and shortcuts: :Black to format the entire file (ranges not supported); :BlackUpgrade to upgrade Black inside the virtualenv; :BlackVersion to get the current version of Black inside the virtualenv. Configuration: g:black_fast (defaults to 0 ) g:black_linelength (defaults to 88 ) g:black_skip_string_normalization (defaults to 0 ) g:black_virtualenv (defaults to /.vim/black ) To install with vim plug : Plug 'python/black' or with Vundle : Plugin 'python/black' or you can copy the plugin from plugin/black.vim . Let me know if this requires any changes to work with Vim 8's builtin packadd , or Pathogen, and so on. This plugin requires Vim 7.0+ built with Python 3.6+ support . It needs Python 3.6 to be able to run Black inside the Vim process which is much faster than calling an external command. On first run, the plugin creates its own virtualenv using the right Python version and automatically installs Black . You can upgrade it later by calling :BlackUpgrade and restarting Vim. If you need to do anything special to make your virtualenv work and install Black (for example you want to run a version from master), create a virtualenv manually and point g:black_virtualenv to it. The plugin will use it. To run Black on save, add the following line to .vimrc or init.vim : autocmd BufWritePre .py execute ':Black' How to get Vim with Python 3.6? On Ubuntu 17.10 Vim comes with Python 3.6 by default. On macOS with Homebrew run: brew install vim with python3 . When building Vim from source, use: ./configure enable python3interp yes . There's many guides online how to do this. Visual Studio Code Use the Python extension ( instructions ). SublimeText 3 Use sublack plugin . Jupyter Notebook Magic Use blackcellmagic . Python Language Server If your editor supports the Language Server Protocol (Atom, Sublime Text, Visual Studio Code and many more), you can use the Python Language Server with the pyls black plugin. Atom/Nuclide Use python black . Other editors Other editors will require external contributions. Patches welcome! ✨ 🍰 ✨ Any tool that can pipe code through Black using its stdio mode (just use as the file name ). The formatted code will be returned on stdout (unless check was passed). Black will still emit messages on stderr but that shouldn't affect your use case. This can be used for example with PyCharm's or IntelliJ's File Watchers . blackd blackd is a small HTTP server that exposes Black 's functionality over a simple protocol. The main benefit of using it is to avoid paying the cost of starting up a new Black process every time you want to blacken a file. Usage blackd is not packaged alongside Black by default because it has additional dependencies. You will need to do pip install black d to install it. You can start the server on the default port, binding only to the local interface by running blackd . You will see a single line mentioning the server's version, and the host and port it's listening on. blackd will then print an access log similar to most web servers on standard output, merged with any exception traces caused by invalid formatting requests. blackd provides even less options than Black . You can see them by running blackd help : text Usage: blackd OPTIONS Options: bind host TEXT Address to bind the server to. bind port INTEGER Port to listen on version Show the version and exit. h, help Show this message and exit. 
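Since, as noted just below, there is no official blackd client yet, here is a hedged sketch of a tiny unofficial client for the HTTP protocol described in the following paragraphs. It uses only the standard library; the port matches the curl example that follows (start the server with blackd --bind-port 9090, or adjust the URL to whatever port yours reports), and the X-Line-Length header is optional.

```python
# blackd_client.py -- unofficial, illustrative sketch; start the server first with:
#   blackd --bind-port 9090
import urllib.request

SOURCE = b"print('valid')\n"

request = urllib.request.Request(
    "http://localhost:9090/",
    data=SOURCE,
    headers={
        "Content-Type": "text/plain; charset=utf-8",
        "X-Protocol-Version": "1",   # if sent at all, it must be "1"
        "X-Line-Length": "88",       # optional; mirrors the --line-length CLI flag
    },
    method="POST",
)

with urllib.request.urlopen(request) as response:
    if response.status == 204:
        print("already well formatted")
    else:  # 200: the body contains the blackened source
        print(response.read().decode("utf-8"))
```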
There is no official blackd client tool (yet!). You can test that blackd is working using curl : blackd bind port 9090 & or let blackd choose a port curl s XPOST localhost:9090 d print('valid') Protocol blackd only accepts POST requests at the / path. The body of the request should contain the python source code to be formatted, encoded according to the charset field in the Content Type request header. If no charset is specified, blackd assumes UTF 8 . There are a few HTTP headers that control how the source is formatted. These correspond to command line flags for Black . There is one exception to this: X Protocol Version which if present, should have the value 1 , otherwise the request is rejected with HTTP 501 (Not Implemented). The headers controlling how code is formatted are: X Line Length : corresponds to the line length command line flag. X Skip String Normalization : corresponds to the skip string normalization command line flag. If present and its value is not the empty string, no string normalization will be performed. X Fast Or Safe : if set to fast , blackd will act as Black does when passed the fast command line flag. X Python Variant : if set to pyi , blackd will act as Black does when passed the pyi command line flag. Otherwise, its value must correspond to a Python version or a set of comma separated Python versions, optionally prefixed with py . For example, to request code that is compatible with Python 3.5 and 3.6, set the header to py3.5,py3.6 . If any of these headers are set to invalid values, blackd returns a HTTP 400 error response, mentioning the name of the problematic header in the message body. Apart from the above, blackd can produce the following response codes: HTTP 204 : If the input is already well formatted. The response body is empty. HTTP 200 : If formatting was needed on the input. The response body contains the blackened Python code, and the Content Type header is set accordingly. HTTP 400 : If the input contains a syntax error. Details of the error are returned in the response body. HTTP 500 : If there was any kind of error while trying to format the input. The response body contains a textual representation of the error. Version control integration Use pre commit . Once you have it installed , add this to the .pre commit config.yaml in your repository: yaml repos: repo: rev: stable hooks: id: black language_version: python3.6 Then run pre commit install and you're ready to go. Avoid using args in the hook. Instead, store necessary configuration in pyproject.toml so that editors and command line usage of Black all behave consistently for your project. See Black 's own pyproject.toml (/pyproject.toml) for an example. If you're already using Python 3.7, switch the language_version accordingly. Finally, stable is a tag that is pinned to the latest release on PyPI. If you'd rather run on master, this is also an option. Ignoring unmodified files Black remembers files it has already formatted, unless the diff flag is used or code is passed via standard input. This information is stored per user. The exact location of the file depends on the Black version and the system on which Black is run. The file is non portable. The standard location on common operating systems is: Windows: C:\\Users\ \AppData\Local\black\black\Cache\ \cache. . .pickle macOS: /Users/ /Library/Caches/black/ /cache. . .pickle Linux: /home/ /.cache/black/ /cache. . 
.pickle file mode is an int flag that determines whether the file was formatted as 3.6+ only, as .pyi, and whether string normalization was omitted. To override the location of these files on macOS or Linux, set the environment variable XDG_CACHE_HOME to your preferred location. For example, if you want to put the cache in the directory you're running Black from, set XDG_CACHE_HOME .cache . Black will then write the above files to .cache/black/ / . Used by The following notable open source projects trust Black with enforcing a consistent code style: pytest, tox, Pyramid, Django Channels, Hypothesis, attrs, SQLAlchemy, Poetry, PyPA applications (Warehouse, Pipenv, virtualenv), every Datadog Agent Integration. Are we missing anyone? Let us know. Testimonials Dusty Phillips , writer : > Black is opinionated so you don't have to be. Hynek Schlawack , creator of attrs , core developer of Twisted and CPython: > An auto formatter that doesn't suck is all I want for Xmas! Carl Meyer , Django core developer: > At least the name is good. Kenneth Reitz , creator of requests and pipenv : > This vastly improves the formatting of our code. Thanks a ton! Show your style Use the badge in your project's README.md: markdown Code style: black Using the badge in README.rst: .. image:: :target: Looks like this: Code style: black License MIT Contributing to Black In terms of inspiration, Black is about as configurable as gofmt . This is deliberate. Bug reports and fixes are always welcome! However, before you suggest a new feature or configuration knob, ask yourself why you want it. If it enables better integration with some workflow, fixes an inconsistency, speeds things up, and so on go for it! On the other hand, if your answer is because I don't like a particular formatting then you're not ready to embrace Black yet. Such changes are unlikely to get accepted. You can still try but prepare to be disappointed. More details can be found in CONTRIBUTING (CONTRIBUTING.md). Change Log 19.5b0 added black c as a way to format code passed from the command line ( 761) safe now works with Python 2 code ( 840) fixed grammar selection for Python 2 specific code ( 765) fixed feature detection for trailing commas in function definitions and call sites ( 763) Black can now format async generators ( 593) Black no longer crashes on Windows machines with more than 61 cores ( 838) Black no longer crashes on standalone comments prepended with a backslash ( 767) Black no longer crashes on from ... 
import blocks with comments ( 829) removed unnecessary parentheses around yield expressions ( 834) added parentheses around long tuples in unpacking assignments ( 832) fixed bug that led Black format some code with a line length target of 1 ( 762) Black no longer introduces quotes in f string subexpressions on string boundaries ( 863) 19.3b0 new option target version to control which Python versions Black formatted code should target ( 618) deprecated py36 (use target version py36 instead) ( 724) Black no longer normalizes numeric literals to include _ separators ( 696) long del statements are now split into multiple lines ( 698) type comments are no longer mangled in function signatures improved performance of formatting deeply nested data structures ( 509) Black now properly formats multiple files in parallel on Windows ( 632) Black now creates cache files atomically which allows it to be used in parallel pipelines (like xargs P8 ) ( 673) Black now correctly indents comments in files that were previously formatted with tabs ( 262) blackd now supports CORS ( 622) 18.9b0 numeric literals are now formatted by Black ( 452, 461, 464, 469): numeric literals are normalized to include _ separators on Python 3.6+ code added skip numeric underscore normalization to disable the above behavior and leave numeric underscores as they were in the input code with _ in numeric literals is recognized as Python 3.6+ most letters in numeric literals are lowercased (e.g., in 1e10 , 0x01 ) hexadecimal digits are always uppercased (e.g. 0xBADC0DE ) added blackd , see its documentation ( blackd) for more info ( 349) adjacent string literals are now correctly split into multiple lines ( 463) trailing comma is now added to single imports that don't fit on a line ( 250) cache is now populated when check is successful for a file which speeds up consecutive checks of properly formatted unmodified files ( 448) whitespace at the beginning of the file is now removed ( 399) fixed mangling pweave and Spyder IDE special comments ( 532) fixed unstable formatting when unpacking big tuples ( 267) fixed parsing of __future__ imports with renames ( 389) fixed scope of fmt: off when directly preceding yield and other nodes ( 385) fixed formatting of lambda expressions with default arguments ( 468) fixed async for statements: Black no longer breaks them into separate lines ( 372) note: the Vim plugin stopped registering , as a default chord as it turned out to be a bad idea ( 415) 18.6b4 hotfix: don't freeze when multiple comments directly precede fmt: off ( 371) 18.6b3 typing stub files ( .pyi ) now have blank lines added after constants ( 340) fmt: off and fmt: on are now much more dependable: they now work also within bracket pairs ( 329) they now correctly work across function/class boundaries ( 335) they now work when an indentation block starts with empty lines or misaligned comments ( 334) made Click not fail on invalid environments; note that Click is right but the likelihood we'll need to access non ASCII file paths when dealing with Python source code is low ( 277) fixed improper formatting of f strings with quotes inside interpolated expressions ( 322) fixed unnecessary slowdown when long list literals where found in a file fixed unnecessary slowdown on AST nodes with very many siblings fixed cannibalizing backslashes during string normalization fixed a crash due to symbolic links pointing outside of the project directory ( 338) 18.6b2 added config ( 65) added h equivalent to help ( 316) fixed improper unmodified file 
caching when S was used fixed extra space in string unpacking ( 305) fixed formatting of empty triple quoted strings ( 313) fixed unnecessary slowdown in comment placement calculation on lines without comments 18.6b1 hotfix: don't output human facing information on stdout ( 299) hotfix: don't output cake emoji on non zero return code ( 300) 18.6b0 added include and exclude ( 270) added skip string normalization ( 118) added verbose ( 283) the header output in diff now actually conforms to the unified diff spec fixed long trivial assignments being wrapped in unnecessary parentheses ( 273) fixed unnecessary parentheses when a line contained multiline strings ( 232) fixed stdin handling not working correctly if an old version of Click was used ( 276) Black now preserves line endings when formatting a file in place ( 258) 18.5b1 added pyi ( 249) added py36 ( 249) Python grammar pickle caches are stored with the formatting caches, making Black work in environments where site packages is not user writable ( 192) Black now enforces a PEP 257 empty line after a class level docstring (and/or fields) and the first method fixed invalid code produced when standalone comments were present in a trailer that was omitted from line splitting on a large expression ( 237) fixed optional parentheses being removed within fmt: off sections ( 224) fixed invalid code produced when stars in very long imports were incorrectly wrapped in optional parentheses ( 234) fixed unstable formatting when inline comments were moved around in a trailer that was omitted from line splitting on a large expression ( 238) fixed extra empty line between a class declaration and the first method if no class docstring or fields are present ( 219) fixed extra empty line between a function signature and an inner function or inner class ( 196) 18.5b0 call chains are now formatted according to the fluent interfaces style ( 67) data structure literals (tuples, lists, dictionaries, and sets) are now also always exploded like imports when they don't fit in a single line ( 152) slices are now formatted according to PEP 8 ( 178) parentheses are now also managed automatically on the right hand side of assignments and return statements ( 140) math operators now use their respective priorities for delimiting multiline expressions ( 148) optional parentheses are now omitted on expressions that start or end with a bracket and only contain a single operator ( 177) empty parentheses in a class definition are now removed ( 145, 180) string prefixes are now standardized to lowercase and u is removed on Python 3.6+ only code and Python 2.7+ code with the unicode_literals future import ( 188, 198, 199) typing stub files ( .pyi ) are now formatted in a style that is consistent with PEP 484 ( 207, 210) progress when reformatting many files is now reported incrementally fixed trailers (content with brackets) being unnecessarily exploded into their own lines after a dedented closing bracket ( 119) fixed an invalid trailing comma sometimes left in imports ( 185) fixed non deterministic formatting when multiple pairs of removable parentheses were used ( 183) fixed multiline strings being unnecessarily wrapped in optional parentheses in long assignments ( 215) fixed not splitting long from imports with only a single name fixed Python 3.6+ file discovery by also looking at function calls with unpacking. 
This fixed non deterministic formatting if trailing commas where used both in function signatures with stars and function calls with stars but the former would be reformatted to a single line. fixed crash on dealing with optional parentheses ( 193) fixed is , is not , in , and not in not considered operators for splitting purposes fixed crash when dead symlinks where encountered 18.4a4 don't populate the cache on check ( 175) 18.4a3 added a cache ; files already reformatted that haven't changed on disk won't be reformatted again ( 109) check and diff are no longer mutually exclusive ( 149) generalized star expression handling, including double stars; this fixes multiplication making expressions unsafe for trailing commas ( 132) Black no longer enforces putting empty lines behind control flow statements ( 90) Black now splits imports like Mode 3 + trailing comma of isort ( 127) fixed comment indentation when a standalone comment closes a block ( 16, 32) fixed standalone comments receiving extra empty lines if immediately preceding a class, def, or decorator ( 56, 154) fixed diff not showing entire path ( 130) fixed parsing of complex expressions after star and double stars in function calls ( 2) fixed invalid splitting on comma in lambda arguments ( 133) fixed missing splits of ternary expressions ( 141) 18.4a2 fixed parsing of unaligned standalone comments ( 99, 112) fixed placement of dictionary unpacking inside dictionary literals ( 111) Vim plugin now works on Windows, too fixed unstable formatting when encountering unnecessarily escaped quotes in a string ( 120) 18.4a1 added quiet ( 78) added automatic parentheses management ( 4) added pre commit integration ( 103, 104) fixed reporting on check with multiple files ( 101, 102) fixed removing backslash escapes from raw strings ( 100, 105) 18.4a0 added diff ( 87) add line breaks before all delimiters, except in cases like commas, to better comply with PEP 8 ( 73) standardize string literals to use double quotes (almost) everywhere ( 75) fixed handling of standalone comments within nested bracketed expressions; Black will no longer produce super long lines or put all standalone comments at the end of the expression ( 22) fixed 18.3a4 regression: don't crash and burn on empty lines with trailing whitespace ( 80) fixed 18.3a4 regression: yapf: disable usage as trailing comment would cause Black to not emit the rest of the file ( 95) when CTRL+C is pressed while formatting many files, Black no longer freaks out with a flurry of asyncio related exceptions only allow up to two empty lines on module level and only single empty lines within functions ( 74) 18.3a4 fmt: off and fmt: on are implemented ( 5) automatic detection of deprecated Python 2 forms of print statements and exec statements in the formatted file ( 49) use proper spaces for complex expressions in default values of typed function arguments ( 60) only return exit code 1 when check is used ( 50) don't remove single trailing commas from square bracket indexing ( 59) don't omit whitespace if the previous factor leaf wasn't a math operator ( 55) omit extra space in kwarg unpacking if it's the first argument ( 46) omit extra space in Sphinx auto attribute comments ( 68) 18.3a3 don't remove single empty lines outside of bracketed expressions ( 19) added ability to pipe formatting from stdin to stdin ( 25) restored ability to format code with legacy usage of async as a name ( 20, 42) even better handling of numpy style array indexing ( 33, again) 18.3a2 changed positioning of binary 
operators to occur at beginning of lines instead of at the end, following a recent change to PEP 8 ( 21) ignore empty bracket pairs while splitting. This avoids very weirdly looking formattings ( 34, 35) remove a trailing comma if there is a single argument to a call if top level functions were separated by a comment, don't put four empty lines after the upper function fixed unstable formatting of newlines with imports fixed unintentional folding of post scriptum standalone comments into last statement if it was a simple statement ( 18, 28) fixed missing space in numpy style array indexing ( 33) fixed spurious space after star based unary expressions ( 31) 18.3a1 added check only put trailing commas in function signatures and calls if it's safe to do so. If the file is Python 3.6+ it's always safe, otherwise only safe if there are no args or kwargs used in the signature or call. ( 8) fixed invalid spacing of dots in relative imports ( 6, 13) fixed invalid splitting after comma on unpacked variables in for loops ( 23) fixed spurious space in parenthesized set expressions ( 7) fixed spurious space after opening parentheses and in default arguments ( 14, 17) fixed spurious space after unary operators when the operand was a complex expression ( 15) 18.3a0 first published version, Happy 🍰 Day 2018! alpha quality date versioned (see: Authors Glued together by Łukasz Langa (mailto:lukasz@langa.pl). Maintained with Carol Willing (mailto:carolcode@willingconsulting.com), Carl Meyer (mailto:carl@oddbird.net), Jelle Zijlstra (mailto:jelle.zijlstra@gmail.com), Mika Naylor (mailto:mail@autophagy.io), and Zsolt Dollenstein (mailto:zsol.zsol@gmail.com). Multiple contributions by: Anthony Sottile (mailto:asottile@umich.edu) Artem Malyshev (mailto:proofit404@gmail.com) Benjamin Woodruff (mailto:github@benjam.info) Christian Heimes (mailto:christian@python.org) Daniel M. Capella (mailto:polycitizen@gmail.com) Eli Treuherz (mailto:eli@treuherz.com) hauntsaninja Hugo van Kemenade Ivan Katanić (mailto:ivan.katanic@gmail.com) Jason Fried (mailto:me@jasonfried.info) Jonas Obrist (mailto:ojiidotch@gmail.com) Luka Sterbic (mailto:luka.sterbic@gmail.com) Miguel Gaiowski (mailto:miggaiowski@gmail.com) Miroslav Shubernetskiy (mailto:miroslav@miki725.com) Neraste (mailto:neraste.herr10@gmail.com) Osaetin Daniel (mailto:osaetindaniel@gmail.com) Peter Bengtsson (mailto:mail@peterbe.com) Stavros Korokithakis (mailto:hi@stavros.io) Sunil Kapil (mailto:snlkapil@gmail.com) Utsav Shah (mailto:ukshah2@illinois.edu) Vishwas B Sharma (mailto:sharma.vishwas88@gmail.com) Chuck Wooters (mailto:chuck.wooters@microsoft.com)",Unknown,Unknown 235,Unknown,Unknown,Unknown,"This is Python version 3.9.0 alpha 0 .. image:: :alt: CPython build status on Travis CI :target: .. image:: :alt: CPython build status on Appveyor :target: .. image:: :alt: CPython build status on Azure DevOps :target: .. image:: :alt: CPython code coverage on Codecov :target: .. image:: :alt: Python Zulip chat :target: Copyright (c) 2001 2019 Python Software Foundation. All rights reserved. See the end of this file for further copyright and license information. .. contents:: General Information Website: Source code: Issue tracker: Documentation: Developer's Guide: Contributing to CPython For more complete instructions on contributing to CPython development, see the Developer Guide _. .. _Developer Guide: Using Python Installable Python kits, and information about using Python, are available at python.org _. .. 
_python.org: Build Instructions On Unix, Linux, BSD, macOS, and Cygwin:: ./configure make make test sudo make install This will install Python as python3 . You can pass many options to the configure script; run ./configure help to find out more. On macOS and Cygwin, the executable is called python.exe ; elsewhere it's just python . If you are running on macOS with the latest updates installed, make sure to install OpenSSL or some other SSL software along with Homebrew or another package manager. If issues persist, see for more information. On macOS, if you have configured Python with enable framework , you should use make frameworkinstall to do the installation. Note that this installs the Python executable in a place that is not normally on your PATH, you may want to set up a symlink in /usr/local/bin . On Windows, see PCbuild/readme.txt _. If you wish, you can create a subdirectory and invoke configure from there. For example:: mkdir debug cd debug ../configure with pydebug make make test (This will fail if you also built at the top level directory. You should do a make clean at the top level first.) To get an optimized build of Python, configure enable optimizations before you run make . This sets the default make targets up to enable Profile Guided Optimization (PGO) and may be used to auto enable Link Time Optimization (LTO) on some platforms. For more details, see the sections below. Profile Guided Optimization ^^^^^^^^^^^^^^^^^^^^^^^^^^^ PGO takes advantage of recent versions of the GCC or Clang compilers. If used, either via configure enable optimizations or by manually running make profile opt regardless of configure flags, the optimized build process will perform the following steps: The entire Python directory is cleaned of temporary files that may have resulted from a previous compilation. An instrumented version of the interpreter is built, using suitable compiler flags for each flavour. Note that this is just an intermediary step. The binary resulting from this step is not good for real life workloads as it has profiling instructions embedded inside. After the instrumented interpreter is built, the Makefile will run a training workload. This is necessary in order to profile the interpreter execution. Note also that any output, both stdout and stderr, that may appear at this step is suppressed. The final step is to build the actual interpreter, using the information collected from the instrumented one. The end result will be a Python binary that is optimized; suitable for distribution or production installation. Link Time Optimization ^^^^^^^^^^^^^^^^^^^^^^ Enabled via configure's with lto flag. LTO takes advantage of the ability of recent compiler toolchains to optimize across the otherwise arbitrary .o file boundary when building final executables or shared libraries for additional performance gains. What's New We have a comprehensive overview of the changes in the What's New in Python 3.9 _ document. For a more detailed change log, read Misc/NEWS _, but a full accounting of changes can only be gleaned from the commit history _. If you want to install multiple versions of Python, see the section below entitled Installing multiple versions . Documentation Documentation for Python 3.9 _ is online, updated daily. It can also be downloaded in many formats for faster access. The documentation is downloadable in HTML, PDF, and reStructuredText formats; the latter version is primarily for documentation authors, translators, and people with special formatting requirements. 
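Relatedly, once a build made with the instructions above is installed, the interpreter itself can report which configure options it was built with via the sysconfig module (a minimal sketch; CONFIG_ARGS and Py_DEBUG are only populated on Unix-style builds, and the exact values depend on your configure invocation)::

    # Best-effort inspection of how the running CPython was configured.
    # On non-Unix builds these config vars may be None or missing entirely.
    import sys
    import sysconfig

    print(sys.version)                                 # interpreter version string
    print(sysconfig.get_config_var("CONFIG_ARGS"))     # e.g. "'--enable-optimizations' '--with-lto'"
    print(bool(sysconfig.get_config_var("Py_DEBUG")))  # True for --with-pydebug builds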
For information about building Python's documentation, refer to Doc/README.rst _. Converting From Python 2.x to 3.x Significant backward incompatible changes were made for the release of Python 3.0, which may cause programs written for Python 2 to fail when run with Python 3. For more information about porting your code from Python 2 to Python 3, see the Porting HOWTO _. Testing To test the interpreter, type make test in the top level directory. The test set produces some output. You can generally ignore the messages about skipped tests due to optional features which can't be imported. If a message is printed about a failed test or a traceback or core dump is produced, something is wrong. By default, tests are prevented from overusing resources like disk space and memory. To enable these tests, run make testall . If any tests fail, you can re run the failing test(s) in verbose mode. For example, if test_os and test_gdb failed, you can run:: make test TESTOPTS v test_os test_gdb If the failure persists and appears to be a problem with Python rather than your environment, you can file a bug report _ and include relevant output from that command to show the issue. See Running & Writing Tests _ for more on running tests. Installing multiple versions On Unix and Mac systems if you intend to install multiple versions of Python using the same installation prefix ( prefix argument to the configure script) you must take care that your primary python executable is not overwritten by the installation of a different version. All files and directories installed using make altinstall contain the major and minor version and can thus live side by side. make install also creates ${prefix}/bin/python3 which refers to ${prefix}/bin/pythonX.Y . If you intend to install multiple versions using the same prefix you must decide which version (if any) is your primary version. Install that version using make install . Install all other versions using make altinstall . For example, if you want to install Python 2.7, 3.6, and 3.9 with 3.9 being the primary version, you would execute make install in your 3.9 build directory and make altinstall in the others. Issue Tracker and Mailing List Bug reports are welcome! You can use the issue tracker _ to report bugs, and/or submit pull requests on GitHub _. You can also follow development discussion on the python dev mailing list _. Proposals for enhancement If you have a proposal to change Python, you may want to send an email to the comp.lang.python or python ideas _ mailing lists for initial feedback. A Python Enhancement Proposal (PEP) may be submitted if your idea gains ground. All current PEPs, as well as guidelines for submitting a new PEP, are listed at python.org/dev/peps/ _. .. _python ideas: Release Schedule See :pep: 596 for Python 3.9 release details. Copyright and License Information Copyright (c) 2001 2019 Python Software Foundation. All rights reserved. Copyright (c) 2000 BeOpen.com. All rights reserved. Copyright (c) 1995 2001 Corporation for National Research Initiatives. All rights reserved. Copyright (c) 1991 1995 Stichting Mathematisch Centrum. All rights reserved. See the file LICENSE for information on the history of this software, terms & conditions for usage, and a DISCLAIMER OF ALL WARRANTIES. This Python distribution contains no GNU General Public License (GPL) code, so it may be used in proprietary projects. There are interfaces to some GNU code but these are entirely optional. 
All trademarks referenced herein are property of their respective holders.",Unknown,Unknown 236,Unknown,Unknown,Unknown,"PythonBrasil 8 website This is the source code of PythonBrasil 8 's website. The 8th edition of the conference will happen during November 2012, in Rio de Janeiro, Brazil. The official website is under construction and can be seen at: Since its begining, this website has been developed as open source by volunteers, using mainly: Python Django One of our major concerns was reusability of code. Due to this, we decided to develop and improve a conference Django app, called Mittun: The aim of this app is to provide features useful to any conference website. One of the qualities of Mittun, when compared to other similar projects, is its test coverage. Whenever possible, TDD (Test Driven Development) was used. Install From the command line: :: $ make deps Running tests This project has both server side and client side tests. To run the test suites do: :: make test make jasmine Contribute We need your help! Please, report bugs and share patches, based on our Issues _. Don't worry if you can't assign an issue to yourself, simply comment that you'll be working on it. Please, if we don't do it, add or remind us of adding your name to the contributors.txt. Thank you! Build status .. image:: :target:",Unknown,Unknown 237,Unknown,Unknown,Unknown,"A git repository for PyCon Korea 2015 What is this for? For issue handling of PyConr Korea 2015. All codes about PyCon Korea 2015. Allow any languages, but don't know can understand or not. :) Members @bloodevil @darjeeling @lexifdev @miyunim @OuO @tebica @corazzon @lqez @scari Roles Communicate officially (@darjeeling) Media (@lexifdev, @tebica) Volunteers (@OuO) Sponsors (@darjeeling) Speakers (@tebica) Issue Handling (@bloodevil) Sprint / Tutorial (@lexifdev) Venue (@bloodevil) Budget (@bloodevil) Homepage (@lqez) Design (@lqez) Goods (@miyunim, @corazzon) Tools Slack Mailing (pyconkr@googlegroups.com, pycon@pycon.kr) Code and Issue tracking Google Drive Amazon Web Service",Unknown,Unknown 238,Unknown,Unknown,Unknown,"Udacity student projects Projects created by students of Udacity, the XXI century university. classic games Student created games, written while learning object oriented programming at Udacity tools Tools and addons that improve usability",Unknown,Unknown 239,Unknown,Unknown,Unknown,".. image:: :target: :width: 212px :align: center :alt: Zipline Gitter version status travis status appveyor status Coverage Status Zipline is a Pythonic algorithmic trading library. It is an event driven system for backtesting. Zipline is currently used in production as the backtesting and live trading engine powering Quantopian _ a free, community centered, hosted platform for building and executing trading strategies. Join our Community! _ Documentation _ Want to Contribute? See our Development Guidelines _ Features Ease of Use: Zipline tries to get out of your way so that you can focus on algorithm development. See below for a code example. Batteries Included : many common statistics like moving average and linear regression can be readily accessed from within a user written algorithm. PyData Integration: Input of historical data and output of performance statistics are based on Pandas DataFrames to integrate nicely into the existing PyData ecosystem. 
Statistics and Machine Learning Libraries: You can use libraries like matplotlib, scipy, statsmodels, and sklearn to support development, analysis, and visualization of state of the art trading systems. Installation Installing With pip Assuming you have all required (see note below) non Python dependencies, you can install Zipline with pip via: .. code block:: bash $ pip install zipline Note: Installing Zipline via pip is slightly more involved than the average Python package. Simply running pip install zipline will likely fail if you've never installed any scientific Python packages before. There are two reasons for the additional complexity: 1. Zipline ships several C extensions that require access to the CPython C API. In order to build the C extensions, pip needs access to the CPython header files for your Python installation. 2. Zipline depends on numpy _, the core library for numerical array computing in Python. Numpy depends on having the LAPACK _ linear algebra routines available. Because LAPACK and the CPython headers are binary dependencies, the correct way to install them varies from platform to platform. On Linux, users generally acquire these dependencies via a package manager like apt , yum , or pacman . On OSX, Homebrew _ is a popular choice providing similar functionality. See the full Zipline Install Documentation _ for more information on acquiring binary dependencies for your specific platform. conda Another way to install Zipline is via the conda package manager, which comes as part of Anaconda _ or can be installed via pip install conda . Once set up, you can install Zipline from our Quantopian channel: .. code block:: bash $ conda install c Quantopian zipline Currently supported platforms include: GNU/Linux 64 bit OSX 64 bit Windows 64 bit .. note:: Windows 32 bit may work; however, it is not currently included in continuous integration tests. Quickstart See our getting started tutorial _. The following code implements a simple dual moving average algorithm. .. code:: python from zipline.api import order_target, record, symbol def initialize(context): context.i 0 context.asset symbol('AAPL') def handle_data(context, data): Skip first 300 days to get full windows context.i + 1 if context.i long_mavg: order_target orders as many shares as needed to achieve the desired number of shares. order_target(context.asset, 100) elif short_mavg __ API key to ingest the default data bundle. Once you have your key, run the following from the command line: .. code:: bash $ QUANDL_API_KEY zipline ingest b quandl $ zipline run f dual_moving_average.py start 2014 1 1 end 2018 1 1 o dma.pickle This will download asset pricing data data from quandl , and stream it through the algorithm over the specified time range. Then, the resulting performance DataFrame is saved in dma.pickle , which you can load an analyze from within Python. You can find other examples in the zipline/examples directory. Questions? If you find a bug, feel free to open an issue _ and fill out the issue template. Contributing All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome. Details on how to set up a development environment can be found in our development guidelines _. If you are looking to start working with the Zipline codebase, navigate to the GitHub issues tab and start looking through interesting issues. Sometimes there are issues labeled as Beginner Friendly _ or Help Wanted _. Feel free to ask questions on the mailing list _ or on Gitter _. .. 
Gitter image:: :target: .. version status image:: :target: .. travis status image:: :target: .. appveyor status image:: :target: .. Coverage Status image:: :target: .. _ Zipline Install Documentation : ",Unknown,Unknown 240,Unknown,Unknown,Unknown,"Faker Faker makes fake data. It looks like this: >>> import faker >>> faker.name.name() 'Gregory Spinka III' >>> faker.internet.email() 'jayne@miller.com' >>> faker.company.name() 'Barton, Heller and Considine' >>> faker.address.street_address() '106810 Leannon Drive' If you are a Django developer, you might also be interested in django poseur, which provides some higher level tools for Django built on this library.",Unknown,Unknown 241,Unknown,Unknown,Unknown,"mimic ab using Unicode to create tragedy Introduction mimic provokes: fun frustration curiosity murderous rage It's inspired by this terrible idea floating around: MT: Replace a semicolon (;) with a greek question mark (;) in your friend's C code and watch them pull their hair out over the syntax error — Peter Ritchie (@peterritchie) November 16, 2014 There are many more characters in the Unicode character set that look, to some extent or another, like others – homoglyphs. Mimic substitutes common ASCII characters for obscure homoglyphs. Fun games to play with mimic: Pipe some source code through and see if you can find all of the problems Pipe someone else's source code through without telling them Be fired, and then killed Results Observe the mayhem: BUT WHY? Or, if you've been mimicked a little harder, Discussion People have noticed how terrible this is. SlashDot Reddit ycombinator BoingBoing The Register Further Reading mimic wiki",Unknown,Unknown 242,Unknown,Unknown,Unknown,"nbformatjs Reference implementation of the Jupyter Notebook format If you just want the schemas for v3 and v4, check out nbschema instead.",Unknown,Unknown 243,Unknown,Unknown,Unknown,"A prototype for learning pitching. Allows you to scan QR codes generated by the app and collect them. A real-life treasure hunt of sorts. Start with pip install -r requirements.txt (hopefully in a virtual env). If you use PostgreSQL as your database, install psycopg2. Afterwards the qrtrack deploy command will be available, which allows you to deploy the Django app. Configure to your needs and start up like always.",Unknown,Unknown 244,Unknown,Unknown,Unknown,"Learning Scipy This repository contains source code programs and some notes to complement the book about the scientific Python module SciPy entitled Learning SciPy for Numerical and Scientific Computing Second Edition (2015) , of Packt Publishing, authored by Sergio J. Rojas G. , Erik A Christensen, and Francisco J. Blanco Silva . This repository is maintained by one of the authors, Sergio Rojas . The functionality of SciPy covered in the aforementioned book can be used as either a complement for a full course about scientific computing with Python or as an introduction to SciPy . Consequently, feel free to reuse and/or modify the content presented here for your own teaching needs, though proper acknowledgement of the source content will be highly appreciated. As we would like this reference material to be improved over time, we encourage you to contribute changes or corrections, which will be reviewed, edited, and properly acknowledged by the maintainer of this site. 
The Following web links bring you to the nbviewer static (HTML) display corresponding to the respective IPython notebooks included with each chapter of the book (which you could download following each chapter link already shown above): >> Chapter 1. _Introduction to SciPy_ >> Chapter 2. _Working with Numpy Array as a first step to Scipy_ >> Chapter 3. _SciPy for Linear Algebra_ >> Chapter 4. SciPy for Numerical Analysis_ >> Chapter 5. _SciPy for Signal Processing_ >> Chapter 6. _SciPy for Data Mining_ >> Chapter 7. _SciPy for Computational Geometry_ >> Chapter 8. _Interaction with Other Languages_ Otros archivos en Learning Scipy > _Análisis Numérico y Cómputo Científico vía el IPython Notebook_ We are on Twitter : Twitter",Unknown,Unknown 245,Unknown,Unknown,Unknown,"Django Template i18n lint Build Status Coverage Status PyPI version PyPI Downloads A simple script to find non i18n text in a Django template. It can also automatically wrap the strings in {% trans %} tags, by running it with the r command line flag. The translation will be written to a new file, _translated.html . For more info see Lint tool to find non i18n strings in a django template Code is copyright Rory McCann 2013, and dual licenced under the GNU GPL version3 (or at your option a later version), and the BSD licence. See the files LICENCE.GPLv3 and LICENCE.BSD for more information Bitdeli Badge",Unknown,Unknown 246,Unknown,Unknown,Unknown,"Sublime Tweet Sublime Tweet is an open source Twitter plugin for Sublime Text 3 sublime editor. It allows you to read and post tweets right from our favorite Sublime Text 3! After 2 years of silence, I'm finally updating it to support ST3 and Python 3.3! Updates (11 Feb 2014) Ported to Sublime Text 3, replaced underlying Twitter library, removed proxy support. (29 Feb 2012) Applied some patches, added Back button. (28 Oct 2011) Added proxy availability detection (now if you're using proxy at work you shouldn't turn it off at home, Sublime Tweet will handle it). Timeline now downloads in background and Sublime Text never freezes. (27 Oct 2011) Added proxy support to authorization (25 Oct 2011) Added proxy support (23 Oct 2011) Added Related tweets and Mark new tweets features (22 Oct 2011) Sublime Tweet now can favorite , retweet , reply and open URLs from tweets! Installation With Package Control If you have the Package Control package_control installed, you can install Sublime Tweet from inside Sublime Text. Open the Command Palette and select Package Control: Install Package , then search for Sublime Tweet and you're done! Without Package Control Install Package Control package_control first. Then install Sublime Tweet. How to tweet Tools → Tweet Tweet in Command Palette { keys : ctrl+alt+shift+t , command : tweet } How to read your public timeline, favorite , retweet , reply or open URLs Tools → Read twitter timeline Twitter timeline in Command Palette { keys : ctrl+shift+c , command : read_tweets } You'll see a list of latest tweets in your timeline. Just hit Enter on a tweet to favorite, retweet, reply, or open an URL from the tweet. First run On the first run of Sublime Tweet you will be prompted to authorize it, don't be surprised. Default browser will be opened automatically, copy code from the page and paste it to text prompt in Sublime Text (on the bottom of the screen). You should only do it once. Donate If you like Sublime Tweet you can donate donate to the author (via PayPal). 
sublime : package_control : donate : https://www.paypal.com/cgi bin/webscr?cmd _donations&business TVLQ2XQGFDS6Y&lc US&item_name Sublime%20Tweet&item_number SublimeTweet¤cy_code USD&bn PP%2dDonationsBF%3abtn_donate_SM%2egif%3aNonHosted",Unknown,Unknown 247,Unknown,Unknown,Unknown,"Inter Inter is a typeface specially designed for user interfaces with focus on high legibility of small to medium sized text on computer screens. The family features a tall x height to aid in readability of mixed case and lower case text. Several OpenType features are provided as well, like contextual alternates that adjusts punctuation depending on the shape of surrounding glyphs, slashed zero for when you need to disambiguate 0 from o , tabular numbers, etc. Sample (docs/res/sample.png) ⬇︎ Download the latest release After downloading the zip from above: 1. Double click the downloaded zip file to unpack or open it. 2. Follow the instructions in install mac.txt or install win.txt , depending on what operating system you're using. Design Inter is similar to Roboto, San Francisco, Akkurat, Asap, Lucida Grande and other UI and Text typefaces. Some trade offs were made in order to make this typeface work really well at small sizes: Currently not suitable for very large sizes because of some small scale glyph optimizations (like pits and traps ) that help rasterization at small sizes but stand out and interfere at large sizes. Rasterized at sizes below 12px, some stems—like the horizontal center of E , F , or vertical center of m —are drawn with two semi opaque pixels instead of one solid. This is because we prioritize (optimize for) higher density rasterizations. If we move these stems to an off center position—so that they can be drawn sharply at e.g. 11px—text will be less legible at higher resolutions. Current font styles: Name Weight class Thin 100 Thin Italic 100 Extra Light 200 Extra Light Italic 200 Light 300 Light Italic 300 Regular 400 Italic 400 Medium 500 Medium Italic 500 Semi Bold 600 Semi Bold Italic 600 Bold 700 Bold Italic 700 Extra Bold 800 Extra Bold Italic 800 Black 900 Black Italic 900 Inter also ships as a variable font. Font metrics This font was originally designed to work at a specific size: 11px. Thus, the Units per EM ) (UPM) is defined in such a way that a power of two multiple of one EM unit ends up at an integer value compared to a pixel. Most fonts are designed with a UPM of either 1000 or 2048. Because of this we picked a value that is as high as possible but also as close as possible to one of those common values (since it's reasonable to assume that some layout engines and rasterizers are optimized for those value magnitudes.) We ended up picking a UPM of 2816 which equates to exactly 256 units per pixel when rasterized for size 11pt at 1x scale. This also means that when rasterized at power of two scales (like 2x and 4x) the number of EM units corresponding to a pixel is an integer (128 units for 2x, 64 for 4x, and so on.) However, as the project progressed and the typeface was put into use, it quickly became clear that for anything longer than a short word, it was actually hard to read the almost monotonically spaced letters. A second major revision was created where the previously strict rule of geometry being even multiples of 256 was relaxed and now the rule is try to stick with 128x, if you can't, stick with 64x and if you can't do that either, never go below 16x. 
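As a concrete check of the units-per-pixel arithmetic above, the relationship between the 2816 UPM, the 11 px reference size, and power-of-two display scales can be verified with a few lines of Python (a minimal sketch; the helper function is illustrative and only the numbers come from the text):

```python
UPM = 2816  # Inter's units per em, as chosen above

def units_per_pixel(px_size):
    """EM units that map onto one device pixel when rasterized at px_size pixels."""
    return UPM / px_size

# 11 px at 1x, 2x and 4x scale (i.e. 11, 22 and 44 rasterized pixels)
for scale in (1, 2, 4):
    px = 11 * scale
    print(f"{px:>2}px -> {units_per_pixel(px):g} units per pixel")

# 11px -> 256 units per pixel
# 22px -> 128 units per pixel
# 44px -> 64 units per pixel
```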
This means that Inter is now much more variable in pace than it used to be, making it work better at higher resolutions and work much better in longer text, but losing some contrast and sharpness at small sizes. ! Metrics (docs/res/metrics.png) The glyphs are designed based on this plan ; most stems and lines will be positioned at EM units that are even multiples of 128, and in a few cases they are at even multiples of 64 or as low as 16. Metrics: UPM: 2816 Descender: 640 x height: 1536 Cap height: 2048 Ascender: 2688 Translating between EM units and pixels: Rasterized at 11px: 1px 256 units Rasterized at 22px: 1px 128 units Rasterized at 44px: 1px 64 units There's a Figma workspace for glyphs, with configured metrics: Inter glyphs See also Contributing (CONTRIBUTING.md) Compiling font files (CONTRIBUTING.md compiling font files) SIL Open Font License (LICENSE.txt)",Unknown,Unknown 248,Unknown,Unknown,Unknown,"commonmark.py commonmark.py is a pure Python port of jgm __'s commonmark.js __, a Markdown parser and renderer for the CommonMark __ specification, using only native modules. Once both this project and the CommonMark specification are stable we will release the first 1.0 version and attempt to keep up to date with changes in commonmark.js . commonmark.py is tested against the CommonMark spec with Python versions 2.7, 3.4, 3.5, 3.6, and 3.7. Current version: 0.9.0 Pypi Link Build Status Doc Link Installation :: $ pip install commonmark Usage :: >>> import commonmark >>> commonmark.commonmark(' hello! ') ' hello! \n' Or, without the syntactic sugar: .. code:: python import commonmark parser commonmark.Parser() ast parser.parse( Hello World ) renderer commonmark.HtmlRenderer() html renderer.render(ast) print(html) Hello World inspecting the abstract syntax tree json commonmark.dumpJSON(ast) commonmark.dumpAST(ast) pretty print generated AST structure There is also a CLI: :: $ cmark README.md o README.html $ cmark README.md o README.json aj output AST as JSON $ cmark README.md a pretty print generated AST structure $ cmark h usage: cmark h o O a aj infile Process Markdown according to the CommonMark specification. positional arguments: infile Input Markdown file to parse, defaults to stdin optional arguments: h, help show this help message and exit o O Output HTML/JSON file, defaults to stdout a Print formatted AST aj Output JSON AST Contributing If you would like to offer suggestions/optimizations/bugfixes through pull requests please do! Also if you find an error in the parser/renderer that isn't caught by the current test suite please open a new issue and I would also suggest you send the commonmark.js __ project a pull request adding your test to the existing test suite. Tests To work on commonmark.py, you will need to be able to run the test suite to make sure your changes don't break anything. To run the tests, you can do something like this: :: $ pyvenv venv $ ./venv/bin/python setup.py develop test The tests script, commonmark/tests/run_spec_tests.py , is pretty much a devtool. As well as running all the tests embedded in spec.txt it also allows you to run specific tests using the t argument, provide information about passed tests with p , percentage passed by category of test with s , and enter markdown interactively with i (In interactive mode end a block by inputting a line with just end , to quit do the same but with quit ). d can be used to print call tracing. 
:: $ ./venv/bin/python commonmark/tests/run_spec_tests.py h usage: run_spec_tests.py h t T p f i d np s script to run the CommonMark specification tests against the commonmark.py parser. optional arguments: h, help show this help message and exit t T Single test to run or comma separated list of tests ( t 10 or t 10,11,12,13) p Print passed test information f Print failed tests (during np...) i Interactive Markdown input mode d Debug, trace calls np Only print section header, tick, or cross s Print percent of tests passed by category Authors Bibek Kafle __ Roland Shoemaker __ Nikolas Nyby __ .. Pypi Link image:: :target: .. Build Status image:: :target: .. Doc Link image:: :target: :alt: Documentation Status",Unknown,Unknown 249,Unknown,Unknown,Unknown,"What is SaltStack? SaltStack makes software for complex systems management at scale. SaltStack is the company that created and maintains the Salt Open project and develops and sells SaltStack Enterprise software, services and support. Easy enough to get running in minutes, scalable enough to manage tens of thousands of servers, and fast enough to communicate with them in seconds . Salt is a new approach to infrastructure management built on a dynamic communication bus. Salt can be used for data driven orchestration, remote execution for any infrastructure, configuration management for any app stack, and much more. Download Salt Open Salt Open is tested and packaged to run on CentOS, Debian, RHEL, Ubuntu, Windows. Download Salt Open and get started now. _ Installation Instructions _ SaltStack Documentation Installation instructions, getting started guides, and in depth API documentation. _ _ Engage SaltStack SaltConf _, User Groups and Meetups SaltStack has a vibrant and global community _ of customers, users, developers and enthusiasts. Connect with other Salted folks in your area of the world, or join SaltConf _, the SaltStack annual user conference held in Salt Lake City. Please visit the SaltConf _ site for details of our next conference. Also, please let us know if you would like to start a user group or if we should add your existing SaltStack user group to this list by emailing: info@saltstack.com SaltStack Training Get access to proprietary SaltStack education offerings _ through instructor led training offered on site, virtually or at SaltStack headquarters in Salt Lake City. SaltStack Enterprise training helps increase the value and effectiveness of SaltStack software for any customer and is a prerequisite for coveted SaltStack Certified Engineer (SSCE) _ status. SaltStack training is also available through several SaltStack professional services _ offerings. Follow SaltStack on YouTube _ Twitter _ Facebook _ LinkedIn _ LinkedIn Group _ Google+ _ .. _global community: .. _SaltConf: .. _SaltStack education offerings: .. _SaltStack Certified Engineer (SSCE): .. _SaltStack professional services: License SaltStack is licensed by the SaltStack Team under the Apache 2.0 license. Please see the LICENSE file for the full text of the Apache license, followed by a full summary of the licensing used by external modules.",Unknown,Unknown 250,Unknown,Unknown,Unknown,"pypdflib: Python based PDF Rendering Library Features: Uses Pango Cairo for rendering Supports Complex scripts such as Indic scripts How to Install: 1. Get the sourcecode git clone or git clone 2. Create the eggs python setup.py build 3. Install the library sudo python setup.py install Copyright (C) 2010, Santhosh Thottingal. 
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as is, without warranty of any kind.",Unknown,Unknown 251,Unknown,Unknown,Unknown,"saxo Build Status PyPI version Quick and flexible irc bot , extensible in any language. ! The blocks signify modularity. Credit: @Abdur rahmaanJ (logo.png?raw true) Quick start You will need Python 3.3 or later. git clone cd saxo ./saxo create Edit your /.saxo/config file in a text editor. Then start the bot: ./saxo f start Support Ask in saxo on freenode , or tweet @sbp . Documentation Try the docs (docs) directory: Getting started (docs/getting started.md) Saxo config (docs/config.md) List of saxo commands (docs/commands.md) Write saxo commands (docs/write commands.md) Changes (docs/changes.md) Credits Thanks to Scott Arciszewski (@sarciszewski), and to Abdur Rahmaan Janhangeer (@Abdur rahmaanJ) for designing the logo. License Apache License 2.0.",Unknown,Unknown 252,Unknown,Unknown,Unknown,"quicktions Python's Fraction data type is an excellent way to do exact money calculations and largely beats Decimal in terms of simplicity, accuracy and safety. Clearly not in terms of speed, though, given the cdecimal accelerator in Py3.3+. quicktions is an adaptation of the original fractions module (as included in CPython 3.5) that is compiled and optimised with Cython _ into a fast, native extension module. Compared to the standard library fractions module in Py2.7 and Py3.4, quicktions is currently about 10x faster, and still about 6x faster than the current version in Python 3.5. It's also about 15x faster than the (Python implemented) decimal module in Py2.7. For documentation, see the Python standard library's fractions module:",Unknown,Unknown 253,Unknown,Unknown,Unknown,"Scrapy .. image:: :target: :alt: PyPI Version .. image:: :target: :alt: Supported Python Versions .. image:: :target: :alt: Build Status .. image:: :target: :alt: Wheel Status .. image:: :target: :alt: Coverage report .. image:: :target: :alt: Conda Version Overview Scrapy is a fast high level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. For more information including a list of features check the Scrapy homepage at: Requirements Python 2.7 or Python 3.4+ Works on Linux, Windows, Mac OSX, BSD Install The quick way:: pip install scrapy For more details see the install section in the documentation: Documentation Documentation is available online at and in the docs directory. Releases You can find release notes at Community (blog, twitter, mail list, IRC) See Contributing See Code of Conduct Please note that this project is released with a Contributor Code of Conduct (see By participating in this project you agree to abide by its terms. Please report unacceptable behavior to opensource@scrapinghub.com. Companies using Scrapy See Commercial Support See",Unknown,Unknown 254,Unknown,Unknown,Unknown,"Pyinotify License : MIT Project URL : Project Wiki : API Documentation: Dependencies Linux ≥ 2.6.13 Python ≥ 2.4 (including Python 3.x) Install Get the current stable version from PyPI and install it with pip To install pip follow $ sudo pip install pyinotify Or install Pyinotify directly from source Choose your Python interpreter: either python, python2.7, python3.2,.. 
Replacing XXX accordingly, type: $ sudo pythonXXX setup.py install Watch a directory Install pyinotify and run this command from a shell: $ python m pyinotify v /my dir to watch",Unknown,Unknown 255,Unknown,Unknown,Unknown,"OSfooler NG License: GPL v3 ! Version: 1.0 Maintenance An outsider has the capability to discover general information, such as which operating system a host is running, by searching for default stack parameters, ambiguities in IETF RFCs or non compliant TCP/IP implementations in responses to malformed requests. By pinpointing the exact OS of a host, an attacker can launch an educated and precise attack against a target machine. There are lot of reasons to hide your OS to the entire world: Revealing your OS makes things easier to find and successfully run an exploit against any of your devices. Having and unpatched or antique OS version is not very convenient for your company prestige. Imagine that your company is a bank and some users notice that you are running an unpatched box. They won't trust you any longer! In addition, these kind of 'bad' news are always sent to the public opinion. Knowing your OS can also become more dangerous, because people can guess which applications are you running in that OS (data inference). For example if your system is a MS Windows, and you are running a database, it's highly likely that you are running MS SQL. It could be convenient for other software companies, to offer you a new OS environment (because they know which you are running). And finally, privacy; nobody needs to know the systems you've got running. OSfooler was presented at Blackhat Arsenal 2013. It was built on NFQUEUE, an iptables/ip6tables target which delegate the decision on packets to a userspace. It transparently intercepted all traffic that your box was sending in order to camouflage and modify in real time the flags in TCP/IP packets that discover your system. OSfooler NG has been complete rewriten from the ground up, being highly portable, more efficient and combining all known techniques to detect and defeat at the same time: Active remote OS fingerprinting: like Nmap Passive remote OS fingeprinting: like p0f v2 Commercial engines like Sourcefire’s FireSiGHT OS fingerprinting Some additional features are: No need for kernel modification or patches Simple user interface and several logging features Transparent for users, internal process and services Detecting and defeating mode: active, passive & combined Will emulate any OS Capable of handling updated nmap and p0f v2 fingerprint database Undetectable for the attacker Install To get the latest versions, with bugfixes and new features, but maybe not as stable, use the the Github repository: $ git clone You need to install python nfqueue (v0.5 1build2) linux package. Download from Ubuntu Packages : $ wget $ dpkg i python nfqueue_0.5 1build2_amd64.deb Install OSfooler ng in the standard way: $ sudo python setup.py install Usage Active Fingerprinting: nmap To get the full list of OS to emulate, just use the flag ' n': $ osfooler ng n + Please, select nmap OS to emulate + 2N Helios IP VoIP doorbell + 2Wire BT2700HG V ADSL modem + 2Wire 1701HG wireless ADSL modem ... 
+ ZyXEL Prestige 660HW 61 ADSL router (ZyNOS 3.40) + ZyXEL Prestige 660HW D1 wireless ADSL router + ZyXEL ZyWALL 2 Plus firewall To emulate an specific OS, just use the flag ' o' with the OS you want to emulate: $ osfooler ng m Sony Ericsson W705 or W715 Walkman mobile phone + Mutating to nmap: Fingerprint Sony Ericsson W705 or W715 Walkman mobile phone Class Sony Ericsson embedded phone CPE cpe:/h:sonyericsson:w705 CPE cpe:/h:sonyericsson:w715 SEQ(CI RD%II I) OPS(R N) WIN(R N) ECN(R N) T1(R N) T2(R Y%DF N%T 3B 45%TG 40%W 0%S Z%A S%F AR%O %RD 0%Q ) T3(R N) T4(R Y%DF N%T 3B 45%TG 40%W 0%S A%A Z%F R%O %RD 0%Q ) T5(R Y%DF N%T 3B 45%TG 40%W 0%S Z%A S+%F AR%O %RD 0%Q ) T6(R Y%DF N%T 3B 45%TG 40%W 0%S A%A Z%F R%O %RD 0%Q ) T7(R Y%DF N%T 3B 45%TG 40%W 0%S Z%A S+%F AR%O %RD 0%Q ) U1(DF N%T 3B 45%TG 40%IPL 164%UN 0%RIPL G%RID G%RIPCK G%RUCK G%RUD G) IE(DFI N%T 3B 45%TG 40%CD S) + Activating queues > Process 1: nmap packet processor Passive Fingerprinting: p0f v2 To get the full list of OS to emulate, just use the flag ' l': $ osfooler ng p Please, select p0f OS Genre and Details OS Genre AIX Details 4.3 OS Genre AIX Details 4.3.2 and earlier OS Genre AIX Details 4.3.3 5.2 (1) ... OS Genre NMAP Details OS detection probe w/flags (3) OS Genre NMAP Details OS detection probe w/flags (4) OS Genre NAST Details syn scan To emulate any p0f OS, just use the flag ' o' with the OS Genre. This will choose the main OS and custom version will be randomly loaded when a SYN packet is detected. For example: $ osfooler ng o PalmOS + Mutating to p0f: WWW:S9 TTL:255 D:0 SS:44 OOO:M536 QQ:. OS:PalmOS DETAILS:Tungsten T3/C WWW:S5 TTL:255 D:0 SS:44 OOO:M536 QQ:. OS:PalmOS DETAILS:3/4 WWW:S4 TTL:255 D:0 SS:44 OOO:M536 QQ:. OS:PalmOS DETAILS:3.5 WWW:2948 TTL:255 D:0 SS:44 OOO:M536 QQ:. OS:PalmOS DETAILS:3.5.3 (Handera) WWW:S29 TTL:255 D:0 SS:44 OOO:M536 QQ:. OS:PalmOS DETAILS:5.0 WWW:16384 TTL:255 D:0 SS:44 OOO:M1398 QQ:. OS:PalmOS DETAILS:5.2 (Clie) WWW:S14 TTL:255 D:0 SS:44 OOO:M1350 QQ:. OS:PalmOS DETAILS:5.2.1 (Treo) WWW:16384 TTL:255 D:0 SS:44 OOO:M1400 QQ:. OS:PalmOS DETAILS:5.2 (Sony) + Activating queues > Process 1: p0f packet processor You can also emulate the full p0f OS, using ' ' with the OS Genre and ' d' with custom details: $ osfooler ng o Windows d XP bare bone + Mutating to p0f: WWW:65520 TTL:128 D:1 SS:48 OOO:M ,N,N,S QQ:. OS:Windows DETAILS:XP bare bone + Activating queues > Process 1: p0f packet processor Active and Passive Fingerprinting: nmap & p0f OSfooler ng is also capable os emulating both OS to defeat nmap and p0f. 
Just combine the parameters above: $ osfooler ng m Microsoft Windows 2000 SP4 o Windows d 2000 SP4 + Mutating to nmap: Fingerprint Microsoft Windows 2000 SP4 Class Microsoft Windows 2000 general purpose CPE cpe:/o:microsoft:windows_2000::sp4 SEQ(SP 7C 86%GCD 1 6%ISR 95 9F%TI I%II I%SS O S%TS 0) OPS(O1 NNT11 M5B4NW0NNT00NNS%O2 NNT11 M5B4NW0NNT00NNS%O3 NNT11 M5B4NW0NNT00%O4 NNT11 M5B4NW0NNT00NNS%O5 NNT11 M5B4NW0NNT00NNS%O6 NNT11 M5B4NNT00NNS) WIN(W1 FFFF%W2 FFFF%W3 FFFF%W4 FFFF%W5 FFFF%W6 FFFF) ECN(R Y%DF N%T 7B 85%TG 80%W 0%O %CC N%Q U) T1(R Y%DF Y%T 7B 85%TG 80%S O%A O S+%F A AS%RD 0%Q U) T2(R Y%DF N%T 7B 85%TG 80%W 0%S Z%A S%F AR%O %RD 0%Q U) T3(R Y%DF N%T 7B 85%TG 80%W 0%S Z%A O%F AR%O %RD 0%Q U) T4(R Y%DF N%T 7B 85%TG 80%W 0%S A%A O%F R%O %RD 0%Q U) T5(R Y%DF N%T 7B 85%TG 80%W 0%S Z%A S+%F AR%O %RD 0%Q U) T6(R Y%DF N%T 7B 85%TG 80%W 0%S A%A O%F R%O %RD 0%Q U) T7(R Y%DF N%T 7B 85%TG 80%W 0%S Z%A S+%F AR%O %RD 0%Q U) U1(DF N%T 7B 85%TG 80%IPL 38%UN 0%RIPL G%RID G%RIPCK G%RUCK G%RUD G) IE(DFI S%T 7B 85%TG 80%CD Z) + Mutating to p0f: WWW:40320 TTL:128 D:1 SS:48 OOO:M ,N,N,S QQ:. OS:Windows DETAILS:2000 SP4 + Activating queues > Process 1: nmap packet processor > Process 2: p0f packet processor Searching for Operating Systems You can search inside nmap/p0f database for a specific OS, instead of getting the whole list. Just use the flag ' s' and enter the keyword you want to search for (case insensitive). You'll get any match found, and if it belongs to nmap or p0f databases: $ osfooler ng s playstation + Searching databases for: 'playstation' nmap Sony Playstation 4 or FreeBSD 10.2 RELEASE nmap Sony PlayStation 2 game console test kit 2.2.1 nmap Sony PlayStation 3 game console nmap Sony PlayStation 3 game console test kit nmap Sony PlayStation 2 game console p0f OS: Sony DETAILS: Playstation 2 (SOCOM?) Update nmap database Use the flag ' u' to check if there's a new version of nmap's database avaiable and to download it $ osfooler ng u + Checking nmap database... latest! Custom flags There are other interesting flags: ' v': Show info about every modified packet ' i ': Choose network interface (eth0 by default) ' V': Show OSfooler ng banner and current version installed Authors Jaime Sánchez ( @segofensiva) License This project is licensed under the The GNU General Public License v3.0 see the LICENSE.md (LICENSE.md) file for details Acknowledgments Defcon China , for leting me show this tool on Demo Labs All those people who have worked and released software on OS fingerprinting (attack and defense), specially nmap & p0f (lcamtuf.coredump.cx/), but also Xprobe, IP Personality etc. OSfooler ng makes use of the Scapy Project and The netfilter.org libnetfilter_queue project",Unknown,Unknown 256,Unknown,Unknown,Unknown,"Build Status Python daemonizer class This is a Python class that will daemonize your Python script so it can continue running in the background. It works on Unix, Linux and OS X, creates a PID file and has standard commands (start, stop, restart) + a foreground mode. Based on this original version from jejik.com . Usage Define a class which inherits from Daemon and has a run() method (which is what will be called once the daemonization is completed. from daemon import Daemon class pantalaimon(Daemon): def run(self): Do stuff Create a new object of your class, specifying where you want your PID file to exist: pineMarten pantalaimon('/path/to/pid.pid') pineMarten.start() Actions start() starts the daemon (creates PID and daemonizes). 
stop() stops the daemon (stops the child process and removes the PID). restart() does stop() then start() . Foreground This is useful for debugging because you can start the code without making it a daemon. The running script then depends on the open shell like any normal Python script. To do this, just call the run() method directly. pineMarten.run() Continuous execution The run() method will be executed just once so if you want the daemon to be doing stuff continuously you may wish to use the sched 1 module to execute code repeatedly ( example 2 ). 1 : 2 :",Unknown,Unknown 257,Unknown,Unknown,Unknown,"speedtest cli Command line interface for testing internet bandwidth using speedtest.net .. image:: :target: :alt: Latest Version .. image:: :target: :alt: Travis .. image:: :target: :alt: License Versions speedtest cli works with Python 2.4 3.7 .. image:: :target: :alt: Versions Installation pip / easy\_install :: pip install speedtest cli or :: easy_install speedtest cli Github :: pip install git+ or :: git clone cd speedtest cli python setup.py install Just download (Like the way it used to be) :: wget O speedtest cli chmod +x speedtest cli or :: curl Lo speedtest cli chmod +x speedtest cli Usage :: $ speedtest cli h usage: speedtest cli h no download no upload single bytes share simple csv csv delimiter CSV_DELIMITER csv header json list server SERVER exclude EXCLUDE mini MINI source SOURCE timeout TIMEOUT secure no pre allocate version Command line interface for testing internet bandwidth using speedtest.net. optional arguments: h, help show this help message and exit no download Do not perform download test no upload Do not perform upload test single Only use a single connection instead of multiple. This simulates a typical file transfer. bytes Display values in bytes instead of bits. Does not affect the image generated by share, nor output from json or csv share Generate and provide a URL to the speedtest.net share results image, not displayed with csv simple Suppress verbose output, only show basic information csv Suppress verbose output, only show basic information in CSV format. Speeds listed in bit/s and not affected by bytes csv delimiter CSV_DELIMITER Single character delimiter to use in CSV output. Default , csv header Print CSV headers json Suppress verbose output, only show basic information in JSON format. Speeds listed in bit/s and not affected by bytes list Display a list of speedtest.net servers sorted by distance server SERVER Specify a server ID to test against. Can be supplied multiple times exclude EXCLUDE Exclude a server from selection. Can be supplied multiple times mini MINI URL of the Speedtest Mini server source SOURCE Source IP address to bind to timeout TIMEOUT HTTP timeout in seconds. Default 10 secure Use HTTPS instead of HTTP when communicating with speedtest.net operated servers no pre allocate Do not pre allocate upload data. Pre allocation is enabled by default to improve upload performance. To support systems with insufficient memory, use this option to avoid a MemoryError version Show the version number and exit Python API See the wiki _. Inconsistency It is not a goal of this application to be a reliable latency reporting tool. Latency reported by this tool should not be relied on as a value indicative of ICMP style latency. It is a relative value used for determining the lowest latency server for performing the actual speed test against. There is the potential for this tool to report results inconsistent with Speedtest.net. 
There are several concepts to be aware of that factor into the potential inconsistency: 1. Speedtest.net has migrated to using pure socket tests instead of HTTP based tests 2. This application is written in Python 3. Different versions of Python will execute certain parts of the code faster than others 4. CPU and Memory capacity and speed will play a large part in inconsistency between Speedtest.net and even other machines on the same network Issues relating to inconsistencies will be closed as wontfix and without additional reason or context.",Unknown,Unknown 258,Unknown,Unknown,Unknown,"Sixpack .. image:: :target: .. image:: :target: Sixpack is a framework to enable A/B testing across multiple programming languages. It does this by exposing a simple API for client libraries. Client libraries can be written in virtually any language. Sixpack has two main parts. The first, Sixpack server , is responsible for responding to web requests. The second, Sixpack web , is a web dashboard for tracking and acting on your A/B tests. Sixpack web is optional. Requirements Redis > 2.6 Python > 2.7 (3.0 untested, pull requests welcome) Getting Started To get going, create (or don't, but you really should) a new virtualenv for your Sixpack installation. Follow that up with pip install :: $ pip install sixpack Note: If you get an error like src/hiredis.h:4:20: fatal error: Python.h: No such file or directory you need to install the python development tools. apt get install python dev on Ubuntu. Next, create a Sixpack configuration. A configuration must be created for Sixpack to run. Here's the default:: redis_port: 6379 Redis port redis_host: localhost Redis host redis_prefix: sixpack all Redis keys will be prefixed with this redis_db: 15 DB number in redis metrics: false send metrics to StatsD (response times, of calls, etc)? statsd_url: 'udp://localhost:8125/sixpack' StatsD url to connect to (used only when metrics: true) The regex to match for robots robot_regex: $^ trivial facebook MetaURI butterfly google amazon goldfire sleuth xenu msnbot SiteUptime Slurp WordPress ZIBB ZyBorg pingdom bot yahoo slurp java fetch spider url crawl oneriot abby commentreader twiceler ignored_ip_addresses: List of IP asset_path: gen Path for compressed assets to live. This path is RELATIVE to sixpack/static secret_key: ' ' Random key (any string is valid, required for sixpack web to run) You can store this file anywhere (we recommend /etc/sixpack/config.yml ). As long as Redis is running, you can now start the Sixpack server like this:: $ SIXPACK_CONFIG sixpack Sixpack server will be listening on port 5000 by default but can be changed with the SIXPACK_PORT environment variable. For use in a production environment, please see the Production Notes section below. Alternatively, as of version 1.1, all Sixpack configuration can be set by environment variables. The following environment variables are available: SIXPACK_CONFIG_ENABLED SIXPACK_CONFIG_REDIS_PORT SIXPACK_CONFIG_REDIS_HOST SIXPACK_CONFIG_REDIS_PASSWORD SIXPACK_CONFIG_REDIS_PREFIX SIXPACK_CONFIG_REDIS_DB SIXPACK_CONFIG_ROBOT_REGEX SIXPACK_CONFIG_IGNORE_IPS comma separated SIXPACK_CONFIG_ASSET_PATH SIXPACK_CONFIG_SECRET SIXPACK_CORS_ORIGIN SIXPACK_CORS_HEADERS SIXPACK_CORS_CREDENTIALS SIXPACK_CORS_METHODS SIXPACK_CORS_EXPOSE_HEADERS SIXPACK_METRICS STATSD_URL Using the API All interaction with Sixpack is done via HTTP GET requests. 
Sixpack allows for cross language testing by accepting a unique client_id (which the client is responsible for generating) that links a participation to a conversion. All requests to Sixpack require a client_id . The Sixpack API can be used from front end Javascript via CORS enabled requests. The Sixpack API server will accept CORS requests from any domain. Participating in an Experiment You can participate in an experiment with a GET request to the participate endpoint:: $ curl If the test does not exist, it will be created automatically. You do not need to create the test in Sixpack web. Experiment names are not validated, so it is possible to explode the Redis keyspace. If you need to validate that the experiments being created are only those you wish to whitelist, consider fronting Sixpack with either Nginx+Lua / Openresty or Varnish , and performing your whitelisting logic there. Arguments experiment (required) is the name of the test. Valid experiment names must be a lowercase alphanumeric string and can contain _ and . alternatives (required) are the potential responses from Sixpack. One of them will be the bucket that the client_id is assigned to. client_id (required) is the unique id for the user participating in the test. user_agent (optional) user agent of the user making a request. Used for bot detection. ip_address (optional) IP address of user making a request. Used for bot detection. force (optional) force a specific alternative to be returned, example:: $ curl In this example, red will always be returned. This is used for testing only, and no participation will be recorded. record_force (optional) for use with force , participation will be recorded. traffic_fraction (optional) Sixpack allows for limiting experiments to a subset of traffic. You can pass the percentage of traffic you'd like to expose the test to as a decimal number here. ( ?traffic_fraction 0.10 for 10%) Response A typical Sixpack participation response will look something like this:: { status: ok , alternative: { name: red }, experiment: { name: button_color }, client_id: 12345678 1234 5678 1234 567812345678 } The most interesting part of this is alternative . This is a representation of the alternative that was chosen for the test and assigned to a client_id . All subsequent requests to this experiment/client_id combination will be returned the same alternative. Converting a user You can convert a user with a GET request to the convert endpoint:: $ curl Conversion Arguments experiment (required) the name of the experiment you would like to convert on. client_id (required) the client you would like to convert. kpi (optional) sixpack supports recording multiple KPIs. If you would like to track conversion against a specfic KPI, you can do that here. If the KPI does not exist, it will be created automatically. Notes You'll notice that the convert endpoint does not take an alternative query parameter. This is because Sixpack handles that internally with the client_id . We've included a 'health check' endpoint, available at /_status . This is helpful for monitoring and alerting if the Sixpack service becomes unavailable. The health check will respond with either 200 (success) or 500 (failure) headers. Clients We've already provided clients in four languages. We'd love to add clients in additional languages. If you feel inclined to create one, please first read the CLIENTSPEC_. After writing your client, please update and pull request this file so we know about it. 
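If you do want to roll your own, a bare-bones client is only a few lines. The sketch below is illustrative only — the base URL, function names, and the use of the third-party requests library are assumptions, and bot-detection arguments and error handling are left out — and it covers just the participate and convert calls described above::

    # Hypothetical minimal Sixpack client; only the query parameters and the
    # response shape come from the docs above, everything else is invented.
    import uuid
    import requests

    SIXPACK_URL = 'http://localhost:5000'  # assumed default sixpack server port

    def participate(experiment, alternatives, client_id):
        # GET /participate?experiment=...&alternatives=a&alternatives=b&client_id=...
        resp = requests.get(SIXPACK_URL + '/participate', params={
            'experiment': experiment,
            'alternatives': alternatives,  # a list; requests repeats the key
            'client_id': client_id,
        })
        # The chosen bucket is returned under alternative -> name
        return resp.json()['alternative']['name']

    def convert(experiment, client_id, kpi=None):
        params = {'experiment': experiment, 'client_id': client_id}
        if kpi is not None:
            params['kpi'] = kpi
        return requests.get(SIXPACK_URL + '/convert', params=params).json()

    if __name__ == '__main__':
        client_id = str(uuid.uuid4())  # the client is responsible for generating this
        bucket = participate('button_color', ['red', 'blue'], client_id)
        print('serving', bucket)
        convert('button_color', client_id)

A production client would also want to forward user_agent and ip_address for bot detection and handle server errors gracefully; see the CLIENTSPEC for the expected behaviour.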
Ruby_ Python_ JavaScript_ PHP_ iOS_ Go_ Perl_ C _ Java/Android _ .. _Ruby: .. _Python: .. _JavaScript: .. _PHP: .. _iOS: .. _Go: .. _Perl: .. _C : .. _Java/Android: Algorithm As of version 2.0 of Sixpack, we use a deterministic algorithm to choose which alternative a client will receive. The algorithm was ported from Facebook's Planout project, and more information can be found HERE_. Dashboard Sixpack comes with a built in dashboard. You can start the dashboard with:: $ SIXPACK_CONFIG sixpack web The Sixpack dashboard allows you to visualize how each experiment's alternatives are doing compared to the rest, select alternatives as winners, and update experiment descriptions to something more human readable. Sixpack web defaults to run on port 5001 but can be changed with the SIXPACK_WEB_PORT environment variable. Sixpack web will not work properly until you set the secret_key variable in the configuration file. API Sixpack web dashboard has a bit of a read only API built in. To get a list of all experiment information you can make a request like:: $ curl To get the information for a single experiment, you can make a request like:: $ curl Production Notes We recommend running Sixpack on gunicorn_ in production. You will need to install gunicorn in your virtual environment before running the following. To run the sixpack server using gunicorn/gevent (a separate installation) you can run the following:: gunicorn access logfile w 8 worker class gevent sixpack.server:start To run the sixpack web dashboard using gunicorn/gevent (a separate installation) you can run the following:: gunicorn access logfile w 2 worker class gevent sixpack.web:start Note: After selecting an experiment winner, it is best to remove the Sixpack experiment code from your codebase to avoid unnecessary requests. CORS Cross origin resource sharing can be adjusted with the following config attributes:: cors_origin: cors_headers: ... cors_credentials: true cors_methods: GET cors_expose_headers: ... Contributing 1. Fork it 2. Start Sixpack in development mode with:: $ PYTHONPATH . SIXPACK_CONFIG bin/sixpack and:: $ PYTHONPATH . SIXPACK_CONFIG bin/sixpack web We've also included a small script that will seed Sixpack with lots of random data for testing and development on sixpack web. You can seed Sixpack with the following command:: $ PYTHONPATH . SIXPACK_CONFIG sixpack/test/seed This command will make a few dozen requests to the participate and convert endpoints. Feel free to run it multiple times to get additional data. Note: By default the server runs in production mode. If you'd like to turn on Flask and Werkzeug debug modes set the SIXPACK_DEBUG environment variable to true . 3. Create your feature branch ( git checkout b my new feature ) 4. Write tests 5. Run tests with nosetests 6. Commit your changes ( git commit am 'Added some feature' ) 7. Push to the branch ( git push origin my new feature ) 8. Create new pull request Please avoid changing versions numbers; we'll take care of that for you. Using Sixpack in production? If you're a company using Sixpack in production, kindly let us know! We're going to add a 'using Sixpack' section to the project landing page, and we'd like to include you. Drop Jack a line at jack at seatgeek dot.com with your company name. License Sixpack is released under the BSD 2 Clause License _. .. _gunicorn: .. _CLIENTSPEC: .. _HERE: .. _ BSD 2 Clause License :",Unknown,Unknown 259,Unknown,Unknown,Unknown,"Dogefy Telegram bot This is the source code of the Dogefy Telegram bot. 
The bot is pretty simple, just listens for photos, downloads them, search for human front faces and replaces them with a doge. Please rate it following this link Requirements You need the python pyTelegramBotAPI and OpenCV . pyTelegramBotAPI installation pip install pyTelegramBotAPI or if you want to install it as user, not system wide pip install user pyTelegramBotAPI if you have python3 as default you should use pip2 OpenCV installation Debian based systems (Ubuntu, Linux Mint, ElementaryOS, ...) apt get install python opencv You may need to install opencv data if not working. Arch Linux pacman S opencv Usage DOGEFY_TKN 123456789:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA python2 dogefybot.py if you have python3 as default you should use python2 To do list Maybe add a listener for photos sent as files (no compression). @bot.message_handler(func lambda m: m.document.mime_type.startswith('image/'), content_types 'document' ) def handle_photo_as_document(m): ... License: GPLv3 This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see .",Unknown,Unknown 260,Unknown,Unknown,Unknown,"sharknado Build Status sharknado is a super simple and super fast messaging server for the Internet of Things built with tornado and mongodb, inspired by dweet.io, compatible with python 2.7, 3.2+ and pypy It implements HAPI standards to provide web based APIs that are machine ready but human/developer friendly. Deploy usage Install the dependencies with pip: pip install r requirements.txt launch the server: python sharknado.py OPTIONS options: + port binds the server to the given port. default 8000 + processes number of processes to fork, 1: no child processes forked, 0: n_cpus processes, n: n processes. default 1 + mongo_uri mongodb url. default mongodb://localhost:27017/sharknado + messages_expire expiration of messages, in seconds, 0 to disable expiration. default 1 month + cors_origin Access Control Allow Origin header content. default , set to to disable cors tests you can run the test suite using the tornado test runner: python m tornado.testing tests.test_sharknado you can also use your preferred runner :) hapi interface send messages to send a message, just call a URL: send/message/for/my thing name?hello world&foo bar Any query parameters you add to the request will be added as key value pairs to the content of the message. You can also send any valid JSON data in the body of the request with a POST. sharknado will respond with { this : succeeded , by : sending , the : message , with : { _id : 5452180080cd99000268e0cb , thing : my thing name , created : 2014 10 30T10:50:40.220000 , content : { hello : world , foo : bar } } } retrieve messages to retrieve messages, call the URL: get/messages/for/my thing name sharknado will respond with { this : succeeded , by : getting , the : messages , with : { _id : 5452180080cd99000268e0cf , thing : my thing name , created : 2014 10 30T10:50:47.220000 , content : { this : is cool! 
} }, { _id : 5452180080cd99000268e0cb , thing : my thing name , created : 2014 10 30T10:50:40.220000 , content : { hello : world , foo : bar } } } by default sharknado will return messages from the past 30 days, you can override this behaviour by calling the url get/messages/for/my thing name/past/n days you can also retrieve only the latest message /get/latest/message/for/my thing name count messages sharknado provides a fast message counter endpoint /count/messages/for/my thing name todo list in no particular order: + message streaming + message locking + mongodb write concern customization + support for python 3",Unknown,Unknown 261,Unknown,Unknown,Unknown,Linode DynDNS Updater By Jed Smith For customers of Linode that use the Linode DNS manager. Released into the public domain. Requires Python 3.0 or above. Python 2.6 may work. Contains directions in the script (which you'll have to edit anyway).,Unknown,Unknown 262,Unknown,Unknown,Unknown,"Esp Buddy _Tired of typing very long commands to upload your custom firmwares?_ _Bored to manually upload your firmwares in two steps for 1MB devices?_ _Want to batch upload new firmwares to all your devices via OTA or backup all settings in one command?_ This script allows you to easily upload firmwares to remote (ESP8266 based) devices via Wifi (Over The Air) or Serial, in one short command. It also gathers various tool commands to be used in batch mode. Features OTA upload on 4M devices OTA upload on 1M devices using an intermediate firmware (automatic two steps) Use configuration presets for devices Optional compilation using platformio Optionally pass various D flags to the compiler, including extracted parameters like IP or hostname Fetch versions of remote devices Archive current firmware & previous firmware per target Backup current settings & previous settings per target Parse Repositories' installed versions Git Pull Repositories Ping Remote Host Supported Firmwares Works with : ESPeasy Espurna Tasmota _since v5.12.0h_ should virtually work with any ESP8266 firmware, just add a small espb_repo_xxx class to describe it. Requirements Linux or OSX Operating System php5 or newer PlatformIO __needed only for compiling__ Installation Rename _config sample.php_ to _config.php_. Fill in some hosts and configurations in config.php Usage espbuddy.php ACTION TARGET OPTIONS Valid Actions are: upload : Build and/or Upload current repo version to Device(s) build : Build current repo version backup : Backup remote devices' settings monitor : Monitor the serial port version : Show Device(s) Version reboot : Reboot remote devive gpios : Test (On/Off) each GPIOs ping : Ping Device(s) repo_version : Show Repo's Current version repo_pull : Git Pull Repo's master version list_hosts : List all available hosts list_configs : List all available configurations list_repos : List all available repositories help : Show full help Examples: espbuddy.php upload select the one to upload to from the list of targets espbuddy.php upload relay1 upload to target 'relay1' espbuddy.php upload all b upload to all defined targets, while building the firmware first espbuddy.php upload relay1 w upload using serial to target 'relay1' espbuddy.php backup all backup settings all defined targets espbuddy.php monitor relay1 rate 9600 serial monitor target 'led1' at 9600 bauds espbuddy.php version all show versions of all defined targets espbuddy.php ping all ping the all defined targets Contribute! 
Whether you are a developer or a regular user, your help is most welcome (.github/CONTRIBUTING.md)! Licence This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110 1301 USA.",Unknown,Unknown 263,Unknown,Unknown,Unknown,"Spack Build Status codecov Read the Docs Slack Spack is a multi platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, and many supercomputers. Spack is non destructive: installing a new version of a package does not break existing installations, so many configurations of the same package can coexist. Spack offers a simple spec syntax that allows users to specify versions and configuration options. Package files are written in pure Python, and specs allow package authors to write a single script for many different builds of the same package. With Spack, you can build your software all the ways you want to. See the Feature Overview for examples and highlights. To install spack and your first package, make sure you have Python. Then: $ git clone $ cd spack/bin $ ./spack install libelf Documentation Full documentation for Spack is the first place to look. Try the Spack Tutorial , to learn how to use spack, write packages, or deploy packages for users at your site. See also: Technical paper and slides on Spack's design and implementation. Short presentation from the Getting Scientific Software Installed BOF session at Supercomputing 2015. Get Involved! Spack is an open source project. Questions, discussion, and contributions are welcome. Contributions can be anything from new packages to bugfixes, or even new core features. Mailing list If you are interested in contributing to spack, join the mailing list. We're using Google Groups for this: Spack Google Group Slack channel Spack has a Slack channel where you can chat about all things Spack: Spack on Slack Sign up here to get an invitation mailed to you. Twitter You can follow @spackpm on Twitter for updates. Also, feel free to @mention us in in questions or comments about your own experience with Spack. Contributions Contributing to Spack is relatively easy. Just send us a pull request . When you send your request, make develop the destination branch on the Spack repository . Your PR must pass Spack's unit tests and documentation tests, and must be PEP 8 compliant. We enforce these guidelines with Travis CI . To run these tests locally, and for helpful tips on git, see our Contribution Guide . Spack uses a rough approximation of the Git Flow branching model. The develop branch contains the latest contributions, and master is always tagged and points to the latest stable release. Authors Many thanks go to Spack's contributors . Spack was created by Todd Gamblin, tgamblin@llnl.gov. Citing Spack If you are referencing Spack in a publication, please cite the following paper: Todd Gamblin, Matthew P. LeGendre, Michael R. Collette, Gregory L. 
Lee, Adam Moody, Bronis R. de Supinski, and W. Scott Futral. The Spack Package Manager: Bringing Order to HPC Software Chaos . In Supercomputing 2015 (SC’15) , Austin, Texas, November 15 20 2015. LLNL CONF 669890. License Spack is distributed under the terms of both the MIT license and the Apache License (Version 2.0). Users may choose either license, at their option. All new contributions must be made under both the MIT and Apache 2.0 licenses. See LICENSE MIT , LICENSE APACHE , COPYRIGHT , and NOTICE for details. SPDX License Identifier: (Apache 2.0 OR MIT) LLNL CODE 647188",Unknown,Unknown 264,Unknown,Unknown,Unknown,"Django PayPal .. image:: :target: :alt: Build Status .. image:: :target: :alt: Latest PyPI version Django PayPal is a pluggable application that integrates with PayPal Payments Standard and Payments Pro. See for documentation. django paypal supports: Django 1.11+ Python 2.7, and 3.4+ (Not all combinations are supported). Project status This is an Open Source project that is active but in maintenance mode . The maintainers see their primary responsibilities as: fixing any critical data loss or security bugs. keeping the project up to date with new versions of Django (or other dependencies). merging well written patches from the community, and doing so promptly. Large scale development work and feature additions are not planned by the maintainers. Some important parts of the code base are not covered by automated tests, and may be broken for some versions of Django or Python. These parts of the code base currently issue warnings, and the maintainers are waiting for tests to be contributed by those who actually need those parts, and docs where appropriate. Please bear these things in mind if filing an issue. If you discover a bug, unless it is a critical data loss or security bug, the maintainers are unlikely to work for free to fix it, and a new feature, or tests for existing functionality, will only be added by the maintainers if they need it themselves. That said, if you do have large changes that you want to contribute, including large new features (such as implementing newer PayPal payment methods), they will be gladly accepted if they are implemented well. Please see CONTRIBUTING.rst _ for more information about using the issue tracker and pull requests. Paid support Some of the maintainers are able to provide support on a paid basis for this Open Source project. This includes the following kinds of things: Paying for bug fixes or new features (with the understanding that these changes will become freely available as part of the project and are not 'owned' by the person who paid for them). Debugging or other support for integrating django paypal into your project. Implementing the integration for you from scratch. If you are interested in these, you can contact the follower developers: Luke Plant homepage _, email _ long time Django expert and contributor.",Unknown,Unknown 265,Unknown,Unknown,Unknown,"sshuttle: where transparent proxy meets VPN meets ssh As far as I know, sshuttle is the only program that solves the following common case: Your client machine (or router) is Linux, FreeBSD, or MacOS. You have access to a remote network via ssh. You don't necessarily have admin access on the remote network. The remote network has no VPN, or only stupid/complex VPN protocols (IPsec, PPTP, etc). Or maybe you are the admin and you just got frustrated with the awful state of VPN tools. 
You don't want to create an ssh port forward for every single host/port on the remote network. You hate openssh's port forwarding because it's randomly slow and/or stupid. You can't use openssh's PermitTunnel feature because it's disabled by default on openssh servers; plus it does TCP over TCP, which has terrible performance (see below). Obtaining sshuttle Debian stretch or later:: apt get install sshuttle Arch Linux:: pacman sync sshuttle Fedora:: dnf install sshuttle From PyPI:: sudo pip install sshuttle Clone:: git clone cd sshuttle sudo ./setup.py install It is also possible to install into a virtualenv as a non root user. From PyPI:: virtualenv p python3 /tmp/sshuttle . /tmp/sshuttle/bin/activate pip install sshuttle Clone:: virtualenv p python3 /tmp/sshuttle . /tmp/sshuttle/bin/activate git clone cd sshuttle ./setup.py install Homebrew:: brew install sshuttle Documentation The documentation for the stable version is available at: The documentation for the latest development version is available at:",Unknown,Unknown 266,Unknown,Unknown,Unknown,"django skd smoke .. image:: :target: .. image:: :target: .. image:: :target: .. image:: :target: .. image:: :target: This package is intended for simplification of smoke tests creation. .. contents:: Installation You can get django skd smoke by using pip:: $ pip install django skd smoke Usage After installation you should create new TestCase derived from skd_smoke.SmokeTestCase and define your smoke tests configuration. Please review Examples _ section which demonstrates different usecases. They are related to example_project directory which contains common django project. Configuration TESTS_CONFIGURATION of your TestCase should contain tuple/list of tuples for every request with the next structure: .. code block:: python (url, status, method, {'comment': None, 'initialize': None, 'url_kwargs': None, 'request_data': None, 'user_credentials': None, 'redirect_to': None}) .. list table:: :widths: 15 80 5 :header rows: 1 Parameter Description Required url plain url or urlname as string Yes status expected status code (200, 404, etc.) as int Yes method request method (GET, POST, etc.) as string Yes comment string which is added to __doc__ of generated test method No initialize callable object to do any required initialization No url_args list or callable object which returns args list to resolve url using django.shortcuts.resolve_url No url_kwargs dict or callable object which returns kwargs dict to resolve url using django.shortcuts.resolve_url No request_data dict or callable object which returns dict to pass it into method request No user_credentials dict or callable object which returns dict to login user using django.test.TestCase.client.login No redirect_to plain url as string which is checked if only status is one of the next: 301, 302, 303, 307 No NOTE! All callables take your TestCase as the first argument so you can use it to transfer state between them. But take into account that order of callbacks usage is next: . initialize . url_kwargs . user_credentials . request_data Examples All examples are taken from example_project package and can be run after repository cloning. \1. Demonstration of simple requests: 1. GET 200 2. GET 200 with request_data as dict 3. POST 200 4. POST 302 with request_data as callable 5. GET 302 (unauthorized access) 6. GET 200 (authorized access) 7. POST 405 (method not allowed) .. 
code block:: python from django.contrib.auth import get_user_model from skd_smoke import SmokeTestCase def get_article_data(testcase): return {'headline': 'new article'} def get_user_credentials(testcase): username 'test_user' password '1234' credentials {'username': username, 'password': password} User get_user_model() new_user User.objects.create(username username) new_user.set_password(password) new_user.save() testcase.user new_user return credentials class SimpleSmokeTestCase(SmokeTestCase): TESTS_CONFIGURATION ( ('home', 200, 'GET',), 1 ('home', 200, 'GET', {'request_data': {'scrollTop': 1}}), 2 ('articles:create', 200, 'POST',), 3 ('articles:create', 302, 'POST', {'request_data': get_article_data}), 4 ('is_authenticated', 302, 'GET',), 5 ('is_authenticated', 200, 'GET', {'user_credentials': get_user_credentials}), 6 ('/only_post_request/', 405, 'GET',), 7 ) 2. Usage of the initialize callback to create several objects before testing a list page. Suppose you want to smoke test the articles list page, but initially your test db does not contain any articles. You can use the initialize callback here to create several articles. .. code block:: python from skd_smoke import SmokeTestCase from articles.models import Article def create_articles(testcase): for i in range(3): Article.objects.create(headline 'article %s' % i) class ArticlesListSmokeTestCase(SmokeTestCase): TESTS_CONFIGURATION ( ('articles:articles', 200, 'GET', {'initialize': create_articles} pass your func here ), ) 3. Usage of the redirect_to setting to test anonymous access to login required pages. .. code block:: python from django.core.urlresolvers import reverse from skd_smoke import SmokeTestCase class RedirectToSmokeTestCase(SmokeTestCase): TESTS_CONFIGURATION ( ('is_authenticated', 302, 'GET', { 'redirect_to': '%s?next %s' % (reverse('login'), reverse('is_authenticated')), 'comment': 'Anonymous profile access with check of redirect url' }), ) 4. Usage of the url_kwargs and user_credentials callbacks to test an owner's authorized access to a newly created object. Suppose you have a model Article whose unpublished version can be viewed by its owner only. You can test this situation by creating a user in the url_kwargs callback and transferring it to the user_credentials callback. Unfortunately, you cannot get the password from the user model because it contains only the hashed password, so you should return the password as plain text. Let's also smoke test the two other situations in which a 404 page is shown. Finally we have three testcases: i. Anonymous access should show 404 page. ii. Some ordinary user access should also show 404 page. iii. Only owner access returns actual article with status 200. ..
code block:: python from django.contrib.auth import get_user_model from skd_smoke import SmokeTestCase from articles.models import Article def create_user(): UserModel get_user_model() new_user UserModel.objects.create(username 'test_user') new_user.set_password('1234') new_user.save() return new_user def create_unpublished_article(commit True): article Article(headline 'unpublished', published False) if commit: article.save() return article def create_article_without_owner(testcase): return {'pk': create_unpublished_article().pk} def create_and_return_user_credentials(testcase): user create_user() return { 'username': user.username, 'password': '1234' User contains hashed password only so we should return it as plain text } def create_article_with_its_owner(testcase): owner create_user() testcase.owner owner unpublished create_unpublished_article(commit False) unpublished.owner owner unpublished.save() return {'pk': unpublished.pk} def get_owner_credentials(testcase): return { 'username': testcase.owner.username, 'password': '1234' User contains hashed password only } class UnpublishedArticleSmokeTestCase(SmokeTestCase): TESTS_CONFIGURATION ( ('articles:article', 404, 'GET', {'url_kwargs': create_article_without_owner, 'comment': 'Anonymous access to unpublished article.'}), 1 ('articles:article', 404, 'GET', {'url_kwargs': create_article_without_owner, 'user_credentials': create_and_return_user_credentials, 'comment': 'Some user access to unpublished article.'}), 2 ('articles:article', 200, 'GET', {'url_kwargs': create_article_with_its_owner, 'user_credentials': get_owner_credentials, 'comment': 'Owner access to unpublished article.'}), 3 ) License MIT",Unknown,Unknown 267,Unknown,Unknown,Unknown,"Redqueue is a light weight queue server that speaks memcache protocol and provides persistent queue based on log. It is writen in python using the high performance tornado frameworks, currently all the functions are in a single file which contains only less than 300 lines of effective code. Redqueue is free and unencumbered public domain software. Install and Run Install tornado and (optional) python memcached for client testing Get the source from git@github.com:superisaac/redqueue.git Install % python setup.py install Make the log dir % mkdir p log Run the server % redqueue_server.py For more options please run % redqueue_server.py help Reserve/delete mode Reserve/delete mode is currently the sole mode, once an item is fetched, a delete request must be send later to mark the item is used, or else the item will be recycled back later. >>> mc.set('abc', '123') >>> v mc.get('abc') >>> if v is not None: >>> mc.delete('abc')",Unknown,Unknown 268,Unknown,Unknown,Unknown,"pyresttest Table of Contents What Is It? ( what is it) Status ( status) Installation ( installation) Sample Test ( sample test) Examples ( examples) Installation ( installation) How Do I Use It? ( how do i use it) Running A Simple Test ( running a simple test) Using JSON Validation ( using json validation) Interactive Mode ( interactive mode) Verbose Output ( verbose output) Other Goodies ( other goodies) Basic Test Set Syntax ( basic test syntax) Import example ( import example) Url Test ( url test with timeout) Custom HTTP Options (special curl settings) ( custom Syntax Limitations ( syntax limitations) Benchmarking? 
( benchmarking) Metrics ( metrics) Benchmark report formats: ( benchmark report formats) RPM based installation ( rpm based installation) Project Policies ( project policies) FAQ ( faq) Feedback and Contributions ( feedback and contributions) What Is It? A REST testing and API microbenchmarking tool Tests are defined in basic YAML or JSON config files, no code needed Minimal dependencies (pycurl, pyyaml, optionally future), making it easy to deploy on server for smoketests/healthchecks Supports generate/extract/validate (advanced_guide.md) mechanisms to create full test scenarios Returns exit codes on failure, to slot into automated configuration management/orchestration tools (also supplies parseable logs) Logic is written and extensible (extensions.md) in Python Status NEW: Full Python 3 Support in Alpha download it, 'pip install future' and give it a try! Apache License, Version 2.0 ! Status Badge PyPI version PyPI () Join the chat at Changelog (CHANGELOG.md) shows the past and present, milestones show the future roadmap. The changelog will also show features/fixes currently merged to the master branch but not released to PyPi yet (pending installation tests across platforms). Installation PyRestTest works on Linux or Mac with Python 2.6, 2.7, or 3.3+ (with module 'future' installed) First we need to install package python pycurl: Ubuntu/Debian: (sudo) apt get install python pycurl CentOS/RHEL: (sudo) yum install python pycurl Alpine: (sudo) apk add curl dev Mac: don't worry about it Other platforms: unsupported. You may get it to work by installing pycurl & pyyaml manually. Also include 'future' for Python 3. No guarantees though. This is needed because the pycurl dependency may fail to install by pip. In very rare cases you may need to intall python pyyaml if pip cannot install it correctly. It is easy to install the latest release by pip: (sudo) pip install pyresttest (also install 'future' if on Python 3) If pip isn't installed, we'll want to install it first: If that is not installed, we'll need to install it first: Ubuntu/Debian: (sudo) apt get install python pip CentOS/RHEL: (sudo) yum install python pip Mac OS X with homebrew: brew install python (it's included) Or with just python installed: wget && sudo python get pip.py Releases occur every few months, if you want to use unreleased features, it's easy to install from source: See the Change Log (CHANGELOG.md) for feature status. shell git clone cd pyresttest sudo python setup.py install The master branch tracks the latest; it is unit tested, but less stable than the releases (the 'stable' branch tracks tested releases). Troubleshooting Installation Almost all installation issues are due to problems with PyCurl and PyCurl's native libcurl bindings. It is easy to check if PyCurl is installed correctly: python c 'import pycurl' If this returns correctly, pycurl is installed, if you see an ImportError or similar, it isn't. You may also verify the pyyaml installation as well, since that can fail to install by pip in rare circumstances. Error installing by pip __main__.ConfigurationError: Could not run curl config: Errno 2 No such file or directory This is caused by libcurl not being installed or recognized: first install pycurl using native packages as above. Alternately, try installing just the libcurl libraries: On Ubuntu/Debian: sudo apt get install libcurl4 openssl dev On CentOS/RHEL: yum install libcurl devel VirtualEnv installation PyCurl should install by pip, but sometimes has issues with pycurl/libcurl. 
Manually copying in a working system pycurl installation may help: cp /usr/lib/python2.7/dist packages/pycurl env/local/lib/python2.7/site packages/ Sample Test This will check that APIs accept operations, and will smoketest an application yaml config: testset: Basic tests timeout: 100 Increase timeout from the default 10 seconds test: name: Basic get url: /api/person/ test: name: Get single person url: /api/person/1/ test: name: Delete a single person, verify that works url: /api/person/1/ method: 'DELETE' test: create entity by PUT name: Create/update person url: /api/person/1/ method: PUT body: '{ first_name : Gaius , id : 1, last_name : Baltar , login : gbaltar }' headers: {'Content Type': 'application/json'} validators: This is how we do more complex testing! compare: {header: content type, comparator: contains, expected:'json'} compare: {jsonpath_mini: 'login', expected: 'gbaltar'} JSON extraction compare: {raw_body: , comparator:contains, expected: 'Baltar' } Tests on raw response test: create entity by POST name: Create person url: /api/person/ method: POST body: '{ first_name : William , last_name : Adama , login : theadmiral }' headers: {Content Type: application/json} Examples The Quickstart (quickstart.md) should be everyone's starting point Here's a really good example (examples/miniapp extract validate.yaml) for how to create a user and then do tests on it. This shows how to use extraction from responses, templating, and different test types If you're trying to do something fancy, take a look at the content test.yaml (pyresttest/content test.yaml). This shows most kinds of templating & variable uses. It shows how to read from file, using a variable in the file path, and templating on its content! PyRestTest isn't limited to JSON; there's an example for submitting form data There's a whole folder of example tests to help get started How Do I Use It? The Quickstart (quickstart.md) walks through common use cases Benchmarking has its own section ( benchmarking) below Advanced features have separate documentation (advanced_guide.md) (templating, generators, content extraction, complex validation). How to extend PyRestTest (extensions.md) is its own document There are a ton of examples @BastienAr has created an Atom editor package for PyRestTest development (thank you!) Running A Simple Test Run a basic test of the github API: shell pyresttest examples/github_api_smoketest.yaml Using JSON Validation A simple set of tests that show how json validation can be used to check contents of a response. Test includes both successful and unsuccessful validation using github API. shell pyresttest examples/github_api_test.yaml (For help: pyresttest help ) Interactive Mode Same as the other test but running in interactive mode. 
shell pyresttest examples/github_api_test.yaml interactive true print bodies true Verbose Output shell pyresttest examples/github_api_test.yaml log debug Other Goodies Simple templating of HTTP request bodies, URLs, and validators, with user variables Generators to create dummy data for testing, with support for easily writing your own Sequential tests: extract info from one test to use in the next Import test sets in other test sets, to compose suites of tests easily Easy benchmarking: convert any test to a benchmark, by changing the element type and setting output options if needed Lightweight benchmarking: 0.3 ms of overhead per request, and plans to reduce that in the future Accurate benchmarking: network measurements come from native code in LibCurl, so test overhead doesn't alter them Optional interactive mode for debugging and demos Basic Test Set Syntax As you can see, tests are defined in YAML format. There are 5 top level test syntax elements: url: a simple test, fetches given url via GET request and checks for good response code test : a fully defined test (see below) benchmark : a fully defined benchmark (see below) config or configuration : overall test configuration (timeout is the most common option) import : import another test set file so you Don't Repeat Yourself Import example yaml Will load the test sets from miniapp test.yaml and run them Note that this will run AFTER the current test set is executed Also note that imported tests get a new Context: any variables defined will be lost between test sets import: examples/miniapp test.yaml Imports are intended to let you create top level test suites that run many independent, isolated test scenarios (test sets). They may also be used to create sample data or perform cleanup as long as you don't rely on variables to store this information. For example, if one testset creates a user for a set of scenarios, tests that rely on that user's ID need to start by querying the API to get the ID. Url Test With Timeout A simple URL test is equivalent to a basic GET test with that URL. Also shows how to use the timeout option in testset config to descrease the default timeout from 10 seconds to 1. yaml config: testset: Basic tests timeout: 1 url: /api/person/ This is a simple test test: url: /api/person/ This does the same thing Custom HTTP Options (special curl settings) For advanced cases (example: SSL client certs), sometimes you will want to use custom Curl settings that don't have a corresponding option in PyRestTest. This is easy to do: for each test, you can specify custom Curl arguments with 'curl_option_optionname.' For this, 'optionname' is case insensitive and the optionname is a Curl Easy Option with 'CURLOPT_' removed. For example, to follow redirects up to 5 times (CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS): yaml test: url: /api/person/1 curl_option_followlocation: True curl_option_maxredirs: 5 Note that while option names are validated, no validation is done on their values. Syntax Limitations Whenever possible, the YAML configuration handler tries to convert variable types as needed. We're all responsible adults, don't do anything crazy and it will play nicely. Only a handful of elements can use dynamic variables (URLs, headers, request bodies, validators) there are plans to change this in the next few releases. The templating is quite limited (it's doing simple string subsitution). There are plans to improve this in the next few releases, but it isn't there yet. 
One caveat: if you define the same element (example, URL) twice in the same enclosing element, the last value will be used. In order to preserve sanity, I use last value wins. No support for for each on requests/responses natively this can be done via custom extensions, and may be available in the distant future but it's a while out. Benchmarking? Oh, yes please! PyRestTest allows you to collect low level network performance metrics from Curl itself. Benchmarks are based off of tests: they extend the configuration elements in a test, allowing you to configure the REST call similarly. However, they do not perform validation on the HTTP response, instead they collect metrics. There are a few custom configuration options specific to benchmarks: warmup_runs : (default 10 if unspecified) run the benchmark calls this many times before starting to collect data, to allow for JVM warmup, caching, etc benchmark_runs : (default 100 if unspecified) run the benchmark this many times to collect data output_file : (default is None) file name to write benchmark output to, will get overwritten with each run, if none given, will write to terminal only output_format : (default CSV if unspecified) format to write the results in ('json' or 'csv'). More on this below. metrics : which metrics to gather (explained below), MUST be specified or benchmark will do nothing Metrics There are two ways to collect performance metrics: raw data, and aggregated stats. Each metric may yield raw data, plus one or more aggregate values. Raw Data : returns an array of values, one for each benchmark run Aggregates : runs a reduction function to return a single value over the entire benchmark run (median, average, etc) To return raw data, in the 'metrics' configuration element, simply input the metric name in a list of values. The example below will return raw data for total time and size of download (101 values each). benchmark: create entity name: Basic get url: /api/person/ warmup_runs: 7 'benchmark_runs': '101' output_file: 'miniapp benchmark.csv' metrics: total_time size_download Aggregates are pretty straightforward: mean or mean_arithmetic : arithmetic mean of data (normal 'average') mean_harmonic : harmonic mean of data (useful for rates) median : median, the value in the middle of sorted result set std_deviation : standard deviation of values, useful for measuring how consistent they are total or sum : total up the values given Currently supported metrics are listed below, and these are a subset of Curl get_info variables. These variables are explained here (with the CURLINFO_ prefix removed): curl_easy_get_info documentation Metrics: 'appconnect_time', 'connect_time', 'namelookup_time', 'num_connects', 'pretransfer_time', 'redirect_count', 'redirect_time', 'request_size', 'size_download', 'size_upload', 'speed_download', 'speed_upload', 'starttransfer_time', 'total_time' Benchmark report formats: CSV is the default report format. CSV ouput will include: Benchmark name Benchmark group Benchmark failure count (raw HTTP failures) Raw data arrays, as a table, with headers being the metric name, sorted alphabetically Aggregates: a table of results in the format of (metricname, aggregate_name, result) In JSON, the data is structured slightly differently: { failures : 0, aggregates : metric_name , aggregate , aggregateValue ... , failures : failureCount, group : Default , results : { total_time : value1, value2, etc , metric2 : value1, value2, etc , ... 
} } Samples: config: testset: Benchmark tests using test app benchmark: create entity name: Basic get url: /api/person/ warmup_runs: 7 'benchmark_runs': '101' output_file: 'miniapp benchmark.csv' metrics: total_time total_time: mean total_time: median size_download speed_download: median benchmark: create entity name: Get single person url: /api/person/1/ metrics: {speed_upload: median, speed_download: median, redirect_time: mean} output_format: json output_file: 'miniapp single.json' RPM based installation Pure RPM based install? It's easy to build and install from RPM: Building the RPM: shell python setup.py bdist_rpm Build RPM find iname ' .rpm' Gets the RPM name Installing from RPM shell sudo yum localinstall my_rpm_name sudo yum install PyYAML python pycurl If using python3, needs 'future' too You need to install PyYAML & PyCurl manually because Python distutils can't translate python dependencies to RPM packages. Gotcha: Python distutils add a dependency on your major python version. This means you can't build an RPM for a system with Python 2.6 on a Python 2.7 system. Building an RPM for RHEL 6/CentOS 6 You'll need to install rpm build, and then it should work. shell sudo yum install rpm build Project Policies PyRestTest uses the Github flow The master branch is an integration branch for mature features Releases are cut periodically from master (every 3 6 months generally, or more often if breaking bugs are present) and released to PyPi Feature development is done in feature branches and merged to master by PR when tested (validated by continuous integration in Jenkins) The 'stable' branch tracks the last release, use this if you want to run PyRestTest from source The changelog is here (CHANGELOG.md), this will show past releases and features merged to master for the next release but not released Testing: tested on Ubuntu 14/python 2.7 and CentOS 6/python 6.6, plus Debian Wheezy for Python 3.4.3 Releases occur every few months to PyPi once a few features are ready to go PyRestTest uses Semantic Versioning 2.0 Back compatibility is important! PyRestTest makes a strong effort to maintain command line and YAML format back compatibility since 1.0. Extension method signatures (extensions.md) are maintained as much as possible. However, internal python implementations are subject to change. Major version releases (1.x to 2.x, etc) may introduce breaking API changes, but only with a really darned good reason, and only there's not another way. Feedback and Contributions We welcome any feedback you have, including pull requests, reported issues, etc! For new contributors there are a whole set of issues labelled with help wanted which are excellent starting points to offer a contribution! For instructions on how to set up a dev environment for PyRestTest, see building.md (building.md). For pull requests to get easily merged, please: Include unit tests (and functional tests, as appropriate) and verify that run_tests.sh passes Include documentation as appropriate Attempt to adhere to PEP8 style guidelines and project style Bear in mind that this is largely a one man, outside of working hours effort at the moment, so response times will vary. That said: every feature request gets heard, and even if it takes a while, all the reasonable features will get incorporated. If you fork the main repo, check back periodically... you may discover that the next release includes something to meet your needs and then some! FAQ Why not pure python tests? 
This is written for an environment where Python is not the sole or primary language You totally can do pure Python tests if you want! Extensions (extensions.md) provide a stable API for adding more complex functionality in python All modules can be imported and used as libraries Gotcha: the project is still young, so internal implementation may change often, much more than YAML features Why YAML and not XML/JSON? XML is extremely verbose and has many gotchas for parsing You CAN use JSON for tests , it's a subset of YAML. See miniapp test.json (examples/miniapp test.json) for an example. YAML tends to be the most concise, natural, and easy to write of these three options Does it do load tests? No, this is a separate niche and there are already many excellent tools to fill it Adding load testing features would greatly increase complexity But some form might come eventually! Why do you use PyCurl and not requests? Maybe eventually. PyRestTest needs the low level features of PyCurl for benchmarking, and benefits from its performance. However we may eventually abstract some of the core testing features away to allow for pure python execution",Unknown,Unknown 269,Unknown,Unknown,Unknown,"SymPy pypi version Build status Gitter Badge Zenodo Badge .. pypi version image:: :target: .. Build status image:: :target: .. Gitter Badge image:: :alt: Join the chat at :target: .. Zenodo Badge image:: :target: A Python library for symbolic mathematics. See the AUTHORS file for the list of authors. And many more people helped on the SymPy mailing list, reported bugs, helped organize SymPy's participation in the Google Summer of Code, the Google Highly Open Participation Contest, Google Code In, wrote and blogged about SymPy... License: New BSD License (see the LICENSE file for details) covers all files in the sympy repository unless stated otherwise. Our mailing list is at We have community chat at Gitter _. Feel free to ask us anything there. We have a very welcoming and helpful community. Download The recommended installation method is through Anaconda, You can also get the latest version of SymPy from To get the git version do :: $ git clone git://github.com/sympy/sympy.git For other options (tarballs, debs, etc.), see Documentation and usage Everything is at: You can generate everything at the above site in your local copy of SymPy by:: $ cd doc $ make html Then the docs will be in _build/html . If you don't want to read that, here is a short usage: From this directory, start python and:: >>> from sympy import Symbol, cos >>> x Symbol('x') >>> e 1/cos(x) >>> print e.series(x, 0, 10) 1 + x 2/2 + 5 x 4/24 + 61 x 6/720 + 277 x 8/8064 + O(x 10) SymPy also comes with a console that is a simple wrapper around the classic python console (or IPython when available) that loads the sympy namespace and executes some common commands for you. To start it, issue:: $ bin/isympy from this directory, if SymPy is not installed or simply:: $ isympy if SymPy is installed. Installation SymPy has a hard dependency on the mpmath _ library (version > 0.19). You should install it first, please refer to the mpmath installation guide: To install SymPy itself, then simply run:: $ python setup.py install If you install it system wide, you may need to prefix the previous command with sudo :: $ sudo python setup.py install See for more information. Contributing We welcome contributions from anyone, even if you are new to open source. Please read our introduction to contributing _. 
If you are new and looking for some way to contribute, a good place to start is to look at the issues tagged Easy to Fix _. Please note that all participants of this project are expected to follow our Code of Conduct. By participating in this project you agree to abide by its terms. See CODE_OF_CONDUCT.md _. Tests To execute all tests, run:: $ ./setup.py test in the current directory. For more fine grained running of tests or doctest, use bin/test or bin/doctest , respectively. The master branch is automatically tested by Travis CI. To test pull requests, use sympy bot _. Regenerate Experimental \LaTeX Parser/Lexer The parser and lexer are generated with the ANTLR4 _ toolchain in sympy/parsing/latex/_antlr and checked into the repo. Presently, most users should not need to regenerate these files, but if you plan to work on this feature, you will need the antlr4 command line tool available. One way to get it is:: $ conda install c conda forge antlr 4.7 After making changes to sympy/parsing/latex/LaTeX.g4 , run:: $ ./setup.py antlr Clean To clean everything (thus getting the same tree as in the repository):: $ ./setup.py clean You can also clean things with git using:: $ git clean Xdf which will clear everything ignored by .gitignore , and:: $ git clean df to clear all untracked files. You can revert the most recent changes in git with:: $ git reset hard WARNING: The above commands will all clear changes you may have made, and you will lose them forever. Be sure to check things with git status , git diff , git clean Xn and git clean n before doing any of those. Bugs Our issue tracker is at Please report any bugs that you find. Or, even better, fork the repository on GitHub and create a pull request. We welcome all changes, big or small, and we will help you make the pull request if you are new to git (just ask on our mailing list or Gitter). Brief History SymPy was started by Ondřej Čertík in 2005; he wrote some code during the summer, then some more during the summer of 2006. In February 2007, Fabian Pedregosa joined the project and helped fix many things, contributed documentation and made it alive again. Five students (Mateusz Paprocki, Brian Jorgensen, Jason Gedge, Robert Schwarz, and Chris Wu) improved SymPy incredibly during summer 2007 as part of the Google Summer of Code. Pearu Peterson joined the development during the summer of 2007 and made SymPy much more competitive by rewriting the core from scratch, which made it 10x to 100x faster. Jurjen N.E. Bos has contributed pretty printing and other patches. Fredrik Johansson has written mpmath and contributed a lot of patches. SymPy has participated in every Google Summer of Code since 2007. You can see for full details. Each year has improved SymPy by leaps and bounds. Most of SymPy's development has come from Google Summer of Code students. In 2011, Ondřej Čertík stepped down as lead developer, with Aaron Meurer, who also started as a Google Summer of Code student, taking his place. Ondřej Čertík is still active in the community but is too busy with work and family to play a lead development role. Since then, a lot more people have joined the development and some people have also left. You can see the full list in doc/src/aboutus.rst, or online at: The git history goes back to 2007 when development moved from svn to hg. To see the history before that point, look at You can use git to see the biggest developers. The command:: $ git shortlog ns will show each developer, sorted by commits to the project.
The command:: $ git shortlog ns since 1 year will show the top developers from the last year. Citation To cite SymPy in publications use Meurer A, Smith CP, Paprocki M, Čertík O, Kirpichev SB, Rocklin M, Kumar A, Ivanov S, Moore JK, Singh S, Rathnayake T, Vig S, Granger BE, Muller RP, Bonazzi F, Gupta H, Vats S, Johansson F, Pedregosa F, Curry MJ, Terrel AR, Roučka Š, Saboo A, Fernando I, Kulal S, Cimrman R, Scopatz A. (2017) SymPy: symbolic computing in Python. PeerJ Computer Science 3:e103 A BibTeX entry for LaTeX users is .. code block:: none @article{10.7717/peerj cs.103, title {SymPy: symbolic computing in Python}, author {Meurer, Aaron and Smith, Christopher P. and Paprocki, Mateusz and \v{C}ert\'{i}k, Ond\v{r}ej and Kirpichev, Sergey B. and Rocklin, Matthew and Kumar, Amit and Ivanov, Sergiu and Moore, Jason K. and Singh, Sartaj and Rathnayake, Thilina and Vig, Sean and Granger, Brian E. and Muller, Richard P. and Bonazzi, Francesco and Gupta, Harsh and Vats, Shivam and Johansson, Fredrik and Pedregosa, Fabian and Curry, Matthew J. and Terrel, Andy R. and Rou\v{c}ka, \v{S}t\v{e}p\'{a}n and Saboo, Ashutosh and Fernando, Isuru and Kulal, Sumith and Cimrman, Robert and Scopatz, Anthony}, year 2017, month jan, keywords {Python, Computer algebra system, Symbolics}, abstract { SymPy is an open source computer algebra system written in pure Python. It is built with a focus on extensibility and ease of use, through both interactive and programmatic applications. These characteristics have led SymPy to become a popular symbolic library for the scientific Python ecosystem. This paper presents the architecture of SymPy, a description of its features, and a discussion of select submodules. The supplementary material provides additional examples and further outline details of the architecture and features of SymPy. }, volume 3, pages {e103}, journal {PeerJ Computer Science}, issn {2376 5992}, url { doi {10.7717/peerj cs.103} } SymPy is BSD licensed, so you are free to use it whatever you like, be it academic, commercial, creating forks or derivatives, as long as you copy the BSD statement if you redistribute it (see the LICENSE file for details). That said, although not required by the SymPy license, if it is convenient for you, please cite SymPy when using it in your work and also consider contributing all your changes back, so that we can incorporate it and all of us will benefit in the end.",Unknown,Unknown 270,Unknown,Unknown,Unknown,"Django Config An architecture for maintaining multiple settings files in Django Overview django config is an easy way to maintain multiple configurations for django. It relies on the concept of having a shared configuration file (base) and a per user/ server custom configuration file (dev1/ dev2/ local/ staging). settings.py combines the base & custom configuration and loads it up. Installation 1. Include the djangoconfig application in your django application set. 2. Create a directory named 'config' at the root directory of your project. 'config' directory will contain your global settings file: 'base.py' & all custom configuration file e.g. 'local.py'. 3. Overwrite 'manage.py' & 'settings.py' with the files supplied with django config Usage 1. Add new settings to your custom settings file. 2. Override base settings in custom settings file. 3. Whenever you run manage.py, select your configuration identifier More The primary repository for Django Config is located at: _ Django Config was created by Nowell Strite. 
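To make the Usage steps above concrete, here is a small sketch of the shared-plus-custom settings pattern django-config implements. It is only illustrative: the real project ships its own settings.py and manage.py, and selecting the configuration identifier via an environment variable is an assumption made here for brevity.

```python
# settings.py -- illustrative sketch only, not django-config's actual code
import importlib
import os

# Configuration identifier, e.g. "local", "dev1" or "staging"; django-config
# prompts for this when you run manage.py, here we fall back to an env var.
CONFIG_ID = os.environ.get("DJANGO_CONFIG", "local")

def _settings_from(module_name):
    module = importlib.import_module(module_name)
    return {name: value for name, value in vars(module).items() if name.isupper()}

_merged = _settings_from("config.base")                 # shared settings (config/base.py)
_merged.update(_settings_from("config." + CONFIG_ID))   # per-user/server overrides win
globals().update(_merged)
```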
Extended and maintained by Tareque Hossain",Unknown,Unknown 271,Unknown,Unknown,Unknown,"teambot Build Status A bot to create per channel notification lists in Slack. At Square, we have a lot of teams that make heavy use of slack. Each team owns a few different services, so people might hang around in the team's channel to get updates or ask questions. However, this makes it difficult to use @channel or @here to contact just your team. By setting up teambot as a bot whose name is team , you can notify your team with @team. Teambot is my personal project. We have been using it at Square for over a year. At last count, 90 different team channels are using teambot. Despite its simplicity, it has only broken once during a slack outage. I tend to just forget it's running. Setting up the bot Teambot requires Python and virtualenv. 1. Install the bot and its dependencies: shell $ git clone $ cd teambot $ pip env $ . env/bin/activate $ pip install r requirements.txt 2. Obtain a token for your slack bot (see the Slack documentation ). I recommend naming it team . 3. You can configure the bot in one of two ways: i. Specify configuration in the environment. At a minimum, you'll need to set the env var SLACK_TOKEN . See the section below for more details. ii. Provide a configuration file with the slack token. If the file rtmbot.conf is present in the working directory, teambot will use that. Or you can specify a different configuration file with the config option. One simple way is to cp rtmbot.conf.example rtmbot.conf , then edit rtmbot.conf and replace with the bot's token from your Slack dashboard. 4. That's it. Start the bot with python rtmbot.py . NOTE : Teambot will store its directory in a file called teams.db in the working directory from which it was run. You can use a different path by providing it in TEAM_DB_FILE . More on configuration Teambot takes the following configuration options, either from the environment or from a configuration file. The configuration file is in YAML format, although for historical reasons its default name is rtmbot.conf . SLACK_TOKEN (required) The token used to connect to your Slack team. DAEMON (optional) When true, teambot will run as a daemon in the background. DEBUG (optional) When true, teambot will log more debugging information about what it's doing. LOGFILE (optional) When logging debug details, teambot will log to this file if provided TEAM_DB_FILE (optional) Teambot will persist its directory in this file. If absent, it will use teams.db in the working directory. If both the file and environment variables are present, the environment variables take precedence. Boolean flags DAEMON and DEBUG will should be set to the string true to enable them. Setting up a team Team management is simple, and anyone can do it by direct messaging @team in slack. The bot has a built in help, but here's a quick start guide. Let's imagine that you want to create a team for the xp channel with a bunch of people. Just run these commands: /join xp /invite @team /msg @team create xp @jackson @amberdixon @tp @bhartard @dan @killpack @glenn @scottsilver @jess @tdeck @barlow Then Curtis, a new intern, joins your XP team. You can either ask him to run /msg @team join xp , or run /msg @team add xp @curtisf . Similarly, if Jess gets tired of being notified about every deploy, she can /msg @team leave xp , or you can take her off the list with /msg @team remove xp @jess . 
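Before the full command reference, here is a small sketch of the configuration precedence described above: environment variables override values from the YAML rtmbot.conf, and the boolean flags are only enabled by the string true. This is not teambot's actual loading code, just an illustration of the documented behaviour; the defaults assumed for DAEMON and DEBUG are off.

```python
# config_sketch.py -- illustrative only, not teambot's implementation
import os

import yaml  # rtmbot.conf is YAML despite its name

KEYS = ["SLACK_TOKEN", "DAEMON", "DEBUG", "LOGFILE", "TEAM_DB_FILE"]

def load_config(path="rtmbot.conf"):
    config = {"TEAM_DB_FILE": "teams.db", "DAEMON": "false", "DEBUG": "false"}
    if os.path.exists(path):
        with open(path) as fh:
            config.update(yaml.safe_load(fh) or {})
    for key in KEYS:                       # environment variables take precedence
        if key in os.environ:
            config[key] = os.environ[key]
    for flag in ("DAEMON", "DEBUG"):       # flags are enabled by the string "true"
        config[flag] = str(config[flag]).lower() == "true"
    return config

if __name__ == "__main__":
    print(load_config())
```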
All the commands info gets the team list for a channel: info xp create creates a new team for a channel and optionally adds people to the list: create xp @killpack @amberdixon @tp @dneighman add adds one or more people to a team list add xp @tdeck remove removes one or more people from a team list remove xp @dneighman join adds you to a channel's team join xp leave removes you from a channel's team leave xp drop deletes the team list for a channel drop foundation server don't drop teams you weren't on",Unknown,Unknown 272,Unknown,Unknown,Unknown,"Introduction Taba is a service for aggregating instrumentation events from large distributed systems in near real time. It was built to handle high throughput and scale easily. Check out an overview of Taba's architecture on the TellApart Eng Blog: Example Taba helps you instrument your services and provide a near real time view into what's happening across a large cluster. For example, you could use it to track the winning bid price for a certain type of bid: from taba import client ... client.Counter('bids_won', 1) client.Counter('winning_bid_price', wincpm) When those Events reach the Taba Server and are aggregated, they produce an output like the following: $ taba cli agg winning_bid_price winning_bid_price: { 1m : { count : 436, total : 571.64}, 10m : { count : 5285, total : 6884.57}, 1h : { count : 34265, total : 44175.47}, 1d : { count : 569787, total : 744423.87}, pct : 0.09, 0.47, 2.19, 3.55, 4.37, 14.09, 17.59 } There are many other input data types and aggregation methods available. See the Types and Handlers documentation. Overview A Taba deployment consists of 6 layers, each horizontally scalable. These layers are: Taba Client code integrated into applications Taba Agent process running locally to the application servers Taba Server processes in the frontend ('fe') role Taba Server processes in the backend ('be') role Redis sentinel processes Redis database processes The Taba Client is integrated into the application it is instrumenting, and exposes an API for recording events to different counter types. The Python distribution includes a default Client implementation based on threads, and a Gevent engine. There is also a Java Client available ( ) The Client typically sends events to a Taba Agent process running on the same server. While the Client will only forward events on a best effort basis, the Agent provides more robust buffering and failure recovery. It is also significantly smarter about load balancing. The will forward events to one of the Taba Server end points it has been configured to connect to. Any Server in the cluster can receive any set of events. The Taba Server processes are split into two groups: frontend ('fe') and backend ('be'). These roles are configured when the process starts. Assigning a process a 'fe' role has no effect on its operation it is simply a marker that the process will use to advertise its role. (The intention it to allow a load balancer to use that indicator to route traffic to just the 'fe' processes). Assigning the 'be' role will configure the process to launch a background worker that processes queued events. There must be at least one 'be' Server process in the cluster. A Server process can be assigned both 'fe' and 'be' roles. For small clusters, this will work well. However, once a cluster becomes large enough to require multiple Server processes, separating 'fe' and 'be' processes will perform better. 
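Before covering the remaining roles, here is a brief sketch of what the client-side instrumentation from the earlier example might look like inside an application. Only the taba.client.Counter calls are taken from this README; run_auction and the counter name bids_received are made up for illustration, and the sketch assumes the Taba Client has already been configured to reach an Agent or Server as described above.

```python
import random

from taba import client

def run_auction(request):
    # Stand-in for real bidding logic, purely for illustration.
    return random.random() < 0.3, round(random.uniform(0.5, 5.0), 2)

def handle_bid(request):
    client.Counter('bids_received', 1)               # illustrative counter name
    won, wincpm = run_auction(request)
    if won:
        client.Counter('bids_won', 1)                # counters shown at the top of this README
        client.Counter('winning_bid_price', wincpm)
```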
There is a third role 'ctrl', which essentially marks a Server process as neither 'fe' nor 'be'. This is useful for maintaining a separate set of processes for querying. The Taba Server uses a group of Redis databases and Sentinels. Having at least one Sentinel is a requirement, as it is used for service discovery of the individual database processes. Sharding across the databases is accomplished by splitting the key space into a large number of virtual buckets, and assigning ranges of buckets to each process. Installing Taba Taba was designed to run on Python 2.6/2.7. It has the following Python package dependencies, which should be installed automatically: gevent (> 0.13.1) python cjson (> 1.0.5) cython (> 0.13) redis (> 2.9) requests (> 1.2.0) Additionally, building the Python dependencies requires the following. These dependencies are _not_ installed automatically: gcc make python dev libevent dev The latest stable release can be installed from PyPi: pip install taba Or Taba can be installed directly from the repository: git clone cd taba python setup.py install Installing Redis The Taba Server uses a group of Redis instances with Sentinels as its database. It requires at least Redis 2.8. For details about installing Redis, please visit the Redis Downloads page Deploying Taba There are many ways to deploy Taba, depending on the use case. See examples/EXAMPLES for pointers on how to get started. About Taba is a project at TellApart led by Kevin Ballard to create a reliable, high performance platform for real time monitoring. It is used to monitor over 30,000 Tabs, consuming nearly 10,000,000 Events per second, and an average latency of under 15s. Any questions or comments can be forwarded to taba@tellapart.com (mailto:taba@tellapart.com)",Unknown,Unknown 273,Unknown,Unknown,Unknown,"The dot Post In this repo you will find all the content from available in Markdown under a Creative Commons license! For more info on how to contribute, please read",Unknown,Unknown 274,Unknown,Unknown,Unknown,"augment Misc. python decorators. Installation pip install augment Examples Some specific examples are listed below. Tests contain more exmaples. class TestAugment(unittest.TestCase): def test_ensure_args(self): Define constrained function. @ensure_args(a (lambda x: x > 10, 'must be greater than 10'), b (lambda x: x 10, 'must be greater than 10'), b (lambda x: x 10, 'must be greater than 10'), b (lambda x: x 10, 'must be greater than 10'), b r'^? \d+(\.\d+)$', c lambda x: x 10, b lambda x: hasattr(x, '__getitem__'), c lambda x: x < 5) def foo(a, b, c 4, d None): pass Function/method hooks. Basic function hooks to run on entering, leaving or both ways. def login(root): print Logging in. def logout(root): print Logging out login will run before entering home . logout will run after exiting from home . root param passed to home will be passed to login , logout . @enter(login) @leave(logout) def home(root): print Home page. def tracer(): print tracing tracer will run both before entering home and after exiting home . @around(tracer) def inbox(): print Inbox Please note that the hooks( login and logout ) above are passed the arguments passed to the wrapped method( home ). Method hooks should be accepting the same arguments as wrapped method. They work the same on bound functions. class Foo: def login(self): print Logging in. def logout(self): print Logging out @leave(logout) @enter(login) def home(self, test None): print Home page. 
def tracer(self): print tracing @around(tracer) def inbox(self): print Inbox Dynamic delgation. class Foo: def __init__(self): self.a 'a' self.b 'b' self.c 'c' @delegate_to('foo_delegate', 'a', 'b') class Bar: def __init__(self): self.foo_delegate Foo() b Bar() a and b will be delegated to Foo . print b.a print b.b This will throw an AttributeError. print b.c",Unknown,Unknown 275,Unknown,Unknown,Unknown,"InstaPy Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram Implemented in Python using the Selenium module. Twitter of InstaPy Twitter of Tim Discord Channel How it works (Medium) Talk about automating your Instagram Talk about doing Open Source work Listen to the Talk Python to me Episode Newsletter: Sign Up for the Newsletter here! Offical Video Guide: Get it here! Table of contents How to install and run InstaPy ( installation) Installing InstaPy ( installation) Updating InstaPy ( updating instapy) Guides and tutorials ( guides) Video tutorials ( video tutorials) Written guides ( written guides) Externals and additionals tools ( external tools) Dashboard ( dashboard) Web Interface ( gui) Running InstaPy on Docker ( docker) Documentation of all Instapy's features ( documentation) Support ( support) Credits ( credits) Disclaimer ( disclaimer) Installation elm pip install instapy That's it! 🚀 If you're on Ubuntu, read the specific guide on Installing on Ubuntu (64 Bit) . If you're on a Raspberry Pi, read the Installing on RaspberryPi guide instead. __Important:__ depending on your system, make sure to use pip3 and python3 instead. Here is the easiest quickstart script you can use And here you can find lots of sophisticated quickstart templates shared by the community! You can put in your account details now by passing the username and password parameters to the InstaPy() function in your quickstart script, like so: python InstaPy(username abc , password 123 ) Or you can pass them using the Command Line Interface (CLI) ( pass arguments by cli). > If you've used _InstaPy_ before installing it by pip , you have to move your _old_ data to the new workspace folder for once. Read how to do this here (./DOCUMENTATION.md migrating your data to the workspace folder). To run InstaPy, you'll need to run the quickstart script you've just downloaded. elm python quickstart.py or python quickstart.py username abc password 123 InstaPy will now open a browser window and start working. > If want InstaPy to run in the background pass the headless browser option when running from the CLI Or add the headless_browser True parameter to the InstaPy(headless_browser True) constructor. Updating InstaPy elm pip install instapy U Guides Video tutorials: Official InstaPy Guide on Udemy Installation on Windows Installation on MacOS Installation on Linux Installation on DigitalOcean Server Written Guides: How to Ubuntu (64 Bit) How to RaspberryPi External Tools: InstaPy Dashboard > InstaPy Dashboard is an Open Source project developed by @converge to visualize Instagram accounts progress and real time InstaPy logs on the browser. InstaPy GUI > InstaPy GUI is a Graphical User Interface including some useful Analytics developed by @breuerfelix . Docker All information on how to use InstaPy with Docker can be found in the instapy docker repository. Documentation A list of all features of InstaPy can be found here (./DOCUMENTATION.md). Support Do you need help ? 
If you should encounter any issue, please first search for similar issues and only if you can't find any, create a new issue or use the discord channel for help. Do you want to support us ? Help build InstaPy! Check out this short guide on how to start contributing! . Credits Contributors This project exists thanks to all the people who contribute. Contribute . Backers Thank you to all our backers! 🙏 Become a backer Sponsors Support this project by becoming a sponsor. Your logo will show up here with a link to your website. Become a sponsor > Disclaimer : Please Note that this is a research project. I am by no means responsible for any usage of this tool. Use on your own behalf. I'm also not responsible if your accounts get banned due to extensive use of this tool.",Unknown,Unknown 276,Unknown,Unknown,Unknown,"MultiResponse A Python class for Django to provide mime type aware responses. This allows a client to receive different responses based on the HTTP Accept header they send. This is used in place of render_to_response or a manual HttpResponse . Requirements Python 2.5+ (lower versions may work but are untested.) Django 1.0+ (again, lower versions may work but are untested.) mimeparse 0.1.2+ Sample Usage: from django.conf import settings from django.shortcuts import render_to_response from multiresponse import MultiResponse def index(request, extension): sample_people {'name': 'Daniel', 'age': 26}, {'name': 'John', 'age': 26}, {'name': 'Jane', 'age': 20}, {'name': 'Bob', 'age': 35}, mr MultiResponse(request) mr.register('html', 'index.html') mr.register('xml', 'people.xml') mr.register('json', 'people.json') mr.register('txt', 'people.txt') return mr.render({ 'people': sample_people, }) Output A HTTP GET to with a web browser would yield something like: HTTP/1.0 200 OK Date: Tue, 02 Dec 2008 05:39:53 GMT Server: WSGIServer/0.1 Python/2.5.1 Content Type: text/html; charset utf 8 People People Daniel John Jane Bob However, a HTTP GET to via curl i H 'Accept: application/xml' would yield: HTTP/1.0 200 OK Date: Tue, 02 Dec 2008 05:42:14 GMT Server: WSGIServer/0.1 Python/2.5.1 Content Type: application/xml; charset utf 8 Daniel 26 John 26 Jane 20 Bob 35 And a HTTP GET to via Javascript might look like: HTTP/1.0 200 OK Date: Tue, 02 Dec 2008 05:42:47 GMT Server: WSGIServer/0.1 Python/2.5.1 Content Type: application/json; charset utf 8 { 'people': {'name': 'Daniel', 'age': '26'}, {'name': 'John', 'age': '26'}, {'name': 'Jane', 'age': '20'}, {'name': 'Bob', 'age': '35'}, }",Unknown,Unknown 277,Unknown,Unknown,Unknown,"Django Feature Flipper django feature flipper helps flip features of your Django site on and off, in the database or per request or session, using URL parameters. THE SOFTWARE IS ALPHA. THE API IS CHANGING AND THERE ARE NO UNIT TESTS. This will help you can deploy code and schema changes for upcoming features but hide the features from your users until you're ready. This practice is commonly used in continuous deployment. The term feature flipper seems to have come from Flickr, as described in this often cited blog post: Feature flags or switches are becoming more commonly used, it seems. 
django feature flipper is in part inspired by that post, along with some of the other feature flippers available, including: (for Rails, by Florian Munz at Qype) (for Rails, by Matt Johnson) (for CodeIgniter, Dan Horrigan) (for Canonical's Launchpad) A few days after I first committed django feature flipper to github, David Cramer at Disqus has released the gargoyle plugin for Django, that offers overlapping functionality. That plugin requires Nexus , their Django front end admin replacement. The following post is an interview with Flickr's John Allspaw, author of The Art of Capacity Planning: Scaling Web Resources. Includes this quote, which covers feature flags being used to disable features to help panic gracefully. : Of course it's easier to do those things when you have easy config flags to turn things on or off, and a list to run through of what things are acceptable to serve stale and static. We currently have about 195 'features' we can turn off at Flickr in dire circumstances. And we've used those flags when we needed to. More on feature flags/flippers: (also Florian Munz) Continuous deployment: Installation . Add the featureflipper directory to your Python path. This should work:: pip install e git+ . Add featureflipper to your INSTALLED_APPS setting. . Add featureflipper.context_processors.features to your TEMPLATE_CONTEXT_PROCESSORS setting. It doesn't matter where you put it in relation to existing entries. . Add featureflipper.middleware.FeaturesMiddleware to your MIDDLEWARE_CLASSES setting. It doesn't matter where you put it in relation to existing entries. . Optionally, add a settings.FEATURES_FILE, and set it to the location of a features file (see below) to load after each syncdb (or whenever you'd normally expect fixtures to be loaded). . Run ./manage.py syncdb to create the database table. Limitations Feature status is currently kept in the database. This is inefficient. They should probably be in Memcached instead. There is, unforgivably, poor unit test coverage. What determines a feature's status A feature's status (enabled or disabled) is determined by, in order: . The database: the value of the attribute enabled of the Feature table. You can edit this value using the Django admin application. . The session: if a session entry feature_status_myfeature exists, the feature will be enabled if the value is enabled , and disabled otherwise. The middleware will add this entry if the GET parameter session_enable_myfeature is included, as explained below. . The request: if a GET parameter enabled_myfeature exists, the feature will enabled for this request, as explained below. Enabling and disabling features using URLs Users with permission can_flip_with_url can turn features on and off using URL parameters. To enable a feature for the current request:: /mypage/?enable_myfeature To enable a feature for this request and the rest of a session:: /mypage/?session_enable_myfeature To clear all the features enabled in the session:: /mypage/?session_clear_features If you want to allow anonymous users to do this, see the section Authorization for Anonymous Users here: Alternatively (since that looks painful) you can allow anyone to use URLs to flip features by setting FEATURE_FLIPPER_ANONYMOUS_URL_FLIPPING to True in your settings.py. How to use the features in templates The application registers itself with Django's admin app so you can manage the Features . 
Each feature has a name made up of just alphanumeric characters and hyphens that you can use in templates, views, URLs and elsewhere in your code. Each feature has a boolean enabled property, which is False (disabled) by default. The app also adds a few custom actions to the change list page so you can enable, disable and flip features there. Features also have a name and description, which aren't currently used anywhere but should help you keep track. The context processor adds features to the template context, which you can use like this:: {% if feature.search %} ... {% endif %} Here, search is the name of the feature. If the feature referenced doesn't exist, it is silently treated as disabled. To save you some typing, you can also use a new block tag:: {% load feature_tag %} {% feature login %} Login {% endfeature %} You can also do this:: {% feature profile %} ... will only be output if feature 'profile' is enabled ... {% disabled %} ... will only be output if the feature is disabled ... {% endfeature %} How to use the features in views The middleware adds features , a dict subclass, to each request:: if request.features 'search' : ... The middleware also adds features_panel to the request. This object provides more information about the state of each feature than features . enabled('myfeature') returns True if myfeature is enabled. source('myfeature') returns a string indicating the source of the final status of the feature: site : site wide, in the Feature instance itself session : in the session, set using a URL parameter url : per request, set using a URL parameter source('myfeature) will return another value if a featureflipper plugin is being used (see below). features and source are also available. They are demonstrated in the example application. Features file To make sure you can easily keep features and their default settings under version control, you can load features from a file using the loadfeatures management command (below). If you add FEATURES_FILE to your settings, pointing to a file (typically features.json), features from this file will be loaded each time you do a syncdb. Note that any existing feature of the same name will be overwritten. The file needs to look like this:: { name : profile , enabled : true, description : Allow the user to view and edit their profile. }, { name : search , enabled : true, description : Shows the search box on most pages, and the larger one on the home page. } Note that for profile above, we're using the description field to describe the feature in general, whereas for search we're describing how and where that feature is make visible to the user. You might end up using a mix of these. Management commands ./manage.py features : List the features in the database, along with their status. ./manage.py addfeature : Adds one or more features to the database (leaving them disabled). ./manage.py loadfeatures : Loads features from a JSON file (as above), or from the features file defined in settings.FEATURES_FILE. ./manage.py dumpfeatures : Outputs features from the database in the same JSON format (although the keys aren't in the same order as the example above). ./manage.py enablefeature : Enables the named feature(s). ./manage.py disablefeature : Disables the named feature(s). Signals Signal featureflipper.signals.feature_defaulted is sent when a feature referred to in a template or view is being defaulted to disabled. This will happen if the feature is not in the database, and hasn't been enabled using URL parameters. 
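If you just want a record of which features are being defaulted, a receiver can be connected wherever your app is initialised. This is only a sketch: the exact arguments the signal sends are not documented here, so it accepts whatever arrives via keyword arguments::

    import logging

    from featureflipper.signals import feature_defaulted

    logger = logging.getLogger(__name__)

    def log_defaulted_feature(sender, **kwargs):
        # We make no assumption about the signal's payload; just record it.
        logger.warning("Feature defaulted to disabled: sender=%r kwargs=%r", sender, kwargs)

    feature_defaulted.connect(log_defaulted_feature)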
The example project shows how this signal can be used, in views.py . Note also that featureflipper uses Django's post_syncdb to load a features file when syncdb is run. The connection to the signal is made in featureflipper/management/__init.py__ . Using the example project included in the source The source tree for django feature flipper includes an example project created using the App Factory described on a post_ on the Washington Times open source blog. .. _post: The settings.py file stipulates a sqlite3 database, so you'll need sqlite3 to be installed on your system. The database will be created automatically as necessary. To try the example project:: cd example ./manage.py syncdb ./manage.py runserver Let syncdb help you create a superuser so you can use the admin to create your own features. If you forget this step you can always run the createsuperuser command to do this. Two features ( profile and search ) will be loaded from features.json when you do the syncdb . These are referenced in the example template used on the home page. There's no link bank to the home page from the admin so you'll need to hack the URL or open the admin in a separate tab in your browser. Good practice Once you no longer need to flip a feature, remove the feature from the database and all the logic from your template and views. If you decide to remove the feature itself from your application, don't leave unused template and view code around. Just delete it. If you later decide to resurect the feature, it'll always be there in your version control repository. Extending Feature Flipper The app includes a hook to allow you to add feature providers that provide the state of features. On each request, the feature states are collected in turn from any plugins found (the order they're called on is undefined), just after feature states are collected from the database. To add a plugin, you need to create a subclass of featureflipper.FeatureProvider, and make sure it gets compiled along with the rest of your application. The class attribute source must be a string. This string is what the middeware makes available in request.features_panel.source(). The static method features must return a (possibly empty) list of tuples. The first member is the name of the feature, and the second True if the feature is enabled, and False otherwise. The features returned need not be defined in a Feature instance in the database. For example:: from featureflipper import FeatureProvider class UserFeatures(FeatureProvider): source 'user' @staticmethod def features(request): return ('feature1', False), ('feature2', True) TODOs and BUGS See:",Unknown,Unknown 278,Unknown,Unknown,Unknown,"Logo tqdm PyPI Versions PyPI Status Conda Forge Status Docker Snapcraft Build Status Coverage Status Branch Coverage Status Codacy Grade Libraries Rank PyPI Downloads DOI URI LICENCE OpenHub Status binder demo notebook demo tqdm means progress in Arabic ( taqadum , تقدّم) and is an abbreviation for I love you so much in Spanish ( te quiero demasiado ). Instantly make your loops show a smart progress meter just wrap any iterable with tqdm(iterable) , and you're done! .. code:: python from tqdm import tqdm for i in tqdm(range(10000)): ... 76% ████████████████████████████ 7568/10000 00:33 __ It can also be executed as a module with pipes: .. 
code:: sh $ seq 9999999 tqdm bytes wc l 75.2MB 00:00, 217MB/s 9999999 $ 7z a bd r backup.7z docs/ grep Compressing \ tqdm total $(find docs/ type f wc l) unit files >> backup.log 100% ███████████████████████████████▉ 8014/8014 01:37 __ has an 800ns/iter overhead. In addition to its low overhead, tqdm uses smart algorithms to predict the remaining time and to skip unnecessary iteration displays, which allows for a negligible overhead in most cases. tqdm works on any platform (Linux, Windows, Mac, FreeBSD, NetBSD, Solaris/SunOS), in any console or in a GUI, and is also friendly with IPython/Jupyter notebooks. tqdm does not require any dependencies (not even curses !), just Python and an environment supporting carriage return \r and line feed \n control characters. .. contents:: Table of contents :backlinks: top :local: Installation Latest PyPI stable release PyPI Status PyPI Downloads Libraries Dependents .. code:: sh pip install tqdm Latest development release on GitHub GitHub Status GitHub Stars GitHub Commits GitHub Forks GitHub Updated Pull and install in the current directory: .. code:: sh pip install e git+ Latest Conda release Conda Forge Status .. code:: sh conda install c conda forge tqdm Latest Snapcraft release Snapcraft .. code:: sh snap install tqdm Latest Docker release Docker .. code:: sh docker pull tqdm/tqdm docker run i rm tqdm/tqdm help Changelog The list of all changes is available either on GitHub's Releases: GitHub Status , on the wiki __, on the website __, or on crawlers such as allmychanges.com _. Usage tqdm is very versatile and can be used in a number of ways. The three main ones are given below. Iterable based Wrap tqdm() around any iterable: .. code:: python from tqdm import tqdm import time text for char in tqdm( a , b , c , d ): time.sleep(0.25) text text + char trange(i) is a special optimised instance of tqdm(range(i)) : .. code:: python for i in trange(100): time.sleep(0.01) Instantiation outside of the loop allows for manual control over tqdm() : .. code:: python pbar tqdm( a , b , c , d ) for char in pbar: time.sleep(0.25) pbar.set_description( Processing %s % char) Manual Manual control on tqdm() updates by using a with statement: .. code:: python with tqdm(total 100) as pbar: for i in range(10): time.sleep(0.1) pbar.update(10) If the optional variable total (or an iterable with len() ) is provided, predictive stats are displayed. with is also optional (you can just assign tqdm() to a variable, but in this case don't forget to del or close() at the end: .. code:: python pbar tqdm(total 100) for i in range(10): time.sleep(0.1) pbar.update(10) pbar.close() Module Perhaps the most wonderful use of tqdm is in a script or on the command line. Simply inserting tqdm (or python m tqdm ) between pipes will pass through all stdin to stdout while printing progress to stderr . The example below demonstrated counting the number of lines in all Python files in the current directory, with timing information included. .. code:: sh $ time find . name ' .py' type f exec cat \{} \; wc l 857365 real 0m3.458s user 0m0.274s sys 0m3.325s $ time find . name ' .py' type f exec cat \{} \; tqdm wc l 857366it 00:03, 246471.31it/s 857365 real 0m3.585s user 0m0.862s sys 0m3.358s Note that the usual arguments for tqdm can also be specified. .. code:: sh $ find . 
name ' .py' type f exec cat \{} \; tqdm unit loc unit_scale total 857366 >> /dev/null 100% ███████████████████████████████████ 857K/857K 00:04 > backup.log 100% ███████████████████████████████▉ 8014/8014 01:37 __, ConEmu __ and PyCharm __ (also here __, here __, and here __) lack full support. Windows: additionally may require the Python module colorama to ensure nested bars stay within their respective lines. Unicode: Environments which report that they support unicode will have solid smooth progressbars. The fallback is an ascii only bar. Windows consoles often only partially support unicode and thus often require explicit ascii True __ (also here __). This is due to either normal width unicode characters being incorrectly displayed as wide , or some unicode characters not rendering. Wrapping enumerated iterables: use enumerate(tqdm(...)) instead of tqdm(enumerate(...)) . The same applies to numpy.ndenumerate . This is because enumerate functions tend to hide the length of iterables. tqdm does not. Wrapping zipped iterables has similar issues due to internal optimisations. tqdm(zip(a, b)) should be replaced with zip(tqdm(a), b) or even zip(tqdm(a), tqdm(b)) . Hanging pipes in python2 __: when using tqdm on the CLI, you may need to use python 3.5+ for correct buffering. If you come across any other difficulties, browse and file GitHub Issues . Documentation PyPI Versions README Hits (Since 19 May 2016) .. code:: python class tqdm(object): Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested. def __init__(self, iterable None, desc None, total None, leave True, file None, ncols None, mininterval 0.1, maxinterval 10.0, miniters None, ascii None, disable False, unit 'it', unit_scale False, dynamic_ncols False, smoothing 0.3, bar_format None, initial 0, position None, postfix None, unit_divisor 1000): Parameters iterable : iterable, optional Iterable to decorate with a progressbar. Leave blank to manually manage the updates. desc : str, optional Prefix for the progressbar. total : int, optional The number of expected iterations. If unspecified, len(iterable) is used if possible. If float( inf ) or as a last resort, only basic progress statistics are displayed (no ETA, no progressbar). If gui is True and this parameter needs subsequent updating, specify an initial arbitrary large positive integer, e.g. int(9e9). leave : bool, optional If default: True , keeps all traces of the progressbar upon termination of iteration. file : io.TextIOWrapper or io.StringIO , optional Specifies where to output the progress messages (default: sys.stderr). Uses file.write(str) and file.flush() methods. For encoding, see write_bytes . ncols : int, optional The width of the entire output message. If specified, dynamically resizes the progressbar to stay within this bound. If unspecified, attempts to use environment width. The fallback is a meter width of 10 and no limit for the counter and statistics. If 0, will not print any meter (only stats). mininterval : float, optional Minimum progress display update interval default: 0.1 seconds. maxinterval : float, optional Maximum progress display update interval default: 10 seconds. Automatically adjusts miniters to correspond to mininterval after long display update lag. Only works if dynamic_miniters or monitor thread is enabled. miniters : int, optional Minimum progress display update interval, in iterations. 
If 0 and dynamic_miniters , will automatically adjust to equal mininterval (more CPU efficient, good for tight loops). If > 0, will skip display of specified number of iterations. Tweak this and mininterval to get very efficient loops. If your progress is erratic with both fast and slow iterations (network, skipping items, etc) you should set miniters 1. ascii : bool or str, optional If unspecified or False, use unicode (smooth blocks) to fill the meter. The fallback is to use ASCII characters 123456789 . disable : bool, optional Whether to disable the entire progressbar wrapper default: False . If set to None, disable on non TTY. unit : str, optional String that will be used to define the unit of each iteration default: it . unit_scale : bool or int or float, optional If 1 or True, the number of iterations will be reduced/scaled automatically and a metric prefix following the International System of Units standard will be added (kilo, mega, etc.) default: False . If any other non zero number, will scale total and n . dynamic_ncols : bool, optional If set, constantly alters ncols to the environment (allowing for window resizes) default: False . smoothing : float, optional Exponential moving average smoothing factor for speed estimates (ignored in GUI mode). Ranges from 0 (average speed) to 1 (current/instantaneous speed) default: 0.3 . bar_format : str, optional Specify a custom bar string formatting. May impact performance. default: '{l_bar}{bar}{r_bar}' , where l_bar '{desc}: {percentage:3.0f}% ' and r_bar ' {n_fmt}/{total_fmt} {elapsed} >> t tqdm(total filesize) Initialise >>> for current_buffer in stream: ... ... ... t.update(len(current_buffer)) >>> t.close() The last line is highly recommended, but possibly not necessary if t.update() will be called in such a way that filesize will be exactly reached and printed. Parameters n : int, optional Increment to add to the internal counter of iterations default: 1 . def close(self): Cleanup and (if leave False) close the progressbar. def clear(self, nomove False): Clear current bar display. def refresh(self): Force refresh the display of this bar. def unpause(self): Restart tqdm timer from last print time. def reset(self, total None): Resets to 0 iterations for repeated use. Consider combining with leave True . Parameters total : int, optional. Total to use for the new bar. def set_description(self, desc None, refresh True): Set/modify description of the progress bar. Parameters desc : str, optional refresh : bool, optional Forces refresh default: True . def set_postfix(self, ordered_dict None, refresh True, kwargs): Set/modify postfix (additional stats) with automatic formatting based on datatype. Parameters ordered_dict : dict or OrderedDict, optional refresh : bool, optional Forces refresh default: True . kwargs : dict, optional @classmethod def write(cls, s, file sys.stdout, end \n ): Print a message via tqdm (without overlap with bars). @property def format_dict(self): Public API for read only member access. def display(self, msg None, pos None): Use self.sp to display msg in the specified pos . Consider overloading this function when inheriting to use e.g.: self.some_frontend( self.format_dict) instead of self.sp . Parameters msg : str, optional. What to display (default: repr(self) ). pos : int, optional. Position to moveto (default: abs(self.pos) ). def trange( args, kwargs): A shortcut for tqdm(xrange( args), kwargs). On Python3+ range is used instead of xrange. 
class tqdm_gui(tqdm): Experimental GUI version def tgrange( args, kwargs): Experimental GUI version of trange class tqdm_notebook(tqdm): Experimental IPython/Jupyter Notebook widget def tnrange( args, kwargs): Experimental IPython/Jupyter Notebook widget version of trange Examples and Advanced Usage See the examples __ folder; import the module and run help() ; consult the wiki __; this has an excellent article __ on how to make a great progressbar; run the notebook demo or binder demo , or check out the slides from PyData London __. Description and additional stats Custom information can be displayed and updated dynamically on tqdm bars with the desc and postfix arguments: .. code:: python from tqdm import trange from random import random, randint from time import sleep with trange(10) as t: for i in t: Description will be displayed on the left t.set_description('GEN %i' % i) Postfix will be displayed on the right, formatted automatically based on argument's datatype t.set_postfix(loss random(), gen randint(1,999), str 'h', lst 1, 2 ) sleep(0.1) with tqdm(total 10, bar_format {postfix 0 } {postfix 1 value :>8.2g} , postfix Batch , dict(value 0) ) as t: for i in range(10): sleep(0.1) t.postfix 1 value i / 2 t.update() Points to remember when using {postfix ... } in the bar_format string: postfix also needs to be passed as an initial argument in a compatible format, and postfix will be auto converted to a string if it is a dict like object. To prevent this behaviour, insert an extra item into the dictionary where the key is not a string. Additional bar_format parameters may also be defined by overriding format_dict , and the bar itself may be modified using ascii : .. code:: python from tqdm import tqdm class TqdmExtraFormat(tqdm): Provides a total_time format parameter @property def format_dict(self): d super(TqdmExtraFormat, self).format_dict total_time d elapsed (d total or 0) / max(d n , 1) d.update(total_time self.format_interval(total_time) + in total ) return d for i in TqdmExtraFormat( range(10), ascii .oO0 , bar_format {total_time}: {percentage:.0f}% {bar}{r_bar} ): pass .. code:: 00:01 in total: 40% 000o 4/10 00:00 __ will be used if available to keep nested bars on their respective lines. For manual control over positioning (e.g. for multi threaded use), you may specify position n where n 0 for the outermost bar, n 1 for the next, and so on: .. code:: python from time import sleep from tqdm import trange, tqdm from multiprocessing import Pool, freeze_support, RLock L list(range(9)) def progresser(n): interval 0.001 / (n + 2) total 5000 text {}, est. {: __. Functional alternative in examples/tqdm_wget.py __. It is recommend to use miniters 1 whenever there is potentially large differences in iteration speed (e.g. downloading a file over a patchy connection). Pandas Integration Due to popular demand we've added support for pandas here's an example for DataFrame.progress_apply and DataFrameGroupBy.progress_apply : .. code:: python import pandas as pd import numpy as np from tqdm import tqdm df pd.DataFrame(np.random.randint(0, 100, (100000, 6))) Register pandas.progress_apply and pandas.Series.map_apply with tqdm (can use tqdm_gui , tqdm_notebook , optional kwargs, etc.) tqdm.pandas(desc my bar! 
) Now you can use progress_apply instead of apply and progress_map instead of map df.progress_apply(lambda x: x 2) can also groupby: df.groupby(0).progress_apply(lambda x: x 2) In case you're interested in how this works (and how to modify it for your own callbacks), see the examples __ folder or import the module and run help() . IPython/Jupyter Integration IPython/Jupyter is supported via the tqdm_notebook submodule: .. code:: python from tqdm import tnrange, tqdm_notebook from time import sleep for i in tnrange(3, desc '1st loop'): for j in tqdm_notebook(range(100), desc '2nd loop'): sleep(0.01) In addition to tqdm features, the submodule provides a native Jupyter widget (compatible with IPython v1 v4 and Jupyter), fully working nested bars and colour hints (blue: normal, green: completed, red: error/interrupt, light blue: no ETA); as demonstrated below. Screenshot Jupyter1 Screenshot Jupyter2 Screenshot Jupyter3 It is also possible to let tqdm automatically choose between console or notebook versions by using the autonotebook submodule: .. code:: python from tqdm.autonotebook import tqdm tqdm.pandas() Note that this will issue a TqdmExperimentalWarning if run in a notebook since it is not meant to be possible to distinguish between jupyter notebook and jupyter console . Use auto instead of autonotebook to suppress this warning. Custom Integration tqdm may be inherited from to create custom callbacks (as with the TqdmUpTo example above __) or for custom frontends (e.g. GUIs such as notebook or plotting packages). In the latter case: 1. def __init__() to call super().__init__(..., gui True) to disable terminal status_printer creation. 2. Redefine: close() , clear() , display() . Consider overloading display() to use e.g. self.frontend( self.format_dict) instead of self.sp(repr(self)) . Dynamic Monitor/Meter You can use a tqdm as a meter which is not monotonically increasing. This could be because n decreases (e.g. a CPU usage monitor) or total changes. One example would be recursively searching for files. The total is the number of objects found so far, while n is the number of those objects which are files (rather than folders): .. code:: python from tqdm import tqdm import os.path def find_files_recursively(path, show_progress True): files total 1 assumes path is a file t tqdm(total 1, unit file , disable not show_progress) if not os.path.exists(path): raise IOError( Cannot find: + path) def append_found_file(f): files.append(f) t.update() def list_found_dir(path): returns os.listdir(path) assuming os.path.isdir(path) listing os.listdir(path) subtract 1 since a file we found was actually this directory t.total + len(listing) 1 fancy way to give info without forcing a refresh t.set_postfix(dir path 10: , refresh False) t.update(0) may trigger a refresh return listing def recursively_search(path): if os.path.isdir(path): for f in list_found_dir(path): recursively_search(os.path.join(path, f)) else: append_found_file(path) recursively_search(path) t.set_postfix(dir path) t.close() return files Using update(0) is a handy way to let tqdm decide when to trigger a display refresh to avoid console spamming. Writing messages This is a work in progress (see 737 __). Since tqdm uses a simple printing mechanism to display progress bars, you should not write any message in the terminal using print() while a progressbar is open. To write messages in the terminal without any collision with tqdm bar display, a .write() method is provided: .. 
code:: python from tqdm import tqdm, trange from time import sleep bar trange(10) for i in bar: Print using tqdm class method .write() sleep(0.1) if not (i % 3): tqdm.write( Done task %i % i) Can also use bar.write() By default, this will print to standard output sys.stdout . but you can specify any file like object using the file argument. For example, this can be used to redirect the messages writing to a log file or class. Redirecting writing If using a library that can print messages to the console, editing the library by replacing print() with tqdm.write() may not be desirable. In that case, redirecting sys.stdout to tqdm.write() is an option. To redirect sys.stdout , create a file like class that will write any input string to tqdm.write() , and supply the arguments file sys.stdout, dynamic_ncols True . A reusable canonical example is given below: .. code:: python from time import sleep import contextlib import sys from tqdm import tqdm class DummyTqdmFile(object): Dummy file like that will write to tqdm file None def __init__(self, file): self.file file def write(self, x): Avoid print() second call (useless \n) if len(x.rstrip()) > 0: tqdm.write(x, file self.file) def flush(self): return getattr(self.file, flush , lambda: None)() @contextlib.contextmanager def std_out_err_redirect_tqdm(): orig_out_err sys.stdout, sys.stderr try: sys.stdout, sys.stderr map(DummyTqdmFile, orig_out_err) yield orig_out_err 0 Relay exceptions except Exception as exc: raise exc Always restore sys.stdout/err if necessary finally: sys.stdout, sys.stderr orig_out_err def some_fun(i): print( Fee, fi, fo, .split() i ) Redirect stdout to tqdm.write() (don't forget the as save_stdout ) with std_out_err_redirect_tqdm() as orig_stdout: tqdm needs the original stdout and dynamic_ncols True to autodetect console width for i in tqdm(range(3), file orig_stdout, dynamic_ncols True): sleep(.5) some_fun(i) After the with , printing is restored print( Done! ) Monitoring thread, intervals and miniters tqdm implements a few tricks to to increase efficiency and reduce overhead. Avoid unnecessary frequent bar refreshing: mininterval defines how long to wait between each refresh. tqdm always gets updated in the background, but it will diplay only every mininterval . Reduce number of calls to check system clock/time. mininterval is more intuitive to configure than miniters . A clever adjustment system dynamic_miniters will automatically adjust miniters to the amount of iterations that fit into time mininterval . Essentially, tqdm will check if it's time to print without actually checking time. This behaviour can be still be bypassed by manually setting miniters . However, consider a case with a combination of fast and slow iterations. After a few fast iterations, dynamic_miniters will set miniters to a large number. When iteration rate subsequently slows, miniters will remain large and thus reduce display update frequency. To address this: maxinterval defines the maximum time between display refreshes. A concurrent monitoring thread checks for overdue updates and forces one where necessary. The monitoring thread should not have a noticeable overhead, and guarantees updates at least every 10 seconds by default. This value can be directly changed by setting the monitor_interval of any tqdm instance (i.e. t tqdm.tqdm(...); t.monitor_interval 2 ). The monitor thread may be disabled application wide by setting tqdm.tqdm.monitor_interval 0 before instantiatiation of any tqdm bar. 
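The knobs described in this section can be exercised directly; the snippet below is only illustrative and combines the parameters documented above with arbitrary values.

.. code:: python

    from time import sleep

    from tqdm import tqdm

    # Disable the monitoring thread application-wide; as noted above, this must
    # happen before the first bar is instantiated.
    tqdm.monitor_interval = 0

    # (Per-instance tuning is also possible, e.g. t = tqdm(...); t.monitor_interval = 2)

    for _ in tqdm(range(1000), mininterval=0.5, maxinterval=10):
        sleep(0.001)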
Contributions GitHub Commits GitHub Issues GitHub PRs OpenHub Status GitHub Contributions All source code is hosted on GitHub __. Contributions are welcome. See the CONTRIBUTING __ file for more information. Developers who have made significant contributions, ranked by LoC (surviving lines of code, git fame wMC excl '\.(png gif enc)$' __), are: Name ID LoC Notes Casper da Costa Luis casperdcl __ 3/4 primary maintainer Gift Casper Stephen Larroque lrq3000 __ 1/6 team member Noam Yorav Raphael noamraph __ 1% original author Matthew Stevens mjstevens777 __ 1% Guangshuo Chen chengs __ 1% Hadrien Mary hadim __ 1% team member Mikhail Korobov kmike __ 1% team member Kyle Altendorf altendky __ 1% Ports to Other Languages A list is available on this wiki page __. LICENCE Open Source (OSI approved): LICENCE Citation information: DOI URI README Hits (Since 19 May 2016) .. Logo image:: .. Screenshot image:: .. Build Status image:: :target: .. Coverage Status image:: :target: .. Branch Coverage Status image:: :target: .. Codacy Grade image:: :target: .. GitHub Status image:: :target: .. GitHub Forks image:: :target: .. GitHub Stars image:: :target: .. GitHub Commits image:: :target: .. GitHub Issues image:: :target: .. GitHub PRs image:: :target: .. GitHub Contributions image:: :target: .. GitHub Updated image:: :target: .. Gift Casper image:: :target: .. PyPI Status image:: :target: .. PyPI Downloads image:: :target: .. PyPI Versions image:: :target: .. Conda Forge Status image:: :target: .. Snapcraft image:: :target: .. Docker image:: :target: .. Libraries Rank image:: :target: .. Libraries Dependents image:: :target: .. OpenHub Status image:: :target: .. LICENCE image:: :target: .. DOI URI image:: :target: .. notebook demo image:: :target: .. binder demo image:: :target: .. Screenshot Jupyter1 image:: .. Screenshot Jupyter2 image:: .. Screenshot Jupyter3 image:: .. README Hits image:: :target:",Unknown,Unknown 279,Unknown,Unknown,Unknown,"Algo VPN Join the chat at Twitter TravisCI Status Algo VPN is a set of Ansible scripts that simplify the setup of a personal IPSEC and Wireguard VPN. It uses the most secure defaults available, works with common cloud providers, and does not require client software on most devices. See our release announcement for more information. Features Supports only IKEv2 with strong crypto (AES GCM, SHA2, and P 256) and WireGuard Generates Apple profiles to auto configure iOS and macOS devices Includes a helper script to add and remove users Blocks ads with a local DNS resolver (optional) Sets up limited SSH users for tunneling traffic (optional) Based on current versions of Ubuntu and strongSwan Installs to DigitalOcean, Amazon Lightsail, Amazon EC2, Vultr, Microsoft Azure, Google Compute Engine, Scaleway, OpenStack, or your own Ubuntu server Anti features Does not support legacy cipher suites or protocols like L2TP, IKEv1, or RSA Does not install Tor, OpenVPN, or other risky servers Does not depend on the security of TLS Does not require client software on most platforms Does not claim to provide anonymity or censorship avoidance Does not claim to protect you from the FSB , MSS ), DGSE , or FSM Deploy the Algo Server The easiest way to get an Algo server running is to let it set up a _new_ virtual machine in the cloud for you. 1. Setup an account on a cloud hosting provider. Algo supports DigitalOcean (most user friendly), Amazon Lightsail , Amazon EC2 , Vultr , Microsoft Azure , Google Compute Engine , Scaleway , and DreamCompute or other OpenStack based cloud hosting. 2. 
Download Algo . Unzip it in a convenient location on your local machine. 3. Install Algo's core dependencies. Open the Terminal. The python interpreter you use to deploy Algo must be python2. If you don't know what this means, you're probably fine. cd into the algo master directory where you unzipped Algo, then run: macOS: bash $ python m ensurepip user $ python m pip install user upgrade virtualenv Linux (deb based): bash $ sudo apt get update && sudo apt get install \ build essential \ libssl dev \ libffi dev \ python dev \ python pip \ python setuptools \ python virtualenv y Linux (rpm based): See the pre installation documentation for RedHat/CentOS 6.x (docs/deploy from redhat centos6.md) or Fedora (docs/deploy from fedora workstation.md) Windows: See the Windows documentation (docs/deploy from windows.md) 4. Install Algo's remaining dependencies. Use the same Terminal window as the previous step and run: bash $ python m virtualenv python which python2 env && source env/bin/activate && python m pip install U pip virtualenv && python m pip install r requirements.txt On macOS, you may be prompted to install cc . You should press accept if so. 5. List the users to create. Open config.cfg in your favorite text editor. Specify the users you wish to create in the users list. If you want to be able to add or delete users later, you must select yes for the Do you want to retain the CA key? prompt during the deployment. 6. Start the deployment. Return to your terminal. In the Algo directory, run ./algo and follow the instructions. There are several optional features available. None are required for a fully functional VPN server. These optional features are described in greater detail in deploy from ansible.md (docs/deploy from ansible.md). That's it! You will get the message below when the server deployment process completes. You now have an Algo server on the internet. Take note of the p12 (user certificate) password and the CA key in case you need them later, they will only be displayed this time . You can now setup clients to connect it, e.g. your iPhone or laptop. Proceed to Configure the VPN Clients ( configure the vpn clients) below. Congratulations! Your Algo server is running. Config files and certificates are in the ./configs/ directory. Go to after connecting and ensure that all your traffic passes through the VPN. Local DNS resolver 172.16.0.1 The p12 and SSH keys password for new users is XXXXXXXX The CA key password is XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Shell access: ssh i configs/algo.pem root@xxx.xxx.xx.xx Configure the VPN Clients Certificates and configuration files that users will need are placed in the configs directory. Make sure to secure these files since many contain private keys. All files are saved under a subdirectory named with the IP address of your new Algo VPN server. Apple Devices WireGuard is used to provide VPN services on Apple devices. Algo generates a WireGuard configuration file, wireguard/ .conf , and a QR code, wireguard/ .png , for each user defined in config.cfg . On iOS, install the WireGuard app from the iOS App Store. Then, use the WireGuard app to scan the QR code or AirDrop the configuration file to the device. On macOS Mojave or later, install the WireGuard app from the Mac App Store. WireGuard will appear in the menu bar once you run the app. Click on the WireGuard icon, choose Import tunnel(s) from file... , then select the appropriate WireGuard configuration file. 
On either iOS or macOS, you can enable Connect on Demand and/or exclude certain trusted Wi Fi networks (such as your home or work) by editing the tunnel configuration in the WireGuard app. (Algo can't do this automatically for you.) Installing WireGuard is a little more complicated on older version of macOS. See Using macOS as a Client with WireGuard (docs/client macos wireguard.md). If you prefer to use the built in IPSEC VPN on Apple devices, or need Connect on Demand or excluded Wi Fi networks automatically configured, then see Using Apple Devices as a Client with IPSEC (docs/client apple ipsec.md). Android Devices WireGuard is used to provide VPN services on Android. Install the WireGuard VPN Client . Import the corresponding wireguard/ .conf file to your device, then setup a new connection with it. See the Android setup instructions (/docs/client android.md) for more detailed walkthrough. Windows 10 Copy your PowerShell script windows_{username}.ps1 to the Windows client and run the following command as Administrator to configure the VPN connection. powershell ExecutionPolicy ByPass File windows_{username}.ps1 Add For a manual installation, see the Windows setup instructions (/docs/client windows.md). Linux Network Manager Clients (e.g., Ubuntu, Debian, or Fedora Desktop) Network Manager does not support AES GCM. In order to support Linux Desktop clients, choose the compatible cryptography during the deploy process and use at least Network Manager 1.4.1. See Issue 263 for more information. Linux strongSwan Clients (e.g., OpenWRT, Ubuntu Server, etc.) Install strongSwan, then copy the included ipsec_user.conf, ipsec_user.secrets, user.crt (user certificate), and user.key (private key) files to your client device. These will require customization based on your exact use case. These files were originally generated with a point to point OpenWRT based VPN in mind. Ubuntu Server example 1. sudo apt get install strongswan libstrongswan standard plugins : install strongSwan 2. /etc/ipsec.d/certs : copy .crt from algo master/configs/ /ipsec/manual/ .crt 3. /etc/ipsec.d/private : copy .key from algo master/configs/ /ipsec/manual/ .key 4. /etc/ipsec.d/cacerts : copy cacert.pem from algo master/configs/ /ipsec/manual/cacert.pem 5. /etc/ipsec.secrets : add your user.key to the list, e.g. : ECDSA .key 6. /etc/ipsec.conf : add the connection from ipsec_user.conf and ensure leftcert matches the .crt filename 7. sudo ipsec restart : pick up config changes 8. sudo ipsec up : start the ipsec tunnel 9. sudo ipsec down : shutdown the ipsec tunnel One common use case is to let your server access your local LAN without going through the VPN. Set up a passthrough connection by adding the following to /etc/ipsec.conf : conn lan passthrough leftsubnet 192.168.1.1/24 Replace with your LAN subnet rightsubnet 192.168.1.1/24 Replace with your LAN subnet authby never No authentication necessary type pass passthrough auto route no need to ipsec up lan passthrough To configure the connection to come up at boot time replace auto add with auto start . Other Devices Depending on the platform, you may need one or multiple of the following files. 
cacert.pem: CA Certificate user.mobileconfig: Apple Profile user.p12: User Certificate and Private Key (in PKCS 12 format) ipsec_user.conf: strongSwan client configuration ipsec_user.secrets: strongSwan client configuration windows_user.ps1: Powershell script to help setup a VPN connection on Windows Setup an SSH Tunnel If you turned on the optional SSH tunneling role, then local user accounts will be created for each user in config.cfg and SSH authorized_key files for them will be in the configs directory (user.ssh.pem). SSH user accounts do not have shell access, cannot authenticate with a password, and only have limited tunneling options (e.g., ssh N is required). This ensures that SSH users have the least access required to setup a tunnel and can perform no other actions on the Algo server. Use the example command below to start an SSH tunnel by replacing user and ip with your own. Once the tunnel is setup, you can configure a browser or other application to use 127.0.0.1:1080 as a SOCKS proxy to route traffic through the Algo server. ssh D 127.0.0.1:1080 f q C N user@ip i configs/ /ssh tunnel/ .pem SSH into Algo Server Your Algo server is configured for key only SSH access for administrative purposes. Open the Terminal app, cd into the algo master directory where you originally downloaded Algo, and then use the command listed on the success message: ssh i configs/algo.pem user@ip where user is either root or ubuntu as listed on the success message, and ip is the IP address of your Algo server. If you find yourself regularly logging into the server then it will be useful to load your Algo ssh key automatically. Add the following snippet to the bottom of /.bash_profile to add it to your shell environment permanently. ssh add /.ssh/algo > /dev/null 2>&1 Adding or Removing Users _If you chose to save the CA key during the deploy process,_ then Algo's own scripts can easily add and remove users from the VPN server. 1. Update the users list in your config.cfg 2. Open a terminal, cd to the algo directory, and activate the virtual environment with source env/bin/activate 3. Run the command: ./algo update users After this process completes, the Algo VPN server will contain only the users listed in the config.cfg file. Additional Documentation Deployment instructions, cloud provider setup instructions, and further client setup instructions available here. (docs/index.md) FAQ (docs/faq.md) Troubleshooting (docs/troubleshooting.md) If you read all the documentation and have further questions, join the chat on Gitter . Endorsements > I've been ranting about the sorry state of VPN svcs for so long, probably about > time to give a proper talk on the subject. TL;DR: use Algo. Kenn White > Before picking a VPN provider/app, make sure you do some research > ... – or consider Algo The Register > Algo is really easy and secure. the grugq > I played around with Algo VPN, a set of scripts that let you set up a VPN in the cloud in very little time, even if you don’t know much about development. I’ve got to say that I was quite impressed with Trail of Bits’ approach. Romain Dillet for TechCrunch > If you’re uncomfortable shelling out the cash to an anonymous, random VPN provider, this is the best solution. Thorin Klosowski for Lifehacker Support Algo VPN Flattr PayPal Patreon Bountysource All donations support continued development. Thanks! We accept donations via PayPal , Patreon , and Flattr . Use our referral code when you sign up to Digital Ocean for a $10 credit. 
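Returning briefly to the Setup an SSH Tunnel section above: once the tunnel is running, any SOCKS-aware client can route traffic through it, not just a browser. A minimal Python sketch, assuming the third-party requests package is installed with its SOCKS extra (pip install "requests[socks]"); the endpoint URL is only an example "what is my IP" service:

```python
# Minimal sketch: send an HTTP request through the SSH tunnel's SOCKS proxy.
# Assumes requests is installed with SOCKS support (pip install "requests[socks]").
import requests

proxies = {
    "http": "socks5h://127.0.0.1:1080",   # socks5h resolves DNS through the tunnel too
    "https": "socks5h://127.0.0.1:1080",
}

# Example endpoint only -- substitute any "what is my IP" service you trust.
resp = requests.get("https://ifconfig.co/ip", proxies=proxies, timeout=10)
print("Exit IP seen by the remote service:", resp.text.strip())
```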
We also accept and appreciate contributions of new code and bugfixes via Github Pull Requests. Algo is licensed and distributed under the AGPLv3. If you want to distribute a closed source modification or service based on Algo, then please consider purchasing an exception . As with the methods above, this will help support continued development.",Unknown,Unknown 280,Unknown,Unknown,Unknown,"django schemata BEWARE! THIS IS AN EXPERIMENTAL CODE! Created this as a proof of concept and never had a chance to test it thoroughly, not speaking about the production run, as our team changed the plans. While I was very excited during coding it, I unfortunately have no use for schemata currently. I'd love to hear how the code is really doing and if you find something that should be fixed, I'll gladly review and pull your patches. This project adds the PostgreSQL schema support to Django . The schema, which can be seen as a namespace in which any database object exists, allows to isolate the database objects even when they have the same names. You can have same set of tables, indices, sequences etc. many times under the single database. In case you're not using schemata, your objects lie in the default schema public and because the default search_path contains public , you don't have to care. Why to care? It's simple: One code One instance One shared buffering One connection One database One schema for one customer You scale up to the stars Using schemata can be very useful if you run the Software as a service (SaaS) server for multiple customers. Typically for multiple databases you had single project code, cloned many times and that required strong maintenance effort. So until recently you were forced to maintain multiple Django instances even when the code did the same things, only the data varied. With the invention of multiple databases support in Django it was possible to use it for SaaS, yet using schemata was found to bring even more advantages. This code was inspired by the A better way : SaaS with Django and PostgreSQL Schemas blog post and the django appschema application. Going underneath Like django appschema this project infers the proper schema to switch to from the hostname found in each web request. You're expected to point multiple HTTP domains of your customers handled by your (Apache/WSGI) server to the single Django instance supporting schemata. Warning: This application was not tested in the multithreading environment, we configure our mod_wsgi to run each Django instance as mutiple separated processes. Unlike django appschema , this project seeks for the maximum simplicity (added layer and toolset must be as thin as possible so the data path is clear): Minimalistic code. No hacking of INSTALLED_APPS , syncdb or migrate commands... (they had enough with South ). Schema definitions are not stored in the database, but in settings 's dict. That allows you to flexibly and uniformly configure the differences between individual domains. django schemata only requires schema_name sub key, but you're free to store additional configuration there. Shared applications Not yet. The reason why django appschema became hackish is that it tries to sync/migrate both isolated and shared applications in a single run. The app is shared if it has its tables in the public schema, hence they're accessible by every domain. That's because public schema is always checked after the object was not found in its home schema. 
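To make that search_path behaviour concrete, here is an illustrative psycopg2 sketch (this is not django schemata's own code; the connection settings and table name are made up). Putting the customer's schema first and public last is what makes objects resolve in the home schema before falling back to public:

```python
# Illustrative sketch (not django schemata internals): switch the active schema
# for a connection by changing search_path. Connection DSN and table are made up.
import psycopg2

conn = psycopg2.connect("dbname=saas user=postgres")
with conn.cursor() as cur:
    # Home schema first, public as the fallback that is "always checked after".
    cur.execute("SET search_path TO %s, public", ("firstclient",))
    cur.execute("SELECT count(*) FROM django_session")  # resolved inside "firstclient" if present
    print(cur.fetchone()[0])
conn.close()
```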
The support for shared application will be added to django schemata as soon as it becomes clear it is required. And we strive to add the support in a more simple way: ALTER TABLE table SET SCHEMA schema looks very promising . We believe it's bearable for the admin to do some extra setup steps, when the code stays simple. Setup django schemata requires the following settings.py modifications: We wrap around the PostgreSQL backend. DATABASE_ENGINE 'django_schemata.postgresql_backend' Schema switching upon web requests. MIDDLEWARE_CLASSES ( 'django_schemata.middleware.SchemataMiddleware', ... ) We also offer some management commands. INSTALLED_APPS ( ... 'django_schemata', ... ) We need to assure South of the real db backends for all databases. Otherwise it dies in uncertainty. For Django 1.1 or below: SOUTH_DATABASE_ADAPTER 'south.db.postgresql_psycopg2' For Django 1.2 or above: SOUTH_DATABASE_ADAPTERS { 'default': 'south.db.postgresql_psycopg2', } This maps all HTTP domains to all schemata we want to support. All of your supported customers need to be registered here. SCHEMATA_DOMAINS { 'localhost': { 'schema_name': 'localhost', 'additional_data': ... }, 'first client.com': { 'schema_name': 'firstclient', }, 'second client.com': { 'schema_name': 'secondclient', }, } Management commands ./manage.py manage_schemata As soon as you add your first domain to settings.SCHEMATA_DOMAINS , you can run this. PostgreSQL database is inspected and yet not existing schemata are added. Current ones are not touched (command is safe to re run). Later more capabilities will be added here. ./manage.py sync_schemata This command runs the syncdb command for every registered database schema. You can sync all of your apps and domains in a single run. The options given to sync_schemata are passed to every syncdb . So if you use South, you may find this handy: ./manage sync_schemata migrate ./manage.py migrate_schemata This command runs the South's migrate command for every registered database schema. The options given to migrate_schemata are passed to every migrate . Hence you may find ./manage.py migrate_schemata list handy if you're curious or ./manage.py migrate_schemata myapp 0001_initial fake in case you're just switching myapp application to use South migrations. Bug report? Idea? Patch? We're happy to incorporate your patches and ideas. Please either fork and send pull requests or just send the patch. Discuss this project! Please report bugs. Success stories are highly welcome. Thank you.",Unknown,Unknown 281,Unknown,Unknown,Unknown,"Django Reversetag Django Reversetag is an enhanced replacement for Django's builtin url_ template tag. .. _url: Features Consistent syntax ( string literals and variables ) Ability to reverse view names stored in context variables Partial reversing (see Advanced Usage below) Dependencies Python 2.3+ Installation To use reversetag in your Django project it needs to be accessible by your Python installation. The easy way: pip install django reversetag (or use easy_install if you must) The manual way: Simply place the reversetag directory somewhere that is on your $PYTHONPATH. Django Setup Then all that is left to do is adding reversetag to INSTALLED_APPS in your projets settings.py . Example:: INSTALLED_APPS ( 'django.contrib.auth', 'django.contrib.contenttypes', 'django.contrib.sessions', 'django.contrib.sites', 'django.contrib.admin', 'reversetag', 0 9 +)', 'app.views.view', name paginatable_view ), ... 
/urls.py template.html {% load reversetag %} {% reverse partial paginatable_view as this_page %} {% include pagination.html %} /template.html pagination.html {% load reversetag %} next page /pagination.html In this example the template template.html constructs a partial reversed url to itself and saves the result in a context varialbe this_page which in turn is used by a generic pagination.html to display a link to the next page without having to know anything about the view except that it takes a page argument.",Unknown,Unknown 282,Unknown,Unknown,Unknown,"congress legislators Members of the United States Congress (1789 Present), congressional committees (1973 Present), committee membership (current only), and presidents and vice presidents of the United States in YAML, JSON, and CSV format. Build Status Overview This project provides the following data files: File Download Description legislators current YAML JSON CSV Currently serving Members of Congress. legislators historical YAML JSON CSV Historical Members of Congress (i.e. all Members of Congress except those in the current file). legislators social media YAML JSON Current social media accounts for Members of Congress. Official accounts only (no campaign or personal accounts). committees current YAML JSON Current committees of the Congress, with subcommittees. committee membership current YAML JSON Current committee/subcommittee assignments. committees historical YAML JSON Current and historical committees of the Congress, with subcommittees, from the 93rd Congress (1973) and on. legislators district offices YAML JSON CSV District offices for current Members of Congress. executive YAML JSON Presidents and vice presidents. The data formats are documented below. The files are maintained in YAML format in the master branch of this project. YAML is a serialization format similar in structure to JSON but typically written with one field per line. Like JSON, it allows for nested structure. Each level of nesting is indicated by indentation or a dash. CSV and JSON formatted files are also provided in the gh pages branch they're linked above. This database is maintained through a combination of manual edits by volunteers (from GovTrack , ProPublica , MapLight , FiveThirtyEight , and others) and automated imports from a variety of sources including: GovTrack.us . The Congressional Biographical Directory . Congressional Committees, Historical Standing Committees data set by Garrison Nelson and Charles Stewart . Martis’s “The Historical Atlas of Political Parties in the United States Congress”, via Rosenthal, Howard L., and Keith T. Poole. United States Congressional Roll Call Voting Records, 1789 1990 . The Sunlight Labs Congress API . The Library of Congress's THOMAS website . C SPAN's Congressional Chronicle Data Format Documentation Legislators file structure overview legislators current.yaml and legislators historical.yaml contain biographical information on all Members of Congress that have ever served in Congress, that is, since 1789, as well as cross walks into other databases. Each legislator record is grouped into four guaranteed parts: id's which relate the record to other databases, name information (first, last, etc.), biographical information (birthday, gender), and terms served in Congress. 
A typical record looks something like this: id: bioguide: R000570 thomas: '01560' govtrack: 400351 opensecrets: N00004357 votesmart: 26344 fec: H8WI01024 cspan: 57970 wikipedia: Paul Ryan ballotpedia: Paul Ryan maplight: 445 house_history: 20785 icpsr: 29939 name: first: Paul middle: D. last: Ryan bio: birthday: '1970 01 29' gender: M terms: ... type: rep start: '2011 01 03' end: '2013 01 03' ... type: rep start: '2013 01 03' end: '2015 01 03' state: WI party: Republican district: 1 url: address: 1233 Longworth HOB; Washington DC 20515 4901 phone: 202 225 3031 fax: 202 225 3393 contact_form: office: 1233 Longworth House Office Building Terms correspond to elections and are listed in chronological order. If a legislator is currently serving, the current term information will always be the last one. To check if a legislator is currently serving, check that the end date on the last term is in the future. The split between legislators current.yaml and legislators historical.yaml is somewhat arbitrary because these files may not be updated immediately when a legislator leaves office. If it matters to you, just load both files. A separate file legislators social media.yaml stores social media account information. Its structure is similar but includes different fields. Data Dictionary The following fields are available in legislators current.yaml and legislators historical.yaml : id bioguide: The alphanumeric ID for this legislator in Note that at one time some legislators (women who had changed their name when they got married) had two entries on the bioguide website. Only one bioguide ID is included here. This is the best field to use as a primary key. thomas: The numeric ID for this legislator on and The ID is stored as a string with leading zeros preserved. lis: The alphanumeric ID for this legislator found in Senate roll call votes . fec: A list of IDs for this legislator in Federal Election Commission data. In the CSV format, the fec_ids column is comma separated. govtrack: The numeric ID for this legislator on GovTrack.us (stored as an integer). opensecrets: The alphanumeric ID for this legislator on OpenSecrets.org. votesmart: The numeric ID for this legislator on VoteSmart.org (stored as an integer). icpsr: The numeric ID for this legislator in Keith Poole's VoteView.com website, originally based on an ID system by the Interuniversity Consortium for Political and Social Research (stored as an integer). cspan: The numeric ID for this legislator on C SPAN's video website, e.g. (stored as an integer). wikipedia: The Wikipedia page name for the person (spaces are given as spaces, not underscores). ballotpedia: The ballotpedia.org page name for the person (spaces are given as spaces, not underscores). maplight : The numeric ID for this legislator on maplight.org (stored as an integer). house_history: The numeric ID for this legislator on The ID is present only for members who have served in the U.S. House. bioguide_previous: When bioguide.congress.gov mistakenly listed a legislator under multiple IDs, this field is a list of alternative IDs. (This often ocurred for women who changed their name.) The IDs in this list probably were removed from bioguide.congress.gov but might still be in use in the wild. name first: The legislator's first name. Sometimes a first initial and period (e.g. in W. Todd Akin), in which case it is suggested to not use the first name for display purposes. middle: The legislator's middle name or middle initial (with period). last: The legislator's last name. 
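For example, a minimal way to apply that rule from Python (assuming PyYAML is installed and the data files are checked out locally under their usual hyphenated names, e.g. legislators-current.yaml):

```python
# Minimal sketch: load both legislator files and keep only people whose last
# term ends in the future, i.e. those currently serving. Assumes PyYAML.
import datetime
import yaml

legislators = []
for path in ("legislators-current.yaml", "legislators-historical.yaml"):
    with open(path) as f:
        legislators.extend(yaml.safe_load(f))

today = datetime.date.today().isoformat()
# Term dates are ISO formatted (YYYY-MM-DD), so string comparison is safe.
serving = [p for p in legislators if str(p["terms"][-1]["end"]) > today]
for person in serving[:5]:
    term = person["terms"][-1]
    print(person["id"]["bioguide"], person["name"]["last"], term["type"], term["state"])
```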
Many last names include non ASCII characters. When building search systems, it is advised to index both the raw value as well as a value with extended characters replaced with their ASCII equivalents (in Python that's: u .join(c for c in unicodedata.normalize('NFKD', lastname) if not unicodedata.combining(c))). suffix: A suffix on the legislator's name, such as Jr. or III . nickname: The legislator's nick name when used as a common alternative to his first name. official_full: The full name of the legislator according to the House or Senate (usually first, middle initial, nickname, last, and suffix). Present for those serving on 2012 10 30 and later. other_names, when present, lists other names the legislator has gone by officially. This is helpful in cases where a legislator's legal name has changed. These listings will only include the name attributes which differ from the current name, and a start or end date where applicable. Where multiple names exist, other names are listed chronologically by end date. An excerpted example: id: bioguide: B001228 thomas: '01465' govtrack: 400039 opensecrets: N00007068 name: first: Mary middle: Whitaker last: Bono Mack other_names: last: Bono end: '2007 12 17' ... bio birthday: The legislator's birthday, in YYYY MM DD format. gender: The legislator's gender, either M or F . (In historical data, we've worked backwards from history.house.gov's Women in Congress feature .) terms (one entry for each election) type: The type of the term. Either sen for senators or rep for representatives and delegates to the House. start: The date legislative service began: the date the legislator was sworn in, if known, or else the beginning of the legislator's term. Since 1935 regularly elected terms begin on January 3 at noon on odd numbered years, but when Congress does not first meet on January 3, term start dates might reflect that swearing in occurred on a later date. (Prior to 1935, terms began on March 4 of odd numbered years, see here .) Formatted as YYYY MM DD. end: The date the term ended (because the Congress ended or the legislator died or resigned, etc.). End dates follow the Constitutional end of a term. Since 1935, terms begin and end on January 3 at noon in odd numbered years, and thus a term end date may also be a term start date. Prior to 1935, terms began on March 4 and ended either on March 3 or March 4. The end date is the last date on which the legislator served this term. Unlike the start date, whether Congress was in session or not does not affect the value of this field. state: The two letter, uppercase USPS abbreviation for the state that the legislator is serving from. See below. how: How the term came to be. This field is generally not present. The field is set to appointment for senators appointed to fill a vacancy . Senators and representatives elected by special election are not yet marked in the data. For senators currently serving per an appointment, the field end type may be set to special election , in which case the end date of the term will reflect the expected special election date to replace the appointed senator. Once the special election occurs and the next senator is sworn in, ending the term of the appointed senator, the end date will be updated to reflect the actual end of service (which will follow the election date). district: For representatives, the district number they are serving from. At large districts are district 0. In historical data, unknown district numbers are recorded as 1. 
class: For senators, their election class (1, 2, or 3). Note that this is unrelated to seniority. state_rank: For senators, whether they are the junior or senior senator (only valid if the term is current, otherwise the senator's rank at the time the term ended). party: The political party of the legislator. If the legislator changed parties, this is the most recent party held during the term and party_affiliations will be set. Values are typically Democrat , Independent , or Republican . The value typically matches the political party of the legislator on the ballot in his or her last election, although for state affiliate parties such as Democratic Farmer Labor we will use the national party name ( Democrat ) instead to keep the values of this field normalized. caucus: For independents, the party that the legislator caucuses with, using the same values as the party field. Omitted if the legislator caucuses with the party indicated in the party field. When in doubt about the difference between the party and caucus fields, the party field is what displays after the legislator's name (i.e. (D) ) but the caucus field is what normally determines committee seniority. This field was added starting with terms for the 113th Congress. party_affiliations: This field is present if the legislator changed party or caucus affiliation during the term. The value is a list of time periods, with start and end dates, each of which has a party field and a caucus field if applicable, with the same meanings as the main party and caucus fields. The time periods cover the entire term, so the first start will match the term start , the last end will match the term end , and the last party (and caucus if present) will match the term party (and caucus ). url: The official website URL of the legislator (only valid if the term is current). address: The mailing address of the legislator's Washington, D.C. office (only valid if the term is current, otherwise the last known address). phone: The phone number of the legislator's Washington, D.C. office (only valid if the term is current, otherwise the last known number). fax: The fax number of the legislator's Washington, D.C. office (only valid if the term is current, otherwise the last known number). contact_form: The website URL of the contact page on the legislator's official website (only valid if the term is current, otherwise the last known URL). office: Similar to the address field, this is just the room and building number, suitable for display (only valid if the term is current, otherwise the last known office). rss_url The URL to the official website's RSS feed (only valid if the term is current, otherwise the last known URL). Leadership roles : yaml leadership_roles: title: Minority Leader chamber: senate start: '2007 01 04' end: '2009 01 06' For members with top formal positions of leadership in each party in each chamber, a leadership_roles field will include an array of start/end dates and titles documenting when they held this role. Leadership terms are not identical to legislative terms, and so start and end dates will be different than legislative term dates. However, leaders do need to be re elected each legislative term, so their leadership terms should all be subsets of their legislative terms. Except where noted, fields are omitted when their value is empty or unknown. Any field may be unknown. Notes: In most cases, a legislator has a single term on any given date. 
In some cases a legislator resigned from one chamber and was sworn in in the other chamber on the same day. Terms for senators list each six year term, so the terms span three Congresses. For representatives and delegates, each two year term is listed, each corresponding to a single Congress. But Puerto Rico's Resident Commissioner serves four year terms, and so the Resident Commissioner will have a single term covering two Congresses (this has not been updated in historical data). Historically, some states sending at large representatives actually sent multiple at large representatives. Thus, state and district may not be a unique key. Data on Official Social Media Accounts This dataset is designed to include accounts that are paid for with public funds and which represent official communications of their office. We rely on reasonable verification from the legislative office about the status of their accounts. Offices are supposed to maintain strict separation of official funds and campaign funds, and official funds are not supposed to be used to further things like re election efforts. In practice, a campaign account may often look similar to an official account in terms of content, especially when expressing views on issues and legislations. However, there will be differences in what's appropriate for each account, and they will likely be maintained by different staff employed by different organizations. The social media file legislators social media.yaml stores current social media account information. Each record has two sections: id and social . The id section identifies the legislator using bioguide, thomas, and govtrack IDs (where available). The social section has social media account identifiers: twitter: The current official Twitter handle of the legislator. youtube: The current official YouTube username of the legislator. youtube_id: The current official YouTube channel ID of the legislator. instagram: The current official Instagram handle of the legislator. instagram_id: The numeric ID of the current official Instagram handle of the legislator. facebook: The username of the current official Facebook presence of the legislator. Several legislators do not have an assigned YouTube username. In these cases, only the youtube_id field is populated. All values can be turned into URLs by preceding them with the domain name of the service in question (and in the case of YouTube channels, the path /channel ): Legislators are only present when they have one or more social media accounts known. Fields are omitted when the account is unknown. Updating social media accounts Available tasks with scripts/social_media.py : sweep : Given a service , looks through current members for those missing an account on that service, and checks that member's official website's source code for mentions of that service. Uses a CSV at data/social_media_blacklist.csv to exclude known non individual account names. A CSV of leads is produced for manual review. update : Given a service , reads the CSV produced by sweep back in and updates the YAML accordingly. Note : With small updates, for people already in the YAML, it's easiest to just update by hand. clean : Given a service , removes legislators from the social media file who are no longer current. resolvefb : Uses Facebook usernames to look up graph IDs, and updates the YAML accordingly. resolveyt Uses YouTube usernames to look up any channel IDs, and updates the YAML accordingly. 
resolveig Uses Instagram user IDs to look up any usernames, and updates the YAML accordingly. Options used with the above tasks: service : Can be twitter , youtube , or facebook . bioguide : Limit activity to a single member, by bioguide ID. email : In conjunction with sweep , send an email if there are any new leads, using settings in scripts/email/config.yml (if it was created and filled out). Committees Data Dictionary The committees current.yaml file lists all current House, Senate, and Joint committees of the United States Congress. It includes metadata and cross walks into other databases of committee information. It is based on data scraped from House.gov and Senate.gov. The committees historical.yaml file is a possibly partial list of current and historical committees and subcommittees referred to in the unitedstates/congress project bill data, as scraped from THOMAS.gov. Only committees/subcommmittees that have had bills referred to them are included. The basic structure of a committee entry looks like the following: type: house name: House Committee on Agriculture url: thomas_id: HSAG house_committee_id: AG jurisdiction: The U.S. House Committee on Agriculture, or Agriculture Committee, is a standing committee of the ... jurisdiction_source: subcommittees: (... subcommittee list ...) The two files are structured each as a list of committees, each entry an associative array of key/value pairs of committee metadata. The fields available in both files are as follows: type: 'house', 'senate', or 'joint' indicating the type of commmittee name: The current (or most recent) official name of the committee. thomas_id: The four letter code used for the committee on the THOMAS advanced search page. senate_committee_id: For Senate and Joint committees, the four letter code used on Currently the same as the thomas_id. house_committee_id: For House committees, the two letter code used on Currently always the same as the last two letters of the thomas_id. jurisdiction: The committee's jurisdiction. jurisdiction_source: The source for the jurisdiction text. subcommittees: A list of subcommittees, with the following fields: name: The name of the subcommittee, excluding Subcommittee on that appears at the start of most subcommittee names. Some subcommittee names begin with a lowercase the so bear that in mind during display. thomas_id: The two digit (zero padded) code for the subcommittee as it appeared on THOMAS, and likely also the same code used on the House and Senate websites. Additional fields are present on current committee entries (that is, in committees current.yaml ): url: The current website URL of the committee. address: The mailing address for the committee. phone: The phone number of the committee. rss_url: The URL for the committee's RSS feed. minority_rss_url: The URL for the committee's minority party website's RSS feed. Two additional fields are present on committees and subcommmittees in the committees historical.yaml file: congresses: A list of Congress numbers in which this committee appears on the THOMAS advanced search page. It is roughly an indication of the time period during which the committee was in use. However, if a committee was not referred any bills it may not appear on THOMAS's list and therefore would not appear here. names: A list of past names for the committee. This is an associative array from a Congress number to the name of the committee. 
The name is that given on the THOMAS advanced search page for previous Congresses and does not always exactly match the official names of commmittees. Committee Membership Data Dictionary The committee membership current.yaml file contains current committee assignments, as of the date of the last update of this file. The file is structured as a mapping from committee IDs to a list of committee members. The basic structure looks like this: HSAG: name: Frank D. Lucas party: majority rank: 1 title: Chair bioguide: L000491 thomas: '00711' name: Bob Goodlatte party: majority rank: 2 (...snip...) HSAG03: name: Jean Schmidt party: majority rank: 1 title: Chair The committee IDs in this file are the thomas_id's from the committees current.yaml file, or for subcommittees the concatentation of the thomas_id of the parent committee and the thomas_id of the subcommittee. Each committee/subcommittee entry is a list containing the members of the committee. Each member has the following fields: name: The name of the Member of Congress. This field is intended for debugging. Instead, use the id fields. Some of the id fields used in the legislators YAML files, such as bioguide and thomas. party: Either majority or minority. Committee work is divided strictly by party. rank: The apparent rank of the member on the committee, within his or her party. This is based on the order of names on the House/Senate committee membership pages. Rank 1 is always for the committee chair or ranking member (the most senior minority party member). The rank is essentially approximate, because the House/Senate pages don't necessarily make a committment that the order on the page precisely indicates actual rank (if such a concept even applies). But if you want to preserve the order as displayed by the House and Senate, you can use this attribute. title: The title of the member on the committee, e.g. Chair, Ranking Member, or Ex Officio. This field is not normalized, however, so be prepared to accept any string. chamber: For joint committees only, the chamber that the representative is serving in, either house or senate . District Offices Data Dictionary The legistlators district offices.yaml file lists district offices for all currently serving Members of Congress. This data is crowdsourced from members' official websites. It does not include Congressional offices in Washington, D.C.; these are listed in the legislators current.yaml file. Each current Member of Congress has a listing in the file, comprised of two parts: ids and offices. The id section contains the fields bioguide, thomas, and govtrack, which correspond to fields with the same names in legislators current.yaml as described above. The bioguide field is required, and used as the primary key for this file. The offices section is a list of the Member's district offices. Each listing contains the following fields: address: The street address of the office, e.g. 123 Main St . building: The name of the building containing the office, if applicable, e.g. Dane County Courthouse . city: The city containing the office. required fax: The fax machine number of the office, e.g. 256 555 6043. hours: Free text field describing the days and hours the office is open. phone: The main phone number of the office, .e.g. 256 555 6043 state: The two letter state code of the state containing the office. required suite: The suite or room number of the office, if applicable, e.g. Suite 200 zip: The 5 digit USPS zip code of the office, e.g. 35055 . 
latitude: The decimal latitude of the office's geocoded location, e.g. 34.181059. longitude: The decimal longitude of the office's geocoded location, e.g. 86.840631. id: An identifier for the office, consisting of the member's bioguide id and the city name, e.g. X000055 seattle . required To qualify for inclusion in this file, an office must have at least an address or a phone number. The Executive Branch Because of their role in the legislative process, we also include a file executive.yaml which contains terms served by U.S. presidents (who signed legislation) and U.S. vice presidents (who are nominally the president of the Senate and occassionally cast tie breaking votes there). This file has a similar structure as the legislator files. The file contains a list, where each entry is a person. Each entry is a dict with id, name, bio, and terms fields. The id, bio, and name fields are the same as those listed above. Except: icpsr_prez: The numeric ICPSR identifier used in voteview.com historical roll call data when indicating the position of the President on a roll call vote. If the person also served in Congress, he or she will also have a regular icpsr ID with a different value. Each term has the following fields: type: either prez (a presidential term) or viceprez (a vice presidential term). start: The start date of the term. In modern times, typically January 20 following an election year. end: The end date of the term. In modern times, typically January 20 following an election year. party: The political party from which the person was elected. how: How the term came to be, either election (the normal case), succession (presidential succession), or appointment (the appointment by the president of a new vice president). Presidents and vice presidents that previously served in Congress will also be listed in one of the legislator files, but their Congressional terms will only appear in the legislator files and their executive branch terms will only appear in executive.yaml . State Abbreviations Although you can find the USPS abbreviations for the 50 states anywhere, non voting delegates from territories including historical territories that no longer exist are included in this database. Here is a complete list of abbreviations: The 50 States: AK Alaska AL Alabama AR Arkansas AZ Arizona CA California CO Colorado CT Connecticut DE Delaware FL Florida GA Georgia HI Hawaii IA Iowa ID Idaho IL Illinois IN Indiana KS Kansas KY Kentucky LA Louisiana MA Massachusetts MD Maryland ME Maine MI Michigan MN Minnesota MO Missouri MS Mississippi MT Montana NC North Carolina ND North Dakota NE Nebraska NH New Hampshire NJ New Jersey NM New Mexico NV Nevada NY New York OH Ohio OK Oklahoma OR Oregon PA Pennsylvania RI Rhode Island SC South Carolina SD South Dakota TN Tennessee TX Texas UT Utah VA Virginia VT Vermont WA Washington WI Wisconsin WV West Virginia WY Wyoming Current Territories: Legislators serving in the House from these territories are called delegates, except for the so called Resident Commissioner from Puerto Rico. AS American Samoa DC District of Columbia GU Guam MP Northern Mariana Islands PR Puerto Rico VI Virgin Islands Historical Territories: These territories no longer exist. DK Dakota Territory OL Territory of Orleans PI Philippines Territory/Commonwealth Helping us maintain the data You can just use the data directly without running any scripts. If you want to develop on and help maintain the data, our scripts are tested and developed on Python 3.6 . 
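As a quick sketch of how the committee files described above fit together (again assuming PyYAML and a local checkout with the usual hyphenated file names; HSAG is the House Agriculture example used earlier):

```python
# Illustrative sketch: print one committee's roster by joining the membership
# file with committees-current.yaml. Assumes PyYAML and a local checkout.
import yaml

with open("committees-current.yaml") as f:
    committees = {c["thomas_id"]: c for c in yaml.safe_load(f)}
with open("committee-membership-current.yaml") as f:
    membership = yaml.safe_load(f)

committee_id = "HSAG"  # House Committee on Agriculture, as in the example above
print(committees[committee_id]["name"])
for member in membership[committee_id]:
    print(" ", member["rank"], member["party"], member["name"])
```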
(Recommended) First, create a virtualenv in the scripts directory: bash cd scripts virtualenv virt source virt/bin/activate Install the requirements: bash pip install r requirements.txt Try updating the House members contact information (mailing address, etc.): bash python house_contacts.py Check whether and how the data has changed: bash git diff ../ .yaml We run the following scripts periodically to scrape for new information and keep the data files up to date. The scripts do not take any command line arguments. house_contacts.py : Updates House members' contact information (address, office, and phone fields on their current term, and their official_full name field) house_websites.py : Updates House members' current website URLs. senate_contacts.py : Updates senator information (party, class, state_rank, address, office, phone, and contact_form fields on their current term, and their official_full name, bioguide ID, and lis ID fields) committee_membership.py : Updates committees current.yaml (name, address, and phone fields for House committees; name and url fields for Senate committees; creates new subcommittees when found with name and thomas_id fields) and writes out a whole new committee membership current.yaml file by scraping the House and Senate websites. historical_committees.py : Updates committees historical.yaml based on the committees listed on THOMAS.gov, which are committees to which bills have been referred since the 103rd Congress (1973). social_media.py : Generates leads for Twitter, YouTube, and Facebook accounts for members of Congress by scraping their official websites. Uses a blacklist CSV and a whitelist CSV to manage false positives and negatives. influence_ids.py : Grabs updated FEC and OpenSecrets IDs from the Influence Explorer API . Will only work for members with a Bioguide ID. The following script takes one required command line argument icpsr_ids.py : Updates ICPSR ID's for all members of the House and Senate in a given congress, based on roll call vote data files stored by Voteview.com. The script takes one command line argument: congress congress_number where congress_number is the number of the Congress to be updated. As of July, 2013, the permanent URL for future roll call data is unclear, and as such, the script may need to be modified when it is run for the 114th congress. The following script is run to create alternately formatted data files for the gh pages branch. It takes no command line arguments. alternate_bulk_formats.py : creates JSON files for all YAML files and CSV files for current legislators, historical legislators, and district offices. The CSV files do not include all fields from the legislator YAML files, and do include data from the social media YAML. Two scripts help maintain and validate district office data: geocode_offices.py : Derives latitude, longitude pairs for office addresses. It should be run whenever new offices are added. By default this script geocodes all offices with addresses that have not already been geocoded. It optionally takes bioguide IDs as arguments, and in this case will geocode just offices for the specified ids. This script uses the Google Maps API, and requires that a key be set in scripts/cache/google_maps_api_key.txt . office_validator.py : Validates rules for district office data and reports errors and warnings. An optional skip warnings argument will suppress display of warnings. This script should be run whenever offices are added or modified. 
It is used by continuous integration testing, so errors here will cause the build to fail. Every script in scripts/ should be safely import able without executing code, beyond imports themselves. We typically do this with a def run(): declaration after the imports, and putting this at the bottom of the script: python if __name__ '__main__': run() Every pull request will pass submitted scripts through an import, to catch exceptions, and through pyflakes , to catch unused imports or local vars. To contribute updates for district offices, edit the legislators district offices.yaml file by hand and submit a pull request. Updates should pass validation as defined by scripts/office_validator.py . Other Scripts The ballotpedia field has been created using code from James Michael DuPont, using the code in git@github.com:h4ck3rm1k3/rootstrikers wikipedia.git in the branch ballotpedia . Related libraries Karl Nicholas made a set of Java classes to easily filter the data. TheWalkers maintain congress turk to do bulk collection of district office data using Amazon Mechanical Turk. Who's Using This Data Ongoing projects making use of this data: GovTrack.us Sunlight Congress API ProPublica Congress API Represent EveryPolitician.org Stories written with this data: Other projects: Margie Roswell's committee map Public domain This project is dedicated to the public domain (LICENSE). As spelled out in CONTRIBUTING (CONTRIBUTING.md): > The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication . > All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.",Unknown,Unknown 283,Unknown,Unknown,Unknown,"Traversable Dict .. image:: :target: .. image:: :target: .. image:: :target: Traversing and Querying Dicts the easy way Free software: BSD license Install .. code block:: console pip install t_dict Why? Dealing with deep nested dicts can be a total pain. TDict aims to make less boring working with it, using jsonpointer syntax for that. It stand on the shoulders of jsonpointer , which implements the RFC Usage .. code block:: python from t_dict.t_dict import TDict td TDict({'nested': { 'dict': 'here', 'other': {'spam': 'eggs'} }}) td.find('/nested/dict') >> 'here' td.find('/nested/notfound', 'defaultvalue') >> 'defaultvalue' td.setin('/nested/dict', 'new') td 'nested' 'dict' 'new' >> True converts dict to TDict isinstance(td.find('/nested/other'), TDict) >> True",Unknown,Unknown 284,Unknown,Unknown,Unknown,"Awesome Python Awesome A curated list of awesome Python frameworks, libraries, software and resources. Inspired by awesome php . 
Awesome Python ( awesome python) Admin Panels ( admin panels) Algorithms and Design Patterns ( algorithms and design patterns) Audio ( audio) Authentication ( authentication) Build Tools ( build tools) Built in Classes Enhancement ( built in classes enhancement) Caching ( caching) ChatOps Tools ( chatops tools) CMS ( cms) Code Analysis ( code analysis) Command line Interface Development ( command line interface development) Command line Tools ( command line tools) Compatibility ( compatibility) Computer Vision ( computer vision) Concurrency and Parallelism ( concurrency and parallelism) Configuration ( configuration) Cryptography ( cryptography) Data Analysis ( data analysis) Data Validation ( data validation) Data Visualization ( data visualization) Database ( database) Database Drivers ( database drivers) Date and Time ( date and time) Debugging Tools ( debugging tools) Deep Learning ( deep learning) DevOps Tools ( devops tools) Distributed Computing ( distributed computing) Distribution ( distribution) Documentation ( documentation) Downloader ( downloader) E commerce ( e commerce) Editor Plugins and IDEs ( editor plugins and ides) Email ( email) Environment Management ( environment management) Files ( files) Foreign Function Interface ( foreign function interface) Forms ( forms) Functional Programming ( functional programming) Game Development ( game development) Geolocation ( geolocation) GUI Development ( gui development) Hardware ( hardware) HTML Manipulation ( html manipulation) HTTP Clients ( Image Processing ( image processing) Implementations ( implementations) Interactive Interpreter ( interactive interpreter) Internationalization ( internationalization) Job Scheduler ( job scheduler) Logging ( logging) Machine Learning ( machine learning) Miscellaneous ( miscellaneous) Natural Language Processing ( natural language processing) Network Virtualization ( network virtualization) Networking ( networking) News Feed ( news feed) ORM ( orm) Package Management ( package management) Package Repositories ( package repositories) Permissions ( permissions) Processes ( processes) Queue ( queue) Recommender Systems ( recommender systems) RESTful API ( restful api) Robotics ( robotics) RPC Servers ( rpc servers) Science ( science) Search ( search) Serialization ( serialization) Serverless Frameworks ( serverless frameworks) Specific Formats Processing ( specific formats processing) Static Site Generator ( static site generator) Tagging ( tagging) Template Engine ( template engine) Testing ( testing) Text Processing ( text processing) Third party APIs ( third party apis) URL Manipulation ( url manipulation) Video ( video) Web Asset Management ( web asset management) Web Content Extracting ( web content extracting) Web Crawling ( web crawling) Web Frameworks ( web frameworks) WebSocket ( websocket) WSGI Servers ( wsgi servers) Services ( services) Code Quality ( code quality) Continuous Integration ( continuous integration) Resources ( resources) Podcasts ( podcasts) Twitter ( twitter) Websites ( websites) Weekly ( weekly) Contributing ( contributing) Admin Panels Libraries for administrative interfaces. ajenti The admin panel your servers deserve. django grappelli A jazzy skin for the Django Admin Interface. django jet Modern responsive template for the Django admin interface with improved functionality. django suit Alternative Django Admin Interface (free only for Non commercial use). django xadmin Drop in replacement of Django admin comes with lots of goodies. 
flask admin Simple and extensible administrative interface framework for Flask. flower Real time monitor and web admin for Celery. wooey A Django app which creates automatic web UIs for Python scripts. Algorithms and Design Patterns Python implementation of algorithms and design patterns. algorithms Minimal examples of data structures and algorithms in Python. PyPattyrn A simple yet effective library for implementing common design patterns. python patterns A collection of design patterns in Python. sortedcontainers Fast, pure Python implementation of SortedList, SortedDict, and SortedSet types. Audio Libraries for manipulating audio and its metadata. Audio audioread Cross library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding. dejavu Audio fingerprinting and recognition. mingus An advanced music theory and notation package with MIDI file and playback support. pyAudioAnalysis Audio feature extraction, classification, segmentation and applications. pydub Manipulate audio with a simple and easy high level interface. TimeSide Open web audio processing framework. Metadata beets A music library manager and MusicBrainz tagger. eyeD3 A tool for working with audio files, specifically MP3 files containing ID3 metadata. mutagen A Python module to handle audio metadata. tinytag A library for reading music meta data of MP3, OGG, FLAC and Wave files. Authentication Libraries for implementing authentications schemes. OAuth authlib JavaScript Object Signing and Encryption draft implementation. django allauth Authentication app for Django that just works. django oauth toolkit OAuth 2 goodies for Django. oauthlib A generic and thorough implementation of the OAuth request signing logic. python oauth2 A fully tested, abstract interface to creating OAuth clients and servers. python social auth An easy to setup social authentication mechanism. JWT pyjwt JSON Web Token implementation in Python. python jose A JOSE implementation in Python. python jwt A module for generating and verifying JSON Web Tokens. Build Tools Compile software from source code. BitBake A make like build tool for embedded Linux. buildout A build system for creating, assembling and deploying applications from multiple parts. PlatformIO A console tool to build code with different development platforms. pybuilder A continuous build tool written in pure Python. SCons A software construction tool. Built in Classes Enhancement Libraries for enhancing Python built in classes. dataclasses (Python standard library) Data classes. attrs Replacement for __init__ , __eq__ , __repr__ , etc. boilerplate in class definitions. bidict Efficient, Pythonic bidirectional map data structures and related functionality.. Box Python dictionaries with advanced dot notation access. DottedDict A library that provides a method of accessing lists and dicts with a dotted path notation. CMS Content Management Systems. wagtail A Django content management system. django cms An Open source enterprise CMS based on the Django. feincms One of the most advanced Content Management Systems built on Django. Kotti A high level, Pythonic web application framework built on Pyramid. mezzanine A powerful, consistent, and flexible content management platform. plone A CMS built on top of the open source application server Zope. quokka Flexible, extensible, small CMS powered by Flask and MongoDB. Caching Libraries for caching data. beaker A WSGI middleware for sessions and caching. django cache machine Automatic caching and invalidation for Django models. 
django cacheops A slick ORM cache with automatic granular event driven invalidation. dogpile.cache dogpile.cache is next generation replacement for Beaker made by same authors. HermesCache Python caching library with tag based invalidation and dogpile effect prevention. pylibmc A Python wrapper around the libmemcached interface. python diskcache SQLite and file backed cache backend with faster lookups than memcached and redis. ChatOps Tools Libraries for chatbot development. errbot The easiest and most popular chatbot to implement ChatOps. Code Analysis Tools of static analysis, linters and code quality checkers. Also see awesome static analysis . Code Analysis coala Language independent and easily extendable code analysis application. code2flow Turn your Python and JavaScript code into DOT flowcharts. prospector A tool to analyse Python code. pycallgraph A library that visualises the flow (call graph) of your Python application. Code Linters flake8 A wrapper around pycodestyle , pyflakes and McCabe. pylint A fully customizable source code analyzer. pylama A code audit tool for Python and JavaScript. Code Formatters black The uncompromising Python code formatter. yapf Yet another Python code formatter from Google. Static Type Checkers mypy Check variable types during compile time. pyre check Performant type checking. Static Type Annotations Generators MonkeyType A system for Python that generates static type annotations by collecting runtime types Command line Interface Development Libraries for building command line applications. Command line Application Development cement CLI Application Framework for Python. click A package for creating beautiful command line interfaces in a composable way. cliff A framework for creating command line programs with multi level commands. clint Python Command line Application Tools. docopt Pythonic command line arguments parser. python fire A library for creating command line interfaces from absolutely any Python object. python prompt toolkit A library for building powerful interactive command lines. Terminal Rendering asciimatics A package to create full screen text UIs (from interactive forms to ASCII animations). bashplotlib Making basic plots in the terminal. colorama Cross platform colored terminal text. tqdm Fast, extensible progress bar for loops and CLI. Command line Tools Useful CLI based tools for productivity. Productivity Tools cookiecutter A command line utility that creates projects from cookiecutters (project templates). doitlive A tool for live presentations in the terminal. howdoi Instant coding answers via the command line. PathPicker Select files out of bash output. percol Adds flavor of interactive selection to the traditional pipe concept on UNIX. thefuck Correcting your previous console command. tmuxp A tmux session manager. try A dead simple CLI to try out python packages it's never been easier. CLI Enhancements A command line HTTP client, a user friendly cURL replacement. kube shell An integrated shell for working with the Kubernetes CLI. mycli A Terminal Client for MySQL with AutoCompletion and Syntax Highlighting. pgcli Postgres CLI with autocompletion and syntax highlighting. saws A Supercharged aws cli . Compatibility Libraries for migrating from Python 2 to 3. python future The missing compatibility layer between Python 2 and Python 3. python modernize Modernizes Python code for eventual Python 3 migration. six Python 2 and 3 compatibility utilities. Computer Vision Libraries for computer vision. 
OpenCV Open Source Computer Vision Library. pytesseract Another wrapper for Google Tesseract OCR . SimpleCV An open source framework for building computer vision applications. Concurrency and Parallelism Libraries for concurrent and parallel execution. Also see awesome asyncio . concurrent.futures (Python standard library) A high level interface for asynchronously executing callables. multiprocessing (Python standard library) Process based parallelism. eventlet Asynchronous framework with WSGI support. gevent A coroutine based Python networking library that uses greenlet . uvloop Ultra fast implementation of asyncio event loop on top of libuv . scoop Scalable Concurrent Operations in Python. Configuration Libraries for storing and parsing configuration options. configobj INI file parser with validation. configparser (Python standard library) INI file parser. profig Config from multiple formats with value conversion. python decouple Strict separation of settings from code. Cryptography cryptography A package designed to expose cryptographic primitives and recipes to Python developers. paramiko The leading native Python SSHv2 protocol library. passlib Secure password storage/hashing library, very high level. pynacl Python binding to the Networking and Cryptography (NaCl) library. Data Analysis Libraries for data analyzing. Blaze NumPy and Pandas interface to Big Data. Open Mining Business Intelligence (BI) in Pandas interface. Orange Data mining, data visualization, analysis and machine learning through visual programming or scripts. Pandas A library providing high performance, easy to use data structures and data analysis tools. Optimus Agile Data Science Workflows made easy with PySpark. Data Validation Libraries for validating data. Used for forms in many cases. Cerberus A lightweight and extensible data validation library. colander Validating and deserializing data obtained via XML, JSON, an HTML form post. jsonschema An implementation of JSON Schema for Python. schema A library for validating Python data structures. Schematics Data Structure Validation. valideer Lightweight extensible data validation and adaptation library. voluptuous A Python data validation library. Data Visualization Libraries for visualizing data. Also see awesome javascript . Altair Declarative statistical visualization library for Python. Bokeh Interactive Web Plotting for Python. bqplot Interactive Plotting Library for the Jupyter Notebook Dash Built on top of Flask, React and Plotly aimed at analytical web applications. awesome dash ggplot Same API as ggplot2 for R. Matplotlib A Python 2D plotting library. Pygal A Python SVG Charts Creator. PyGraphviz Python interface to Graphviz . PyQtGraph Interactive and realtime 2D/3D/Image plotting and science/engineering widgets. Seaborn Statistical data visualization using Matplotlib. VisPy High performance scientific visualization based on OpenGL. Database Databases implemented in Python. pickleDB A simple and lightweight key value store for Python. tinydb A tiny, document oriented database. ZODB A native object database for Python. A key value and object graph database. Database Drivers Libraries for connecting and operating databases. MySQL awesome mysql mysqlclient MySQL connector with Python 3 support ( mysql python fork). PyMySQL A pure Python MySQL driver compatible to mysql python. PostgreSQL awesome postgres psycopg2 The most popular PostgreSQL adapter for Python. queries A wrapper of the psycopg2 library for interacting with PostgreSQL. 
Other Relational Databases pymssql A simple database interface to Microsoft SQL Server. NoSQL Databases cassandra driver The Python Driver for Apache Cassandra. happybase A developer friendly library for Apache HBase. kafka python The Python client for Apache Kafka. py2neo Python wrapper client for Neo4j's restful interface. pymongo The official Python client for MongoDB. redis py The Python client for Redis. Asynchronous Clients motor The async Python driver for MongoDB. Telephus Twisted based client for Cassandra. txpostgres Twisted based asynchronous driver for PostgreSQL. txRedis Twisted based client for Redis. Date and Time Libraries for working with dates and times. Chronyk A Python 3 library for parsing human written times and dates. dateutil Extensions to the standard Python datetime module. delorean A library for clearing up the inconvenient truths that arise dealing with datetimes. moment A Python library for dealing with dates/times. Inspired by Moment.js . Pendulum Python datetimes made easy. PyTime A easy use Python module which aims to operate date/time/datetime by string. pytz World timezone definitions, modern and historical. Brings the tz database into Python. when.py Providing user friendly functions to help perform common date and time actions. maya Datetimes for Humans. Debugging Tools Libraries for debugging code. pdb like Debugger ipdb IPython enabled pdb . pdb++ Another drop in replacement for pdb. pudb A full screen, console based Python debugger. wdb An improbable web debugger through WebSockets. Tracing lptrace strace for Python programs. manhole Debugging UNIX socket connections and present the stacktraces for all threads and an interactive prompt. pyringe Debugger capable of attaching to and injecting code into Python processes. python hunter A flexible code tracing toolkit. Profiler line_profiler Line by line profiling. memory_profiler Monitor Memory usage of Python code. profiling An interactive Python profiler. py spy A sampling profiler for Python programs. Written in Rust. pyflame A ptracing profiler For Python. vprof Visual Python profiler. Others icecream Inspect variables, expressions, and program execution with a single, simple function call. django debug toolbar Display various debug information for Django. django devserver A drop in replacement for Django's runserver. flask debugtoolbar A port of the django debug toolbar to flask. pyelftools Parsing and analyzing ELF files and DWARF debugging information. Deep Learning Frameworks for Neural Networks and Deep Learning. Also see awesome deep learning . caffe A fast open framework for deep learning.. keras A high level neural networks library and capable of running on top of either TensorFlow or Theano. mxnet A deep learning framework designed for both efficiency and flexibility. pytorch Tensors and Dynamic neural networks in Python with strong GPU acceleration. SerpentAI Game agent framework. Use any video game as a deep learning sandbox. tensorflow The most popular Deep Learning framework created by Google. Theano A library for fast numerical computation. DevOps Tools Software and libraries for DevOps. ansible A radically simple IT automation platform. cloudinit A multi distribution package that handles early initialization of a cloud instance. cuisine Chef like functionality for Fabric. docker compose Fast, isolated development environments using Docker . fabric A simple, Pythonic tool for remote execution and deployment. fabtools Tools for writing awesome Fabric files. 
honcho A Python clone of Foreman , for managing Procfile based applications. OpenStack Open source software for building private and public clouds. pexpect Controlling interactive programs in a pseudo terminal like GNU expect. psutil A cross platform process and system utilities module. saltstack Infrastructure automation and management system. supervisor Supervisor process control system for UNIX. Distributed Computing Frameworks and libraries for Distributed Computing. Batch Processing PySpark Apache Spark Python API. dask A flexible parallel computing library for analytic computing. luigi A module that helps you build complex pipelines of batch jobs. mrjob Run MapReduce jobs on Hadoop or Amazon Web Services. Ray A system for parallel and distributed Python that unifies the machine learning ecosystem. Stream Processing faust A stream processing library, porting the ideas from Kafka Streams to Python. streamparse Run Python code against real time streams of data via Apache Storm . Distribution Libraries to create packaged executables for release distribution. dh virtualenv Build and distribute a virtualenv as a Debian package. Nuitka Compile scripts, modules, packages to an executable or extension module. py2app Freezes Python scripts (Mac OS X). py2exe Freezes Python scripts (Windows). PyInstaller Converts Python programs into stand alone executables (cross platform). pynsist A tool to build Windows installers, installers bundle Python itself. Documentation Libraries for generating project documentation. sphinx Python Documentation generator. awesome sphinxdoc pdoc Epydoc replacement to auto generate API documentation for Python libraries. pycco The literate programming style documentation generator. Downloader Libraries for downloading. s3cmd A command line tool for managing Amazon S3 and CloudFront. s4cmd Super S3 command line tool, good for higher performance. you get A YouTube/Youku/Niconico video downloader written in Python 3. youtube dl A small command line program to download videos from YouTube. E commerce Frameworks and libraries for e commerce and payments. alipay Unofficial Alipay API for Python. Cartridge A shopping cart app built using the Mezzanine. django oscar An open source e commerce framework for Django. django shop A Django based shop system. merchant A Django app to accept payments from various payment processors. money Money class with optional CLDR backed locale aware formatting and an extensible currency exchange. python currencies Display money format and its filthy currencies. forex python Foreign exchange rates, Bitcoin price index and currency conversion. saleor An e commerce storefront for Django. shoop An open source E Commerce platform based on Django. Editor Plugins and IDEs Emacs elpy Emacs Python Development Environment. Sublime Text anaconda Anaconda turns your Sublime Text 3 in a full featured Python development IDE. SublimeJEDI A Sublime Text plugin to the awesome auto complete library Jedi. Vim jedi vim Vim bindings for the Jedi auto completion library for Python. python mode An all in one plugin for turning Vim into a Python IDE. YouCompleteMe Includes Jedi based completion engine for Python. Visual Studio PTVS Python Tools for Visual Studio. Visual Studio Code Python The official VSCode extension with rich support for Python. IDE PyCharm Commercial Python IDE by JetBrains. Has free community edition available. spyder Open Source Python IDE. Email Libraries for sending and parsing email. envelopes Mailing for human beings. 
flanker A email address and Mime parsing library. imbox Python IMAP for Humans. inbox.py Python SMTP Server for Humans. lamson Pythonic SMTP Application Server. Marrow Mailer High performance extensible mail delivery framework. modoboa A mail hosting and management platform including a modern and simplified Web UI. Nylas Sync Engine Providing a RESTful API on top of a powerful email sync platform. yagmail Yet another Gmail/SMTP client. Environment Management Libraries for Python version and virtual environment management. pyenv Simple Python version management. pipenv Python Development Workflow for Humans. poetry Python dependency management and packaging made easy. virtualenv A tool to create isolated Python environments. Files Libraries for file manipulation and MIME type detection. mimetypes (Python standard library) Map filenames to MIME types. path.py A module wrapper for os.path . pathlib (Python standard library) An cross platform, object oriented path library. PyFilesystem2 Python's filesystem abstraction layer. python magic A Python interface to the libmagic file type identification library. Unipath An object oriented approach to file/directory operations. watchdog API and shell utilities to monitor file system events. Foreign Function Interface Libraries for providing foreign function interface. cffi Foreign Function Interface for Python calling C code. ctypes (Python standard library) Foreign Function Interface for Python calling C code. PyCUDA A Python wrapper for Nvidia's CUDA API. SWIG Simplified Wrapper and Interface Generator. Forms Libraries for working with forms. Deform Python HTML form generation library influenced by the formish form generation library. django bootstrap3 Bootstrap 3 integration with Django. django bootstrap4 Bootstrap 4 integration with Django. django crispy forms A Django app which lets you create beautiful forms in a very elegant and DRY way. django remote forms A platform independent Django form serializer. WTForms A flexible forms validation and rendering library. Functional Programming Functional Programming with Python. Coconut Coconut is a variant of Python built for simple, elegant, Pythonic functional programming. CyToolz Cython implementation of Toolz: High performance functional utilities. fn.py Functional programming in Python: implementation of missing features to enjoy FP. funcy A fancy and practical functional tools. Toolz A collection of functional utilities for iterators, functions, and dictionaries. GUI Development Libraries for working with graphical user interface applications. curses Built in wrapper for ncurses used to create terminal GUI applications. Eel A library for making simple Electron like offline HTML/JS GUI apps. enaml Creating beautiful user interfaces with Declaratic Syntax like QML. Flexx Flexx is a pure Python toolkit for creating GUI's, that uses web technology for its rendering. Gooey Turn command line programs into a full GUI application with one line. kivy A library for creating NUI applications, running on Windows, Linux, Mac OS X, Android and iOS. pyglet A cross platform windowing and multimedia library for Python. PyGObject Python Bindings for GLib/GObject/GIO/GTK+ (GTK+3). PyQt Python bindings for the Qt cross platform application and UI framework. PySimpleGUI Wrapper for tkinter, Qt, WxPython and Remi. pywebview A lightweight cross platform native wrapper around a webview component. Tkinter Tkinter is Python's de facto standard GUI package. Toga A Python native, OS native GUI toolkit. 
urwid A library for creating terminal GUI applications with strong support for widgets, events, rich colors, etc. wxPython A blending of the wxWidgets C++ class library with the Python. Game Development Awesome game development libraries. Cocos2d cocos2d is a framework for building 2D games, demos, and other graphical/interactive applications. Harfang3D Python framework for 3D, VR and game development. Panda3D 3D game engine developed by Disney. Pygame Pygame is a set of Python modules designed for writing games. PyOgre Python bindings for the Ogre 3D render engine, can be used for games, simulations, anything 3D. PyOpenGL Python ctypes bindings for OpenGL and it's related APIs. PySDL2 A ctypes based wrapper for the SDL2 library. RenPy A Visual Novel engine. Geolocation Libraries for geocoding addresses and working with latitudes and longitudes. django countries A Django app that provides a country field for models and forms. GeoDjango A world class geographic web framework. GeoIP Python API for MaxMind GeoIP Legacy Database. geojson Python bindings and utilities for GeoJSON. geopy Python Geocoding Toolbox. pygeoip Pure Python GeoIP API. HTML Manipulation Libraries for working with HTML and XML. BeautifulSoup Providing Pythonic idioms for iterating, searching, and modifying HTML or XML. bleach A whitelist based HTML sanitization and text linkification library. cssutils A CSS library for Python. html5lib A standards compliant library for parsing and serializing HTML documents and fragments. lxml A very fast, easy to use and versatile library for handling HTML and XML. MarkupSafe Implements a XML/HTML/XHTML Markup safe string for Python. pyquery A jQuery like library for parsing HTML. untangle Converts XML documents to Python objects for easy access. WeasyPrint A visual rendering engine for HTML and CSS that can export to PDF. xmldataset Simple XML Parsing. xmltodict Working with XML feel like you are working with JSON. HTTP Clients Libraries for working with HTTP. grequests requests + gevent for asynchronous HTTP requests. Comprehensive HTTP client library. requests HTTP Requests for Humans™. treq Python requests like API built on top of Twisted's HTTP client. urllib3 A HTTP library with thread safe connection pooling, file post support, sanity friendly. Hardware Libraries for programming with hardware. ino Command line toolkit for working with Arduino . keyboard Hook and simulate global keyboard events on Windows and Linux. mouse Hook and simulate global mouse events on Windows and Linux. Pingo Pingo provides a uniform API to program devices like the Raspberry Pi, pcDuino, Intel Galileo, etc. PyUserInput A module for cross platform control of the mouse and keyboard. scapy A brilliant packet manipulation library. wifi A Python library and command line tool for working with WiFi on Linux. Image Processing Libraries for manipulating images. hmap Image histogram remapping. imgSeek A project for searching a collection of images using visual similarity. nude.py Nudity detection. pagan Retro identicon (Avatar) generation based on input string and hash. pillow Pillow is the friendly PIL fork. pyBarcode Create barcodes in Python without needing PIL. pygram Instagram like image filters. python qrcode A pure Python QR Code generator. Quads Computer art based on quadtrees. scikit image A Python library for (scientific) image processing. thumbor A smart imaging service. It enables on demand crop, re sizing and flipping of images. wand Python bindings for MagickWand , C API for ImageMagick. 
Implementations Implementations of Python. CPython Default, most widely used implementation of the Python programming language written in C. Cython Optimizing Static Compiler for Python. CLPython Implementation of the Python programming language written in Common Lisp. Grumpy More compiler than interpreter as more powerful CPython2.7 replacement (alpha). IronPython Implementation of the Python programming language written in C . Jython Implementation of Python programming language written in Java for the JVM. MicroPython A lean and efficient Python programming language implementation. Numba Python JIT compiler to LLVM aimed at scientific Python. PeachPy x86 64 assembler embedded in Python. Pyjion A JIT for Python based upon CoreCLR. PyPy A very fast and compliant implementation of the Python language. Pyston A Python implementation using JIT techniques. Stackless Python An enhanced version of the Python programming language. Interactive Interpreter Interactive Python interpreters (REPL). bpython A fancy interface to the Python interpreter. Jupyter Notebook (IPython) A rich toolkit to help you make the most out of using Python interactively. awesome jupyter ptpython Advanced Python REPL built on top of the python prompt toolkit . Internationalization Libraries for working with i18n. Babel An internationalization library for Python. PyICU A wrapper of International Components for Unicode C++ library ( ICU ). Job Scheduler Libraries for scheduling jobs. APScheduler A light but powerful in process task scheduler that lets you schedule functions. django schedule A calendaring app for Django. doit A task runner and build tool. gunnery Multipurpose task execution tool for distributed systems with web based interface. Joblib A set of tools to provide lightweight pipelining in Python. Plan Writing crontab file in Python like a charm. schedule Python job scheduling for humans. Spiff A powerful workflow engine implemented in pure Python. TaskFlow A Python library that helps to make task execution easy, consistent and reliable. Airflow Airflow is a platform to programmatically author, schedule and monitor workflows. Logging Libraries for generating and working with logs. Eliot Logging for complex & distributed systems. logbook Logging replacement for Python. logging (Python standard library) Logging facility for Python. raven Python client for Sentry, a log/error tracking, crash reporting and aggregation platform for web applications. Machine Learning Libraries for Machine Learning. Also see awesome machine learning . H2O Open Source Fast Scalable Machine Learning Platform. Metrics Machine learning evaluation metrics. NuPIC Numenta Platform for Intelligent Computing. scikit learn The most popular Python library for Machine Learning. Spark ML Apache Spark 's scalable Machine Learning library. vowpal_porpoise A lightweight Python wrapper for Vowpal Wabbit . xgboost A scalable, portable, and distributed gradient boosting library. Microsoft Windows Python programming on Microsoft Windows. Python(x,y) Scientific applications oriented Python Distribution based on Qt and Spyder. pythonlibs Unofficial Windows binaries for Python extension packages. PythonNet Python Integration with the .NET Common Language Runtime (CLR). PyWin32 Python Extensions for Windows. WinPython Portable development environment for Windows 7/8. Miscellaneous Useful libraries or tools that don't fit in the categories above. blinker A fast Python in process signal/event dispatching system. boltons A set of pure Python utilities. 
itsdangerous Various helpers to pass trusted data to untrusted environments. pluginbase A simple but flexible plugin system for Python. tryton A general purpose business framework. Natural Language Processing Libraries for working with human languages. General gensim Topic Modelling for Humans. langid.py Stand alone language identification system. nltk A leading platform for building Python programs to work with human language data. pattern A web mining module for the Python. polyglot Natural language pipeline supporting hundreds of languages. pytext A natural language modeling framework based on PyTorch. PyTorch NLP A toolkit enabling rapid deep learning NLP prototyping for research. spacy A library for industrial strength natural language processing in Python and Cython. stanfordnlp The Stanford NLP Group's official Python library, supporting 50+ languages. Chinese jieba The most popular Chinese text segmentation library. pkuseg python A toolkit for Chinese word segmentation in various domains. snownlp A library for processing Chinese text. funNLP A collection of tools and datasets for Chinese NLP. Network Virtualization Tools and libraries for Virtual Networking and SDN (Software Defined Networking). mininet A popular network emulator and API written in Python. pox A Python based SDN control applications, such as OpenFlow SDN controllers. Networking Libraries for networking programming. asyncio (Python standard library) Asynchronous I/O, event loop, coroutines and tasks. awesome asyncio pulsar Event driven concurrent framework for Python. pyzmq A Python wrapper for the ZeroMQ message library. Twisted An event driven networking engine. napalm Cross vendor API to manipulate network devices. News Feed Libraries for building user's activities. django activity stream Generating generic activity streams from the actions on your site. Stream Framework Building newsfeed and notification systems using Cassandra and Redis. ORM Libraries that implement Object Relational Mapping or data mapping techniques. Relational Databases Django Models A part of Django. SQLAlchemy The Python SQL Toolkit and Object Relational Mapper. awesome sqlalchemy dataset Store Python dicts in a database works with SQLite, MySQL, and PostgreSQL. orator The Orator ORM provides a simple yet beautiful ActiveRecord implementation. peewee A small, expressive ORM. pony ORM that provides a generator oriented interface to SQL. pydal A pure Python Database Abstraction Layer. NoSQL Databases hot redis Rich Python data types for Redis. mongoengine A Python Object Document Mapper for working with MongoDB. PynamoDB A Pythonic interface for Amazon DynamoDB . redisco A Python Library for Simple Models and Containers Persisted in Redis. Package Management Libraries for package and dependency management. pip The Python package and dependency manager. PyPI pip tools A set of tools to keep your pinned Python dependencies fresh. conda Cross platform, Python agnostic binary package manager. Package Repositories Local PyPI repository server and proxies. warehouse Next generation Python Package Repository (PyPI). bandersnatch PyPI mirroring tool provided by Python Packaging Authority (PyPA). devpi PyPI server and packaging/testing/release tool. localshop Local PyPI server (custom packages and auto mirroring of pypi). Permissions Libraries that allow or deny users access to data or functionality. 
django guardian Implementation of per object permissions for Django 1.2+ django rules A tiny but powerful app providing object level permissions to Django, without requiring a database. Processes Libraries for starting and communicating with OS processes. delegator.py Subprocesses for Humans™ 2.0. sarge Yet another wrapper for subprocess. sh A full fledged subprocess replacement for Python. Queue Libraries for working with event and task queues. celery An asynchronous task queue/job queue based on distributed message passing. huey Little multi threaded task queue. mrq Mr. Queue A distributed worker task queue in Python using Redis & gevent. rq Simple job queues for Python. Recommender Systems Libraries for building recommender systems. annoy Approximate Nearest Neighbors in C++/Python optimized for memory usage. fastFM A library for Factorization Machines. implicit A fast Python implementation of collaborative filtering for implicit datasets. libffm A library for Field aware Factorization Machine (FFM). lightfm A Python implementation of a number of popular recommendation algorithms. spotlight Deep recommender models using PyTorch. Surprise A scikit for building and analyzing recommender systems. tensorrec A Recommendation Engine Framework in TensorFlow. RESTful API Libraries for developing RESTful APIs. Django django rest framework A powerful and flexible toolkit to build web APIs. django tastypie Creating delicious APIs for Django apps. Flask eve REST API framework powered by Flask, MongoDB and good intentions. flask api utils Taking care of API representation and authentication for Flask. flask api Browsable Web APIs for Flask. flask restful Quickly building REST APIs for Flask. flask restless Generating RESTful APIs for database models defined with SQLAlchemy. Pyramid cornice A RESTful framework for Pyramid. Framework agnostic apistar A smart Web API framework, designed for Python 3. falcon A high performance framework for building cloud APIs and web app backends. hug A Python 3 framework for cleanly exposing APIs. restless Framework agnostic REST framework based on lessons learned from Tastypie. ripozo Quickly creating REST/HATEOAS/Hypermedia APIs. sandman Automated REST APIs for existing database driven systems. Robotics Libraries for robotics. PythonRobotics This is a compilation of various robotics algorithms with visualizations. rospy This is a library for ROS (Robot Operating System). RPC Servers RPC compatible servers. SimpleJSONRPCServer This library is an implementation of the JSON RPC specification. SimpleXMLRPCServer (Python standard library) Simple XML RPC server implementation, single threaded. zeroRPC zerorpc is a flexible RPC implementation based on ZeroMQ and MessagePack . Science Libraries for scientific computing. Also see Python for Scientists astropy A community Python library for Astronomy. bcbio nextgen Providing best practice pipelines for fully automated high throughput sequencing analysis. bccb Collection of useful code related to biological analysis. Biopython Biopython is a set of freely available tools for biological computation. cclib A library for parsing and interpreting the results of computational chemistry packages. Colour Implementing a comprehensive number of colour theory transformations and algorithms. NetworkX A high productivity software for complex networks. NIPY A collection of neuroimaging toolkits. NumPy A fundamental package for scientific computing with Python. 
Open Babel A chemical toolbox designed to speak the many languages of chemical data. ObsPy A Python toolbox for seismology. PyDy Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion. PyMC Markov Chain Monte Carlo sampling toolkit. QuTiP Quantum Toolbox in Python. RDKit Cheminformatics and Machine Learning Software. SciPy A Python based ecosystem of open source software for mathematics, science, and engineering. statsmodels Statistical modeling and econometrics in Python. SymPy A Python library for symbolic mathematics. Zipline A Pythonic algorithmic trading library. SimPy A process based discrete event simulation framework. Search Libraries and software for indexing and performing search queries on data. elasticsearch py The official low level Python client for Elasticsearch . elasticsearch dsl py The official high level Python client for Elasticsearch. django haystack Modular search for Django. pysolr A lightweight Python wrapper for Apache Solr . whoosh A fast, pure Python search engine library. Serialization Libraries for serializing complex data types marshmallow A lightweight library for converting complex objects to and from simple Python datatypes. pysimdjson A Python bindings for simdjson . python rapidjson A Python wrapper around RapidJSON . Serverless Frameworks Frameworks for developing serverless Python code. python lambda A toolkit for developing and deploying Python code in AWS Lambda. Zappa A tool for deploying WSGI applications on AWS Lambda and API Gateway. Specific Formats Processing Libraries for parsing and manipulating specific text formats. General tablib A module for Tabular Datasets in XLS, CSV, JSON, YAML. Office openpyxl A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files. pyexcel Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files. python docx Reads, queries and modifies Microsoft Word 2007/2008 docx files. python pptx Python library for creating and updating PowerPoint (.pptx) files. unoconv Convert between any document format supported by LibreOffice/OpenOffice. XlsxWriter A Python module for creating Excel .xlsx files. xlwings A BSD licensed library that makes it easy to call Python from Excel and vice versa. xlwt / xlrd Writing and reading data and formatting information from Excel files. PDF PDFMiner A tool for extracting information from PDF documents. PyPDF2 A library capable of splitting, merging and transforming PDF pages. ReportLab Allowing Rapid creation of rich PDF documents. Markdown Mistune Fastest and full featured pure Python parsers of Markdown. Python Markdown A Python implementation of John Gruber’s Markdown. YAML PyYAML YAML implementations for Python. CSV csvkit Utilities for converting to and working with CSV. Archive unp A command line tool that can unpack archives easily. Static Site Generator Static site generator is a software that takes some text + templates as input and produces HTML files on the output. mkdocs Markdown friendly documentation generator. pelican Static site generator that supports Markdown and reST syntax. lektor An easy to use static CMS and blog engine. nikola A static website and blog generator. Tagging Libraries for tagging items. django taggit Simple tagging for Django. Template Engine Libraries and tools for templating and lexing. Jinja2 A modern and designer friendly templating language. Genshi Python templating toolkit for generation of web aware output. 
Mako Hyperfast and lightweight templating for the Python platform. Testing Libraries for testing codebases and generating test data. Testing Frameworks pytest A mature full featured Python testing tool. hypothesis Hypothesis is an advanced Quickcheck style property based testing library. nose2 The successor to nose , based on unittest2. Robot Framework A generic test automation framework. unittest (Python standard library) Unit testing framework. Test Runners green A clean, colorful test runner. mamba The definitive testing tool for Python. Born under the banner of BDD. tox Auto builds and tests distributions in multiple Python versions GUI / Web Testing locust Scalable user load testing tool written in Python. PyAutoGUI PyAutoGUI is a cross platform GUI automation Python module for human beings. Selenium Python bindings for Selenium WebDriver. sixpack A language agnostic A/B Testing framework. splinter Open source tool for testing web applications. Mock mock (Python standard library) A mocking and patching library. doublex Powerful test doubles framework for Python. freezegun Travel through time by mocking the datetime module. httmock A mocking library for requests for Python 2.6+ and 3.2+. HTTP request mock tool for Python. mocket A socket mock framework with gevent/asyncio/SSL support. responses A utility library for mocking out the requests Python library. VCR.py Record and replay HTTP interactions on your tests. Object Factories factory_boy A test fixtures replacement for Python. mixer Another fixtures replacement. Supported Django, Flask, SQLAlchemy, Peewee and etc. model_mommy Creating random fixtures for testing in Django. Code Coverage coverage Code coverage measurement. Fake Data mimesis is a Python library that help you generate fake data. fake2db Fake database generator. faker A Python package that generates fake data. radar Generate random datetime / time. Text Processing Libraries for parsing and manipulating plain texts. General chardet Python 2/3 compatible character encoding detector. difflib (Python standard library) Helpers for computing deltas. ftfy Makes Unicode text less broken and more consistent automagically. fuzzywuzzy Fuzzy String Matching. Levenshtein Fast computation of Levenshtein distance and string similarity. pangu.py Paranoid text spacing. pyfiglet An implementation of figlet written in Python. pypinyin Convert Chinese hanzi (漢字) to pinyin (拼音). textdistance Compute distance between sequences with 30+ algorithms. unidecode ASCII transliterations of Unicode text. Slugify awesome slugify A Python slugify library that can preserve unicode. python slugify A Python slugify library that translates unicode to ASCII. unicode slugify A slugifier that generates unicode slugs with Django as a dependency. Unique identifiers hashids Implementation of hashids in Python. shortuuid A generator library for concise, unambiguous and URL safe UUIDs. Parser ply Implementation of lex and yacc parsing tools for Python. pygments A generic syntax highlighter. pyparsing A general purpose framework for generating parsers. python nameparser Parsing human names into their individual components. python phonenumbers Parsing, formatting, storing and validating international phone numbers. python user agents Browser user agent parser. sqlparse A non validating SQL parser. Third party APIs Libraries for accessing third party services APIs. Also see List of Python API Wrappers and Libraries . apache libcloud One Python library for all clouds. boto3 Python interface to Amazon Web Services. 
django wordpress WordPress models and views for Django. facebook sdk Facebook Platform Python SDK. google api python client Google APIs Client Library for Python. gspread Google Spreadsheets Python API. twython A Python wrapper for the Twitter API. URL Manipulation Libraries for parsing URLs. furl A small Python library that makes parsing and manipulating URLs easy. purl A simple, immutable URL class with a clean API for interrogation and manipulation. pyshorteners A pure Python URL shortening lib. webargs A friendly library for parsing HTTP request arguments with built in support for popular web frameworks. Video Libraries for manipulating video and GIFs. moviepy A module for script based movie editing with many formats, including animated GIFs. scikit video Video processing routines for SciPy. WSGI Servers WSGI compatible web servers. bjoern Asynchronous, very fast and written in C. gunicorn Pre forked, partly written in C. uWSGI A project aims at developing a full stack for building hosting services, written in C. waitress Multi threaded, powers Pyramid. werkzeug A WSGI utility library for Python that powers Flask and can easily be embedded into your own projects. Web Asset Management Tools for managing, compressing and minifying website assets. django compressor Compresses linked and inline JavaScript or CSS into a single cached file. django pipeline An asset packaging library for Django. django storages A collection of custom storage back ends for Django. fanstatic Packages, optimizes, and serves static file dependencies as Python packages. fileconveyor A daemon to detect and sync files to CDNs, S3 and FTP. flask assets Helps you integrate webassets into your Flask app. webassets Bundles, optimizes, and manages unique cache busting URLs for static resources. Web Content Extracting Libraries for extracting web contents. html2text Convert HTML to Markdown formatted text. lassie Web Content Retrieval for Humans. micawber A small library for extracting rich content from URLs. newspaper News extraction, article extraction and content curation in Python. python readability Fast Python port of arc90's readability tool. requests html Pythonic HTML Parsing for Humans. sumy A module for automatic summarization of text documents and HTML pages. textract Extract text from any document, Word, PowerPoint, PDFs, etc. toapi Every web site provides APIs. Web Crawling Libraries to automate web scraping. cola A distributed crawling framework. feedparser Universal feed parser. grab Site scraping framework. MechanicalSoup A Python library for automating interaction with websites. pyspider A powerful spider system. robobrowser A simple, Pythonic library for browsing the web without a standalone web browser. scrapy A fast high level screen scraping and web crawling framework. portia Visual scraping for Scrapy. Web Frameworks Full stack web frameworks. Django The most popular web framework in Python. awesome django Flask A microframework for Python. awesome flask Masonite The modern and developer centric Python web framework. Pyramid A small, fast, down to earth, open source Python web framework. awesome pyramid Sanic Web server that's written to go fast. Vibora Fast, efficient and asynchronous Web framework inspired by Flask. Tornado A Web framework and asynchronous networking library. WebSocket Libraries for working with WebSocket. autobahn python WebSocket & WAMP for Python on Twisted and asyncio . crossbar Open source Unified Application Router (Websocket & WAMP for Python on Autobahn). 
django channels Developer friendly asynchrony for Django. django socketio WebSockets for Django. WebSocket for Python WebSocket client and server library for Python 2 and 3 as well as PyPy. Services Online tools and APIs to simplify development. Continuous Integration Also see awesome CIandCD . CircleCI A CI service that can run very fast parallel testing. Travis CI A popular CI service for your open source and private projects. (GitHub only) Vexor CI A continuous integration tool for private apps with pay per minute billing model. Wercker A Docker based platform for building and deploying applications and microservices. Code Quality Codacy Automated Code Review to ship better code, faster. Codecov Code coverage dashboard. CodeFactor Automated Code Review for Git. Landscape Hosted continuous Python code metrics. PEP 8 Speaks GitHub integration to review code style. Resources Where to discover new Python libraries. Podcasts From Python Import Podcast Podcast.init Python Bytes Python Testing Radio Free Python Talk Python To Me Test and Code Twitter @codetengu @getpy @importpython @planetpython @pycoders @pypi @pythontrending @PythonWeekly @TalkPython @realpython Websites /r/CoolGithubProjects /r/Python Awesome Python @LibHunt Django Packages Full Stack Python Python Cheatsheet Python Hackers Python ZEEF Python 开发社区 Real Python Trending Python repositories on GitHub today Сообщество Python Программистов Weekly CodeTengu Weekly 碼天狗週刊 Import Python Newsletter Pycoder's Weekly Python Weekly Python Tricks Contributing Your contributions are always welcome! Please take a look at the contribution guidelines first. I will keep some pull requests open if I'm not sure whether those libraries are awesome, you could vote for them by adding :+1: to them. Pull requests will be merged when their votes reach 20 . If you have any question about this opinionated list, do not hesitate to contact me @vinta on Twitter or open an issue on GitHub.",Unknown,Unknown 285,Unknown,Unknown,Unknown,"Programming with Python (MATH20622) Lecture notes from the course taught at the University of Manchester in the academic year 2014/15. Notes: The exercises are meant to be solved at home, prior to lab classes. Lab classes problems are meant to be solved by students during the lab classes (and finished at home), but you are welcome to try them at home as well. The standard form for the name of the solution files is name xx yy.py . Those files that have additional parts in their names are aimed at the students interested in more advanced programming, usually incorporating more than what is covered by the materials so far, and – as such – will not be part of that week's tests. For example, the files named in the form name xx yy py.py are so called Pythonic solutions that use more advanced Python constructs (some of which will be covered in the later lectures). Each solution has a short description in its docstring, emphasising when it is not a standard solution. 
Week 1: Introductions Lecture: The course introduction (01a intro/01a intro.pdf) Lab class: Python introduction, input/output, variables, operators (01c python_intro.ipynb) Week 2: Loops and conditionals Lecture (02a loops_conditionals.ipynb) Exercises (02b exercises.ipynb) Lab class (02c loops_conditionals.ipynb) Week 3: Functions Lecture (03a functions.ipynb) Exercises (03b exercises.ipynb) Lab class (03c functions.ipynb) Week 4: Lists and other iterables Lecture (04a lists.ipynb) Exercises (04b exercises.ipynb) Lab class (04c lists.ipynb) Week 5: Strings, generators, and generator expressions Lecture (05a strings.ipynb) Exercises (05b exercises.ipynb) Lab class (05c strings.ipynb) Week 6: Control flow Lecture (06a control_flow.ipynb) Exercises (06b exercises.ipynb) Lab class (06c control_flow.ipynb) Week 7: Modules Lecture (07a modules.ipynb) Exercises (07b exercises.ipynb) Lab class (07c modules.ipynb) Week 8: Files I/O Lecture (08a file_io.ipynb) Exercises (08b exercises.ipynb) Lab class (08c file_io.ipynb) Week 9: Analysis of algorithms Lecture (09a analysis_of_algorithms.ipynb) Exercises (09b exercises.ipynb) Lab class (09c analysis_of_algorithms.ipynb) Week 10: Data analysis Lecture (10a data_analysis.ipynb) Lab class (10c data_analysis.ipynb) Week 11: Graphs Lecture (11a graphs.ipynb) Lab class (11c graphs.ipynb) Week 12: PIL Python Imaging Library Lecture (12a pil.ipynb) Lab class (12c pil.ipynb)",Unknown,Unknown 286,Unknown,Unknown,Unknown,"PY RETROSHEET Python scripts for Retrosheet data downloading and parsing. YE REQUIREMENTS Chadwick 0.6.2 python 2.5+ (don't know about 3.0, sorry) sqlalchemy: if using postgres psycopg2 python package (dependency for sqlalchemy) USAGE Setup cp scripts/config.ini.dist scripts/config.ini Edit scripts/config.ini as needed. See the steps below for what might need to be changed. Download python download.py y year The scripts/download.py script downloads Retrosheet data. Edit the config.ini file to configure what types of files should be downloaded. Optionally set the year to download via the command line argument. download > dl_eventfiles determines if Retrosheet Event Files should be downloaded or not. These are the only files that can be processed by parse.py at this time. download > dl_gamelogs determines if Retrosheet Game Logs should be downloaded or not. These are not able to be processed by parse.py at this time. Parse into SQL python parse.py y After the files have been downloaded, parse them into SQL with parse.py . 1. Create database called retrosheet (or whatever). 2. Add schema to the database w/ the included SQL script (the .postgres.sql one works nicely w/ PG, the other w/ MySQL) 3. Configure the file config.ini with your appropriate ENGINE , USER , HOST , PASSWORD , and DATABASE values if you're using postgres, you can optionally define SCHEMA and download directory Valid values for ENGINE are valid sqlalchemy engines e.g. 'mysql', 'postgresql', or 'sqlite', If you have your server configured to allow passwordless connections, you don't need to define USER and PASSWORD . If you are using sqlite3, database in the config should be the path to your database file. Specify directory for retrosheet files to be downloaded to, needs to exist before script runs 5. Run parse.py to parse the files and insert the data into the database. (optionally use y YYYY to import just one year) Environment Variables (optional) Instead of editing the config.ini file, you may, optionally, use environment variables to set configuration options. 
Name the environment variables in the format SECTION_OPTION (the config section name, an underscore, then the option name). Thus, an environment variable that sets the database username would be called DATABASE_USER . The environment variables overwrite any settings in the config.ini file. Example: $ DATABASE_DATABASE rtrsht_testing CHADWICK_DIRECTORY /usr/bin/ python parse.py y 1956 YE GRATITUDE Github user jeffcrow made many fixes and additions and added sqlite support. JUST THE DATA If you're using PostgreSQL (and you should be), you can get a dump of all data up through 2016 (warning: 521MB) here Importing into PostgreSQL After creating a PostgreSQL user named wells , you can create a database from the dump by running pg_restore U d 1 retrosheet.2016.psql . License I don't care. Have at it.",Unknown,Unknown 287,Unknown,Unknown,Unknown,"Profiling The profiling package is an interactive continuous Python profiler. It is inspired by the Unity 3D profiler. This package provides these features: Profiling statistics keep the frame stack. An interactive TUI profiling statistics viewer. Provides both statistical and deterministic profiling. Utilities for remote profiling. Thread or greenlet aware CPU timer. Supports Python 2.7, 3.3, 3.4 and 3.5. Currently supports only Linux. Build Status Coverage Status Unity 3D : Installation Install the latest release via PyPI: sh $ pip install profiling Profiling To profile a single program, simply run the profiling command: sh $ profiling your program.py Then an interactive viewer will be executed: ! (screenshots/tracing.png) If your program uses greenlets, choose the greenlet timer: sh $ profiling timer greenlet your program.py With the dump option, it saves the profiling result to a file. You can browse the saved result by using the view subcommand: sh $ profiling dump your program.prf your program.py $ profiling view your program.prf If your script reads sys.argv , append your arguments after . It isolates your arguments from the profiling command: sh $ profiling your program.py your flag your param 42 Live Profiling If your program has a long life time like a web server, a profiling result at the end of the program is not helpful enough. Probably you need a continuous profiler. It can be achieved by the live profile subcommand: sh $ profiling live profile webserver.py See a demo: asciicast There's a live profiling server also. The server doesn't profile the program at ordinary times. But when a client connects to the server, it starts to profile and reports the results to all connected clients. Start a profiling server by the remote profile subcommand: sh $ profiling remote profile webserver.py bind 127.0.0.1:8912 And also run a client for the server by the view subcommand: sh $ profiling view 127.0.0.1:8912 Statistical Profiling TracingProfiler , the default profiler, implements a deterministic profiler for deep call graphs. Of course, it has heavy overhead. The overhead can pollute your profiling result or can make your application slow. In contrast, SamplingProfiler implements a statistical profiler. Like other statistical profilers, it also has only very cheap overhead. When you profile, you can choose it by just the sampling (shortly S ) option: sh $ profiling live profile S webserver.py ^^ ! (screenshots/sampling.png) Timeit then Profiling Do you use timeit to check the performance of your code? 
sh $ python m timeit s 'from trueskill import ' 'rate_1vs1(Rating(), Rating())' 1000 loops, best of 3: 722 usec per loop If you want to profile the checked code, simply use the timeit subcommand: sh $ profiling timeit s 'from trueskill import ' 'rate_1vs1(Rating(), Rating())' ^^^^^^^^^ Profiling from Code You can also profile your program by profiling.tracing.TracingProfiler or profiling.sampling.SamplingProfiler directly: python from profiling.tracing import TracingProfiler profile your program. profiler TracingProfiler() profiler.start() ... run your program. profiler.stop() or using context manager. with profiler: ... run your program. view and interact with the result. profiler.run_viewer() or save profile data to file profiler.dump('path/to/file') Viewer Key Bindings q Quit. space Pause/Resume. \\ Toggle layout between NESTED and FLAT. ↑ and ↓ Navigate frames. → Expand the frame. ← Fold the frame. > Go to the hotspot. esc Defocus. and Change sorting column. Columns Common FUNCTION 1. The function name with the code location. (e.g. my_func (my_code.py:42) , my_func (my_module:42) ) 1. Only the location without line number. (e.g. my_code.py , my_module ) Tracing Profiler CALLS Total call count of the function. OWN (Exclusive Time) Total spent time in the function excluding sub calls. /CALL after OWN Exclusive time per call. % after OWN Exclusive time per total spent time. DEEP (Inclusive Time) Total spent time in the function. /CALL after DEEP Inclusive time per call. % after DEEP Inclusive time per total spent time. Sampling Profiler OWN (Exclusive Samples) Number of samples which are collected during the direct execution of the function. % after OWN Exclusive samples per number of the total samples. DEEP (Inclusive Samples) Number of samples which are collected during the excution of the function. % after DEEP Inclusive samples per number of the total samples. Testing There are some additional requirements to run the test code, which can be installed by running the following command. sh $ pip install $(python test/fit_requirements.py test/requirements.txt) Then you should be able to run pytest . sh $ pytest v Thanks to Seungmyeong Yang who suggested this project. Pavel who inspired to implement m option. Licensing Written by Heungsub Lee at What! Studio in Nexon , and distributed under the BSD 3 Clause license. Heungsub Lee : What! Studio : Nexon : BSD 3 Clause :",Unknown,Unknown 288,Unknown,Unknown,Unknown,"psignifit Python toolbox for Bayesian psychometric function estimation. Installation For users pip install For developers, from within the git repo clone: pip install e . Testing For users within Python console >>> import psignifit >>> psignifit.test() For developers, from within the git repo clone: pytest Contributors See the CONTRIBUTORS file License and COPYRIGHT See the COPYRIGHT file",Unknown,Unknown 289,Unknown,Unknown,Unknown,"mac java launcher Launcher for bundled java applications on Mac OS Usage is simple: bash $ git clone mac java launcher.git $ cd mac java launcher.git $ ./use apply /Applications/IntelliJ IDEA 12 CE.app java version 1.6+ This command will: 1. Backup original Info.plist to Info.plist.original 2. Copy launcher script to AppBundle.app/Contents/MacOS/mac java launcher 3. Remove Java (or JVMOptions ) section from Info.plist 4. Set CFBundleExecutable in Info.plist to mac java launcher 5. 
Set JVMVersion in Info.plist.original to 1.6+ Also, it is easy to restore the original launcher: bash $ ./use undo /Applications/IntelliJ IDEA 12 CE.app java version 1.6 This command will revert the previous one. java version is optional in both cases; however, undo doesn't revert the java version change by itself. Changing Only the JDK Version Many newer .app bundles do not require the launcher script to work, such as IntelliJ IDEA 13.x . However, these applications may still require an earlier version of Java to be installed (e.g., JDK 1.6). In such cases, the Info.plist needs to have its JVMVersion updated. IntelliJ IDEA 13.x has a value of 1.6 , which _requires_ a release of JDK 1.6 to be installed. However, changing it to 1.6+ allows it to work with JDK 1.6 and newer. bash $ ./use java version 1.6+ /Applications/IntelliJ IDEA 13.app This command will: 1. Change the JVMVersion to 1.6+ from the default 1.6 and allow IntelliJ IDEA 13 to work with JDK 1.6, JDK 1.7, or JDK 1.8. About The Mac OS launcher for bundled java applications requires JDK 1.6 to be installed. And even if the application itself requires only JDK 1.7, you will still need to install JDK 1.6 just to satisfy the application launcher. You can see this issue, for example, with IntelliJ IDEA or yEd after changing JVMVersion in Info.plist to 1.6+ (not the case for newer releases of IDEA). If you would like to require a specific JDK release, such as JDK 1.7, then you can change the value to 1.7 . If you would like to allow for any release _after_ a specific release, such as JDK 1.7, then you can specify the value with a + like 1.7+ to allow any JDK to be used _starting_ with that release and up (e.g., JDK 1.8 would also work). mac java launcher replaces the default launcher shipped with the application and mac java launcher requires no JDK by itself (it will replace the launcher only for the single application specified in the command line argument). After that, only the JDK version actually required by the application must be installed. mac java launcher uses the Java section from Info.plist.original when launching the application. If you want, for example, to change JVMVersion from 1.6 to 1.6+, you can do it in Info.plist.original . Also, that can be done with the java version option.",Unknown,Unknown 290,Unknown,Unknown,Unknown,"NAME autojump a faster way to navigate your filesystem DESCRIPTION autojump is a faster way to navigate your filesystem. It works by maintaining a database of the directories you use the most from the command line. Directories must be visited first before they can be jumped to. USAGE j is a convenience wrapper function around autojump . Any option that can be used with autojump can be used with j and vice versa. Jump To A Directory That Contains foo : j foo Jump To A Child Directory: Sometimes it's convenient to jump to a child directory (sub directory of the current directory) rather than typing out the full name. jc bar Open File Manager To Directories (instead of jumping): Instead of jumping to a directory, you can open a file explorer window (Mac Finder, Windows Explorer, GNOME Nautilus, etc.) to the directory instead. jo music Opening a file manager to a child directory is also supported: jco images Using Multiple Arguments: Let's assume the following database: 30 /home/user/mail/inbox 10 /home/user/work/inbox j in would jump into /home/user/mail/inbox as the higher weighted entry. However, you can pass multiple arguments to autojump to prefer a different entry. In the above example, j w in would then change directory to /home/user/work/inbox. 
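To make the weighting concrete, here is a minimal Python sketch of the matching idea described above. This is illustrative only, not autojump's actual implementation; the paths and weights simply mirror the example database in this section. Every argument must appear in the path, in order, and the highest weighted surviving entry wins.

database = {
    "/home/user/mail/inbox": 30,
    "/home/user/work/inbox": 10,
}

def best_match(args, db):
    # An entry matches only if every argument appears in the path, in order.
    def matches(path):
        pos = 0
        for needle in args:
            pos = path.find(needle, pos)
            if pos < 0:
                return False
            pos += len(needle)
        return True
    candidates = [path for path in db if matches(path)]
    # Prefer the most frequently visited (highest weighted) match.
    return max(candidates, key=db.get) if candidates else None

print(best_match(["in"], database))       # /home/user/mail/inbox
print(best_match(["w", "in"], database))  # /home/user/work/inbox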
For more options refer to help: autojump help INSTALLATION REQUIREMENTS Python v2.6+ or Python v3.3+ Supported shells bash first class support zsh first class support fish community supported tcsh community supported clink community supported Supported platforms Linux first class support OS X first class support Windows community supported BSD community supported Supported installation methods source code first class support Debian and derivatives first class support ArchLinux / Gentoo / openSUSE / RedHat and derivatives community supported Homebrew / MacPorts community supported Due to limited time and resources, only first class support items will be maintained by the primary committers. All community supported items will be updated based on pull requests submitted by the general public. Please continue opening issues and providing feedback for community supported items since consolidating information helps other users troubleshoot and submit enhancements and fixes. MANUAL Grab a copy of autojump: git clone git://github.com/wting/autojump.git Run the installation script and follow the on screen instructions. cd autojump ./install.py or ./uninstall.py AUTOMATIC Linux autojump is included in the following distro repositories; please use the relevant package management utilities to install (e.g. apt get, yum, pacman, etc): Debian, Ubuntu, Linux Mint All Debian derived distros require manual activation for policy reasons; please see /usr/share/doc/autojump/README.Debian . RedHat, Fedora, CentOS Install autojump zsh for zsh, autojump fish for fish, etc. ArchLinux Gentoo Frugalware Slackware OS X Homebrew is the recommended installation method for Mac OS X: brew install autojump MacPorts is also available: port install autojump Windows Windows support is enabled by clink , which should be installed prior to installing autojump. KNOWN ISSUES autojump does not support directories that begin with . For bash users, autojump keeps track of directories by modifying $PROMPT_COMMAND . Do not overwrite $PROMPT_COMMAND : export PROMPT_COMMAND history a Instead append to the end of the existing \$PROMPT\_COMMAND: export PROMPT_COMMAND ${PROMPT_COMMAND:+$PROMPT_COMMAND ;} history a REPORTING BUGS For any questions or issues please visit: AUTHORS autojump was originally written by Joël Schaerer, and is currently maintained by William Ting. More contributors can be found in AUTHORS . COPYRIGHT Copyright © 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later . This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.",Unknown,Unknown 291,Unknown,Unknown,Unknown,"onedrive d Keep an eye on A Microsoft OneDrive desktop client / daemon on Linux, written in Python 3. Install Steps 1, 2, and 5 need to be done manually. For steps 3 and 4, the script file install.sh will handle the work automatically. (1) Always uninstall older versions before installing newer ones bash To remove onedrive d 1.0 sudo pip3 uninstall onedrive d Remove residual config files rm rfv /.onedrive (2) Grab the source code bash git clone cd onedrive d Or you can browse and download the ZIP file manually. (3) Pre requisites Your local filesystem must store UTC timestamps, not local time. This is true for most Unix filesystems. onedrive d requires a Python 3 interpreter. If the Python version is older than 3.4, python3 pip is also required. The Python 3 interpreter must use Unicode mode (the default for most Linux distros), otherwise its string datatype won't work. 
The daemon package ( daemonocle ) has a Python dependency psutil , which requires the system package python3 dev to be installed. If installation fails because of a missing header, check whether the python3 dev package is installed. Not all Linux distros ship this package by default. Pay extra attention to this if your desktop environment is MATE (i.e., if your distribution is Linux Mint or Ubuntu MATE, etc.). For the GUI component to work, the Python3 binding of GObject ( python3 gi package for Debian/Ubuntu, pygobject3 for Fedora, python gobject for Arch, and python3 gobject for OpenSUSE) is needed. Refer to this article if you want to build PyGObject from source. Another recommended package is inotify tools (for most package managers), which contains the command inotifywait . If this command is available on the system, the real time file system monitoring thread will be enabled. Otherwise, synchronization is performed at a fixed, configurable interval. (4) Install onedrive d bash Register package sudo python3 setup.py install Clean temporary files sudo python3 setup.py clean Create settings dir mkdir /.onedrive cp ./onedrive_d/res/default_ignore.ini /.onedrive/ignore_v2.ini Create log file sudo touch /var/log/onedrive_d.log you may need to change whoami to your username sudo chown whoami /var/log/onedrive_d.log (5) Configure / start onedrive d bash First read the help info onedrive pref help onedrive d help Run the config program with the CLI onedrive pref Or run with the GUI onedrive pref ui gtk Run onedrive d start as a daemon onedrive d start or start as a regular process onedrive d start debug Run without installation To run the source code directly without installing it to the system, do steps 1 to 3 in the Installation section, and copy the config files by bash mkdir /.onedrive cp ./onedrive_d/res/default_ignore.ini /.onedrive/ignore_v2.ini Create log file if you need to run onedrive d as a daemon sudo touch /var/log/onedrive_d.log you may need to change whoami to your username sudo chown whoami /var/log/onedrive_d.log Now you can run the program with the following commands bash assuming you are in the onedrive d folder that contains the onedrive_d folder. equivalent to the onedrive pref command python3 m onedrive_d.od_pref help equivalent to the onedrive d command python3 m onedrive_d.od_main help Note that the commands above are no longer valid after installing the package to the system. Remove Refer to step 1 of the Installation section. Notes for Users Note that this is the older version. The current version is still in development. If you are having problems with Python, make sure you have the correct version and that you have installed the correct modules (i.e. apt get install python3 dev). Data Integrity Files and directories deleted locally can be found in the Trash. Files and directories deleted remotely can be found in the OneDrive recycle bin. Files overwritten remotely can be recovered by the OneDrive file version feature. onedrive d only performs overwriting when it is 100% sure one file is older than its local/remote counterpart. Uploading / Downloading by Blocks When a file's size exceeds a threshold (e.g., 8 MiB), onedrive d will upload / download it in blocks of a smaller size (e.g., 512 KiB). This lowers the cost (and thus improves reliability) of recovering from network failures, but the additional HTTP requests may slow down the process. Tweak the parameters to best fit your network condition.
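To make the block transfer strategy above concrete, here is a minimal, hypothetical Python sketch. It is not onedrive d's actual code; the 8 MiB threshold and 512 KiB chunk size mirror the examples above, and the use of plain HTTP Range requests via the requests library is an assumption made only for illustration:
python
# Illustrative only: download a large file in fixed size blocks so a
# network failure costs at most one block instead of the whole transfer.
import requests

CHUNK = 512 * 1024            # 512 KiB per request
THRESHOLD = 8 * 1024 * 1024   # switch to block mode above 8 MiB

def download(url, dest, total_size):
    if total_size <= THRESHOLD:
        with open(dest, 'wb') as f:
            f.write(requests.get(url).content)
        return
    with open(dest, 'wb') as f:
        offset = 0
        while offset < total_size:
            end = min(offset + CHUNK, total_size) - 1
            headers = {'Range': 'bytes=%d-%d' % (offset, end)}
            resp = requests.get(url, headers=headers)
            resp.raise_for_status()   # retry/resume logic would go here
            f.write(resp.content)
            offset = end + 1
A real client would wrap the per block request in retry and resume bookkeeping, which is exactly where the reliability benefit of small blocks comes from.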
Copying and Moving Files and Folders Because of the varied behaviors of file managers on Linux, it is hard to determine what actions a user performed based on the log of inotifywait . We adopt a very conservative strategy to judge whether a file is moved within the local OneDrive folder. In most cases, moving a file results in removing the old path and uploading to the new path. This wastes network traffic. Most file managers, including the cp command, do not copy file attributes like mtime. inotifywait reports file writing on copy completion. This makes it infeasible to check whether the file writing is a copy action. As a result, file copying is also treated as uploading. Things are even worse when one copies or moves a directory. In most cases the mtime attribute will be changed, resulting in onedrive d uploading the whole folder.",Unknown,Unknown 292,Unknown,Unknown,Unknown,"PyCon 2017 Notes These are my notes from PyCon 2017 in Portland, OR; I try to give credit to the speakers and link to their public web presence when possible, and sometimes I annotate with extra information. Pull requests welcome, and feel free to fork. I'm @y3l2n if you want to ping me.",Unknown,Unknown 293,Unknown,Unknown,Unknown,"What is XenonPy project XenonPy is a Python library that implements a comprehensive set of machine learning tools for materials informatics. Its functionalities partially depend on PyTorch and R. The current release provides some limited modules: Interface to public materials database Library of materials descriptors (compositional/structural descriptors) Pretrained model library XenonPy.MDL (v0.1.0b, 2018/12/25: more than 10,000 models in 35 properties of small molecules, polymers, and inorganic compounds) Machine learning tools. Transfer learning using the pretrained models in XenonPy.MDL XenonPy is inspired by matminer. XenonPy is an open source project. See our documents for details: XenonPy images The XenonPy images pack a lot of useful packages for materials informatics work. The following table lists some core packages in the XenonPy images. Package Version PyTorch 1.1.0 tensorly 0.4.3 pymatgen 2019.5.8 matminer 0.5.6 mordred 1.1.2 scipy 1.2.1 scikit learn 0.21.1 pandas 0.24.2 rdkit 2019.03.2 jupyter 1.0.0 seaborn 0.9.0 matplotlib 3.0.3 plotly 3.8.1 Requirements In order to use this image you must have Docker Engine installed. Instructions for setting up Docker Engine are available on the Docker website . CUDA requirements If you have a CUDA compatible NVIDIA graphics card, you can use a CUDA enabled version of the PyTorch image to enable hardware acceleration. This can only be used on Ubuntu Linux. First, ensure that you install the appropriate NVIDIA drivers and libraries. If you are running Ubuntu, you can install proprietary NVIDIA drivers from the PPA and CUDA from the NVIDIA website . You will also need to install nvidia docker2 to enable GPU device access within Docker containers. This can be found at NVIDIA/nvidia docker . Usage Pre built xenonpy images are available on Docker Hub under the name yoshidalab/xenonpy . For example, you can pull the CUDA 10.0 version with: bash docker pull yoshidalab/xenonpy:cuda10 The table below lists software versions for each of the currently supported Docker image tags . Image tag CUDA PyTorch latest 10.0 1.1.0 cpu None 1.1.0 cuda10 10.0 1.1.0 cuda9 9.0 1.1.0 Running XenonPy It is possible to run XenonPy inside a container.
Using xenonpy with jupyter is very easy, you could run it with the following command: sh docker run rm it \ runtime nvidia \ ipc host \ publish 8888:8888 volume $Home/.xenonpy:/home/user/.xenonpy \ volume :/workspace \ e NVIDIA_VISIBLE_DEVICES 0 \ yoshidalab/xenonpy Here's a description of the Docker command line options shown above: runtime nvidia : Required if using CUDA, optional otherwise. Passes the graphics card from the host to the container. Optional, based on your usage . ipc host : Required if using multiprocessing, as explained at Optional publish 8888:8888 : Publish container's port 8888 to the host. Needed volume $Home/.xenonpy:/home/user/.xenonpy : Mounts the XenonPy root directory into the container. Optional, but highly recommended . volume :/workspace : Mounts the your working directory into the container. Optional, but highly recommended . e NVIDIA_VISIBLE_DEVICES 0 : Sets an environment variable to restrict which graphics cards are seen by programs running inside the container. Set to all to enable all cards. Optional, defaults to all. You may wish to consider using Docker Compose to make running containers with many options easier. At the time of writing, only version 2.3 of Docker Compose configuration files supports the runtime option. Copyright and license ©Copyright 2019 The XenonPy project, all rights reserved. Released under the BSD 3 license .",Unknown,Unknown 294,Unknown,Unknown,Unknown,"Build Status youtube dl download videos from youtube.com or other video platforms INSTALLATION ( installation) DESCRIPTION ( description) OPTIONS ( options) CONFIGURATION ( configuration) OUTPUT TEMPLATE ( output template) FORMAT SELECTION ( format selection) VIDEO SELECTION ( video selection) FAQ ( faq) DEVELOPER INSTRUCTIONS ( developer instructions) EMBEDDING YOUTUBE DL ( embedding youtube dl) BUGS ( bugs) COPYRIGHT ( copyright) INSTALLATION To install it right away for all UNIX users (Linux, macOS, etc.), type: sudo curl L o /usr/local/bin/youtube dl sudo chmod a+rx /usr/local/bin/youtube dl If you do not have curl, you can alternatively use a recent wget: sudo wget O /usr/local/bin/youtube dl sudo chmod a+rx /usr/local/bin/youtube dl Windows users can download an .exe file and place it in any location on their PATH except for %SYSTEMROOT%\System32 (e.g. do not put in C:\Windows\System32 ). You can also use pip: sudo H pip install upgrade youtube dl This command will update youtube dl if you have already installed it. See the pypi page for more information. macOS users can install youtube dl with Homebrew : brew install youtube dl Or with MacPorts : sudo port install youtube dl Alternatively, refer to the developer instructions ( developer instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the youtube dl Download Page . DESCRIPTION youtube dl is a command line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like. youtube dl OPTIONS URL URL... OPTIONS h, help Print this help text and exit version Print program version and exit U, update Update this program to latest version. 
Make sure that you have sufficient permissions (run with sudo if needed) i, ignore errors Continue on download errors, for example to skip unavailable videos in a playlist abort on error Abort downloading of further videos (in the playlist or the command line) if an error occurs dump user agent Display the current browser identification list extractors List all supported extractors extractor descriptions Output descriptions of all supported extractors force generic extractor Force extraction to use the generic extractor default search PREFIX Use this prefix for unqualified URLs. For example gvsearch2: downloads two videos from google videos for youtube dl large apple . Use the value auto to let youtube dl guess ( auto_warning to emit a warning when guessing). error just throws an error. The default value fixup_error repairs broken URLs, but emits an error if this is not possible instead of searching. ignore config Do not read configuration files. When given in the global configuration file /etc/youtube dl.conf: Do not read the user configuration in /.config/youtube dl/config (%APPDATA%/youtube dl/config.txt on Windows) config location PATH Location of the configuration file; either the path to the config or its containing directory. flat playlist Do not extract the videos of a playlist, only list them. mark watched Mark videos watched (YouTube only) no mark watched Do not mark videos watched (YouTube only) no color Do not emit color codes in output Network Options: proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. To enable SOCKS proxy, specify a proper scheme. For example socks5://127.0.0.1:1080/. Pass in an empty string ( proxy ) for direct connection socket timeout SECONDS Time to wait before giving up, in seconds source address IP Client side IP address to bind to 4, force ipv4 Make all connections via IPv4 6, force ipv6 Make all connections via IPv6 Geo Restriction: geo verification proxy URL Use this proxy to verify the IP address for some geo restricted sites. The default proxy specified by proxy (or none, if the option is not present) is used for the actual downloading. geo bypass Bypass geographic restriction via faking X Forwarded For HTTP header no geo bypass Do not bypass geographic restriction via faking X Forwarded For HTTP header geo bypass country CODE Force bypass geographic restriction with explicitly provided two letter ISO 3166 2 country code geo bypass ip block IP_BLOCK Force bypass geographic restriction with explicitly provided IP block in CIDR notation Video Selection: playlist start NUMBER Playlist video to start at (default is 1) playlist end NUMBER Playlist video to end at (default is last) playlist items ITEM_SPEC Playlist video items to download. Specify indices of the videos in the playlist separated by commas like: playlist items 1,2,5,8 if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: playlist items 1 3,7,10 13 , it will download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13. match title REGEX Download only matching titles (regex or caseless sub string) reject title REGEX Skip download for matching titles (regex or caseless sub string) max downloads NUMBER Abort after downloading NUMBER files min filesize SIZE Do not download any videos smaller than SIZE (e.g. 50k or 44.6m) max filesize SIZE Do not download any videos larger than SIZE (e.g. 50k or 44.6m) date DATE Download only videos uploaded in this date datebefore DATE Download only videos uploaded on or before this date (i.e. 
inclusive) dateafter DATE Download only videos uploaded on or after this date (i.e. inclusive) min views COUNT Do not download any videos with less than COUNT views max views COUNT Do not download any videos with more than COUNT views match filter FILTER Generic video filter. Specify any key (see the OUTPUT TEMPLATE for a list of available keys) to match if the key is present, !key to check if the key is not present, key > NUMBER (like comment_count > 12 , also works with > , 100 & dislike_count .+?) (?P .+) xattrs Write metadata to the video file's xattrs (using dublin core and xdg standards) fixup POLICY Automatically correct known faults of the file. One of never (do nothing), warn (only emit a warning), detect_or_warn (the default; fix file if we can, warn otherwise) prefer avconv Prefer avconv over ffmpeg for running the postprocessors prefer ffmpeg Prefer ffmpeg over avconv for running the postprocessors (default) ffmpeg location PATH Location of the ffmpeg/avconv binary; either the path to the binary or its containing directory. exec CMD Execute a command on the file after downloading, similar to find's exec syntax. Example: exec 'adb push {} /sdcard/Music/ && rm {}' convert subs FORMAT Convert the subtitles to other format (currently supported: srt ass vtt lrc) CONFIGURATION You can configure youtube dl by placing any supported command line option to a configuration file. On Linux and macOS, the system wide configuration file is located at /etc/youtube dl.conf and the user wide configuration file at /.config/youtube dl/config . On Windows, the user wide configuration file locations are %APPDATA%\youtube dl\config.txt or C:\Users\ \youtube dl.conf . Note that by default configuration file may not exist so you may need to create it yourself. For example, with the following configuration file youtube dl will always extract the audio, not copy the mtime, use a proxy and save all videos under Movies directory in your home directory: Lines starting with are comments Always extract audio x Do not copy the mtime no mtime Use this proxy proxy 127.0.0.1:3128 Save all videos under Movies directory in your home directory o /Movies/%(title)s.%(ext)s Note that options in configuration file are just the same options aka switches used in regular command line calls thus there must be no whitespace after or , e.g. o or proxy but not o or proxy . You can use ignore config if you want to disable the configuration file for a particular youtube dl run. You can also use config location if you want to use custom configuration file for a particular youtube dl run. Authentication with .netrc file You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with username and password ) in order not to pass credentials as command line arguments on every youtube dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a .netrc file on a per extractor basis. 
For that you will need to create a .netrc file in your $HOME and restrict permissions to read/write by only you: touch $HOME/.netrc chmod a rwx,u+rw $HOME/.netrc After that you can add credentials for an extractor in the following format, where extractor is the name of the extractor in lowercase: machine login password For example: machine youtube login myaccount@gmail.com password my_youtube_password machine twitch login my_twitch_account_name password my_twitch_password To activate authentication with the .netrc file you should pass netrc to youtube dl or place it in the configuration file ( configuration). On Windows you may also need to setup the %HOME% environment variable manually. For example: set HOME %USERPROFILE% OUTPUT TEMPLATE The o option allows users to indicate a template for the output file names. tl;dr: navigate me to examples ( output template examples). The basic usage is not to set any template arguments when downloading a single file, like in youtube dl o funny_video.flv However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to python string formatting operations . For example, %(NAME)s or %(NAME)05d . To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are: id (string): Video identifier title (string): Video title url (string): Video URL ext (string): Video filename extension alt_title (string): A secondary title of the video display_id (string): An alternative identifier for the video uploader (string): Full name of the video uploader license (string): License name the video is licensed under creator (string): The creator of the video release_date (string): The date (YYYYMMDD) when the video was released timestamp (numeric): UNIX timestamp of the moment the video became available upload_date (string): Video upload date (YYYYMMDD) uploader_id (string): Nickname or id of the video uploader channel (string): Full name of the channel the video is uploaded on channel_id (string): Id of the channel location (string): Physical location where the video was filmed duration (numeric): Length of the video in seconds view_count (numeric): How many users have watched the video on the platform like_count (numeric): Number of positive ratings of the video dislike_count (numeric): Number of negative ratings of the video repost_count (numeric): Number of reposts of the video average_rating (numeric): Average rating give by users, the scale used depends on the webpage comment_count (numeric): Number of comments on the video age_limit (numeric): Age restriction for the video (years) is_live (boolean): Whether this video is a live stream or a fixed length video start_time (numeric): Time in seconds where the reproduction should start, as specified in the URL end_time (numeric): Time in seconds where the reproduction should end, as specified in the URL format (string): A human readable description of the format format_id (string): Format code specified by format format_note (string): Additional info about the format width (numeric): Width of the video height (numeric): Height of the video resolution (string): Textual description of width and height tbr (numeric): Average bitrate of audio and video in KBit/s abr (numeric): Average audio bitrate in KBit/s acodec (string): Name of the audio codec in use asr (numeric): Audio sampling rate in Hertz vbr (numeric): Average video bitrate in KBit/s fps (numeric): Frame 
rate vcodec (string): Name of the video codec in use container (string): Name of the container format filesize (numeric): The number of bytes, if known in advance filesize_approx (numeric): An estimate for the number of bytes protocol (string): The protocol that will be used for the actual download extractor (string): Name of the extractor extractor_key (string): Key name of the extractor epoch (numeric): Unix epoch when creating the file autonumber (numeric): Five digit number that will be increased with each download, starting at zero playlist (string): Name or id of the playlist that contains the video playlist_index (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist playlist_id (string): Playlist identifier playlist_title (string): Playlist title playlist_uploader (string): Full name of the playlist uploader playlist_uploader_id (string): Nickname or id of the playlist uploader Available for the video that belongs to some logical chapter or section: chapter (string): Name or title of the chapter the video belongs to chapter_number (numeric): Number of the chapter the video belongs to chapter_id (string): Id of the chapter the video belongs to Available for the video that is an episode of some series or programme: series (string): Title of the series or programme the video episode belongs to season (string): Title of the season the video episode belongs to season_number (numeric): Number of the season the video episode belongs to season_id (string): Id of the season the video episode belongs to episode (string): Title of the video episode episode_number (numeric): Number of the video episode within a season episode_id (string): Id of the video episode Available for the media that is a track or a part of a music album: track (string): Title of the track track_number (numeric): Number of the track within an album or a disc track_id (string): Id of the track artist (string): Artist(s) of the track genre (string): Genre(s) of the track album (string): Title of the album the track belongs to album_type (string): Type of the album album_artist (string): List of all artists appeared on the album disc_number (numeric): Number of the disc or other physical medium the track belongs to release_year (numeric): Year (YYYY) when the album was released Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with NA . For example for o %(title)s %(id)s.%(ext)s and an mp4 video with title youtube dl test video and id BaW_jenozKcj , this will result in a youtube dl test video BaW_jenozKcj.mp4 file created in the current directory. For numeric sequences you can use numeric related formatting, for example, %(view_count)05d will result in a string with view count padded with zeros up to 5 characters, like in 00042 . Output templates can also contain arbitrary hierarchical path, e.g. o '%(playlist)s/%(playlist_index)s %(title)s.%(ext)s' which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you. To use percent literals in an output template use %% . To output to stdout use o . The current default template is %(title)s %(id)s.%(ext)s . 
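As a quick illustration of how these sequences expand, here is a standalone Python sketch using the same percent style string formatting the templates rely on. It is not youtube dl's internal code, and the metadata values are made up for the example:
python
# Expanding an output template with Python percent formatting over a dict.
info = {
    'title': 'youtube dl test video',
    'id': 'BaW_jenozKc',
    'ext': 'mp4',
    'playlist_index': 3,
}
template = '%(title)s-%(id)s.%(ext)s'
print(template % info)
# youtube dl test video-BaW_jenozKc.mp4
print('%(playlist_index)05d-%(title)s.%(ext)s' % info)
# 00003-youtube dl test video.mp4
Missing keys would raise a KeyError here; youtube dl instead substitutes NA for sequences that the extractor could not provide, as noted above.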
In some cases, you don't want special characters such as 中, spaces, or &, such as when transferring the downloaded filename to a Windows system or the filename through an 8bit unsafe channel. In these cases, add the restrict filenames flag to get a shorter title: Output template and Windows batch files If you are using an output template inside a Windows batch file then you must escape plain percent characters ( % ) by doubling, so that o %(title)s %(id)s.%(ext)s should become o %%(title)s %%(id)s.%%(ext)s . However you should not touch % 's that are not plain characters, e.g. environment variables for expansion should stay intact: o C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s . Output template examples Note that on Windows you may need to use double quotes instead of single. bash $ youtube dl get filename o '%(title)s.%(ext)s' BaW_jenozKc youtube dl test video ''_ä↭𝕐.mp4 All kinds of weird characters $ youtube dl get filename o '%(title)s.%(ext)s' BaW_jenozKc restrict filenames youtube dl_test_video_.mp4 A simple file name Download YouTube playlist videos in separate directory indexed by video order in a playlist $ youtube dl o '%(playlist)s/%(playlist_index)s %(title)s.%(ext)s' Download all playlists of YouTube channel/user keeping each playlist in separate directory: $ youtube dl o '%(uploader)s/%(playlist)s/%(playlist_index)s %(title)s.%(ext)s' Download Udemy course keeping each chapter in separate directory under MyVideos directory in your home $ youtube dl u user p password o ' /MyVideos/%(playlist)s/%(chapter_number)s %(chapter)s/%(title)s.%(ext)s' Download entire series season keeping each series and each season in separate directory under C:/MyVideos $ youtube dl o C:/MyVideos/%(series)s/%(season_number)s %(season)s/%(episode_number)s %(episode)s.%(ext)s Stream the video being downloaded to stdout $ youtube dl o BaW_jenozKc FORMAT SELECTION By default youtube dl tries to download the best available quality, i.e. if you want the best quality you don't need to pass any special options, youtube dl will guess it for you by default . But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so called format selection based on which you can explicitly specify desired format, select formats based on some criterion or criteria, setup precedence and much more. The general syntax for format selection is format FORMAT or shorter f FORMAT where FORMAT is a selector expression , i.e. an expression that describes format or formats you would like to download. tl;dr: navigate me to examples ( format selection examples). The simplest case is requesting a specific format, for example with f 22 you can download the format with format code equal to 22. You can get the list of available format codes for particular video using list formats or F . Note that these format codes are extractor specific. You can also use a file extension (currently 3gp , aac , flv , m4a , mp3 , mp4 , ogg , wav , webm are supported) to download the best quality format of a particular file extension served as a single file, e.g. f webm will download the best quality format with the webm extension served as a single file. You can also use special names to select particular edge case formats: best : Select the best quality format represented by a single file with video and audio. worst : Select the worst quality format represented by a single file with video and audio. 
bestvideo : Select the best quality video only format (e.g. DASH video). May not be available. worstvideo : Select the worst quality video only format. May not be available. bestaudio : Select the best quality audio only format. May not be available. worstaudio : Select the worst quality audio only format. May not be available. For example, to download the worst quality video only format you can use f worstvideo . If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left associative, i.e. formats on the left hand side are preferred, for example f 22/17/18 will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download. If you want to download several formats of the same video use a comma as a separator, e.g. f 22,17,18 will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: f 136/137/mp4/bestvideo,140/m4a/bestaudio . You can also filter the video formats by putting a condition in brackets, as in f best height 720 (or f filesize>10M ). The following numeric meta fields can be used with comparisons , > , (equals), ! (not equals): filesize : The number of bytes, if known in advance width : Width of the video, if known height : Height of the video, if known tbr : Average bitrate of audio and video in KBit/s abr : Average audio bitrate in KBit/s vbr : Average video bitrate in KBit/s asr : Audio sampling rate in Hertz fps : Frame rate Also filtering work for comparisons (equals), ^ (starts with), $ (ends with), (contains) and following string meta fields: ext : File extension acodec : Name of the audio codec in use vcodec : Name of the video codec in use container : Name of the container format protocol : The protocol that will be used for the actual download, lower case ( rtsp , rtmp , rtmpe , mms , f4m , ism , m3u8 , or m3u8_native ) format_id : A short description of the format Any string comparison may be prefixed with negation ! in order to produce an opposite comparison, e.g. ! (does not contain). Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster. Formats for which the value is not known are excluded unless you put a question mark ( ? ) after the operator. You can combine format filters, so f height 500 selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. You can merge the video and audio of two formats into a single file using f + (requires ffmpeg or avconv installed), for example f bestvideo+bestaudio will download the best video only format, the best audio only format and mux them together with ffmpeg/avconv. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use f '(mp4,webm) height \bin ), put all the executables directly in there, and then set your PATH environment variable to include that directory. From then on, after restarting your shell, you will be able to access both youtube dl and ffmpeg (and youtube dl will be able to find ffmpeg) by simply typing youtube dl or ffmpeg , no matter what directory you're in. 
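To tie the format selection syntax above to a concrete call, here is a small, hedged sketch using youtube dl's Python embedding interface (described further in the EMBEDDING YOUTUBE DL section below). The selector, output template, and URL are examples only; note that the bestvideo+bestaudio merge in this example is one of the cases where ffmpeg must be available on your PATH, as discussed in the answer above.
python
# Example only: prefer up to 720p video merged with the best audio,
# falling back to the best single file format.
from __future__ import unicode_literals
import youtube_dl

ydl_opts = {
    'format': 'bestvideo[height<=720]+bestaudio/best',
    'outtmpl': '%(title)s-%(id)s.%(ext)s',
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # BaW_jenozKc is the test video id used elsewhere in this README.
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])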
How do I put downloads into a specific folder? Use the o to specify an output template ( output template), for example o /home/user/videos/%(title)s %(id)s.%(ext)s . If you want this for all of your downloads, put the option into your configuration file ( configuration). How do I download a video starting with a ? Either prepend or separate the ID from the options with : youtube dl wNyEUrxzFU youtube dl How do I pass cookies to youtube dl? Use the cookies option, for example cookies /path/to/cookies/file.txt . In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, cookies.txt (for Chrome) or cookies.txt (for Firefox). Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either HTTP Cookie File or Netscape HTTP Cookie File . Make sure you have correct newline format in the cookies file and convert newlines if necessary to correspond with your OS, namely CRLF ( \r\n ) for Windows and LF ( \n ) for Unix and Unix like systems (Linux, macOS, etc.). HTTP Error 400: Bad Request when using cookies is a good sign of invalid newline format. Passing cookies to youtube dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around CAPTCHA some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare). How do I stream directly to media player? You will first need to tell youtube dl to stream media to stdout with o , and also tell your media player to read from stdin (it must be capable of this for streaming) and then pipe former to latter. For example, streaming to vlc can be achieved with: youtube dl o vlc How do I download only new videos from a playlist? Use download archive feature. With this feature you should initially download the complete playlist with download archive /path/to/download/archive/file.txt that will record identifiers of all the videos in a special file. Each subsequent run with the same download archive will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file. For example, at first, youtube dl download archive archive.txt will download the complete PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re playlist and create a file archive.txt . Each subsequent run will only download new videos if any: youtube dl download archive archive.txt Should I add hls prefer native into my config? When youtube dl detects an HLS video, it can download it either with the built in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed. When youtube dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube dl, with improvements of the built in downloader and/or ffmpeg. In particular, the generic extractor (used when your website is not in the list of supported sites by youtube dl cannot mandate one specific downloader. If you put either hls prefer native or hls prefer ffmpeg into your configuration, a different subset of videos will fail to download correctly. 
Instead, it is much better to file an issue or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case. Can you add support for this anime video site, or site which shows current movies for free? As a matter of policy (as well as legality), youtube dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube dl. A note on the service that they don't host the infringing content, but just link to those who do, is evidence that the service should not be included into youtube dl. The same goes for any DMCA note when the whole front page of the service is filled with videos they are not allowed to distribute. A fair use note is equally unconvincing if the service shows copyright protected videos in full without authorization. Support requests for services that do purchase the rights to distribute their content are perfectly fine though. If in doubt, you can simply include a source that mentions the legitimate purchase of content. How can I speed up work on my issue? (Also known as: Help, my important issue not being solved!) The youtube dl core developer team is quite small. While we do our best to solve as many issues as possible, sometimes that can take quite a while. To speed up your issue, here's what you can do: First of all, please do report the issue at our issue tracker . That allows us to coordinate all efforts by users and developers, and serves as a unified point. Unfortunately, the youtube dl project has grown too large to use personal email as an effective communication channel. Please read the bug reporting instructions ( bugs) below. A lot of bugs lack all the necessary information. If you can, offer proxy, VPN, or shell access to the youtube dl developers. If you are able to, test the issue from multiple computers in multiple countries to exclude local censorship or misconfiguration issues. If nobody is interested in solving your issue, you are welcome to take matters into your own hands and submit a pull request (or coerce/pay somebody else to do so). Feel free to bump the issue from time to time by writing a small comment ( Issue is still present in youtube dl version ...from France, but fixed from Belgium ), but please not more than once a month. Please do not declare your issue as important or urgent . How can I detect whether a given URL is supported by youtube dl? For one, have a look at the list of supported sites (docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from to ) and youtube dl reports an URL of a service in that list as unsupported. In that case, simply report a bug. It is not possible to detect whether a URL is supported or not. That's because youtube dl contains a generic extractor which matches all URLs. You may be tempted to disable, exclude, or remove the generic extractor, but the generic extractor not only allows users to extract videos from lots of websites that embed a video from another service, but may also be used to extract video from a service that it's hosting itself. Therefore, we neither recommend nor support disabling, excluding, or removing the generic extractor. 
If you want to find out whether a given URL is supported, simply call youtube dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube dl on the console) or catching an UnsupportedError exception if you run it from a Python program. Why do I need to go through that much red tape when filing bugs? Before we had the issue template, despite our extensive bug reporting instructions ( bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said please install ffmpeg , because people did not mention the URL they were trying to download and many more simple, easy to avoid problems, many of whom were totally unrelated to youtube dl. youtube dl is an open source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of youtube dl v YOUR_URL_HERE is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube dl is current. DEVELOPER INSTRUCTIONS Most users do not need to build youtube dl and can download the builds or get them from their distribution. To run youtube dl as a developer, you don't need to build anything either. Simply execute python m youtube_dl To run the test, simply invoke your favorite test runner, or execute a test file directly; any of the following work: python m unittest discover python test/test_download.py nosetests See item 6 of new extractor tutorial ( adding support for a new site) for how to run extractor specific test cases. If you want to create a build of youtube dl yourself, you'll need python make (only GNU make is supported) pandoc zip nosetests Adding support for a new site If you want to add support for a new site, first of all make sure this site is not dedicated to copyright infringement (README.md can you add support for this anime video site or site which shows current movies for free) . youtube dl does not support such sites thus pull requests adding support for them will be rejected . After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called yourextractor ): 1. Fork this repository 2. Check out the source code with: git clone git@github.com:YOUR_GITHUB_USERNAME/youtube dl.git 3. Start a new git branch with cd youtube dl git checkout b yourextractor 4. 
Start with this simple template and save it to youtube_dl/extractor/yourextractor.py : python coding: utf 8 from __future__ import unicode_literals from .common import InfoExtractor class YourExtractorIE(InfoExtractor): _VALID_URL r' _TEST { 'url': ' 'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use test)', 'info_dict': { 'id': '42', 'ext': 'mp4', 'title': 'Video title goes here', 'thumbnail': r're:^ TODO more properties, either as: A value MD5 checksum; start the string with md5: A regular expression; start the string with re: Any Python type (for example int or float) } } def _real_extract(self, url): video_id self._match_id(url) webpage self._download_webpage(url, video_id) TODO more code goes here, for example ... title self._html_search_regex(r' (.+?) ', webpage, 'title') return { 'id': video_id, 'title': title, 'description': self._og_search_description(webpage), 'uploader': self._search_regex(r' +id uploader ^> >( ^ +id title ^> >( ^ , for example: python description self._search_regex( r' +id title ^> >( ^ \d+)' Incorrect: python r'(id ID) (?P \d+)' Make regular expressions relaxed and flexible When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on. Example Say you need to extract title from the following HTML code: html some fancy title The code for that task should look similar to: python title self._search_regex( r' +class title ^> >( ^ +class ( \' )title\1 ^> >(?P ^ (. ?) ', webpage, 'title', group 'title') Long lines policy There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse. For example, you should never split long string literals like URLs or some other often copied entities over multiple lines to fit this limit: Correct: python ' Incorrect: python ' 'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4' Use convenience conversion and parsing functions Wrap all extracted numeric data into safe functions from youtube_dl/utils.py : int_or_none , float_or_none . Use them for string to number conversions as well. Use url_or_none for safe URL processing. Use try_get for safe metadata extraction from parsed JSON. Use unified_strdate for uniform upload_date or any YYYYMMDD meta field extraction, unified_timestamp for uniform timestamp extraction, parse_filesize for filesize extraction, parse_count for count meta fields extraction, parse_resolution , parse_duration for duration extraction, parse_age_limit for age_limit extraction. Explore youtube_dl/utils.py for more useful convenience functions. More examples Safely extract optional description from parsed JSON python description try_get(response, lambda x: x 'result' 'video' 0 'summary' , compat_str) Safely extract more optional metadata python video try_get(response, lambda x: x 'result' 'video' 0 , dict) or {} description video.get('summary') duration float_or_none(video.get('durationMs'), scale 1000) view_count int_or_none(video.get('views')) EMBEDDING YOUTUBE DL youtube dl makes the best effort to be a good command line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to create a report . 
From a Python program, you can embed youtube dl in a more powerful fashion, like this: python from __future__ import unicode_literals import youtube_dl ydl_opts {} with youtube_dl.YoutubeDL(ydl_opts) as ydl: ydl.download( ' Most likely, you'll want to use various options. For a list of options available, have a look at youtube_dl/YoutubeDL.py . For a start, if you want to intercept youtube dl's output, set a logger object. Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file: python from __future__ import unicode_literals import youtube_dl class MyLogger(object): def debug(self, msg): pass def warning(self, msg): pass def error(self, msg): print(msg) def my_hook(d): if d 'status' 'finished': print('Done downloading, now converting ...') ydl_opts { 'format': 'bestaudio/best', 'postprocessors': { 'key': 'FFmpegExtractAudio', 'preferredcodec': 'mp3', 'preferredquality': '192', } , 'logger': MyLogger(), 'progress_hooks': my_hook , } with youtube_dl.YoutubeDL(ydl_opts) as ydl: ydl.download( ' BUGS Bugs and suggestions should be reported at: . Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel youtube dl (irc://chat.freenode.net/ youtube dl) on freenode ( webchat ). Please include the full output of youtube dl when run with v , i.e. add v flag to your command line , copy the whole output and post it in the issue body wrapped in \ \ \ for better formatting. It should look similar to this: $ youtube dl v debug System config: debug User config: debug Command line args: u' v', u' debug Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 debug youtube dl version 2015.12.06 debug Git HEAD: 135392e debug Python version 2.6.6 Windows 2003Server 5.2.3790 SP2 debug exe versions: ffmpeg N 75573 g1d0487f, ffprobe N 75573 g1d0487f, rtmpdump 2.4 debug Proxy map: {} ... Do not post screenshots of verbose logs; only plain text is acceptable. The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever. Please re read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist): Is the description of the issue itself sufficient? We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts. So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious What the problem is How it could be fixed How your proposed solution would look like If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over. For bug reports, this means that your report should contain the complete output of youtube dl when called with the v flag. 
The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information. If your server has multiple IPs or you suspect censorship, adding call home may be a good idea to get more diagnostics. If the error is ERROR: Unable to extract ... and you cannot reproduce it from multiple countries, add dump pages (warning: this will yield a rather large output, redirect it to the file log.txt by adding >log.txt 2>&1 to your command line) or upload the .dump files you get when you add write pages somewhere . Site support requests must contain an example URL . An example URL is a URL you might want to download, like There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. is not an example URL. Are you using the latest version? Before reporting any issue, type youtube dl U . This should report that you're up to date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well. Is the issue already documented? Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the GitHub Issues of this repository. If there is an issue, feel free to write something along the lines of This affects me as well, with version 2015.01.01. Here is some more information on the issue: ... . While some issues may be old, a new post into them often spurs rapid activity. Why are existing options not enough? Before requesting a new feature, please have a quick peek at the list of supported options . Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do not solve your problem. Is there enough context in your bug report? People want to solve problems, and often think they do us a favor by breaking down their larger problems (e.g. wanting to skip already downloaded files) to a specific request (e.g. requesting us to look whether the file exists before downloading the info page). However, what often happens is that they break down the problem into two steps: One simple, and one impossible (or extremely complicated one). We are then presented with a very complicated request when the original problem could be solved far easier, e.g. by recording the downloaded video IDs in a separate file. To avoid this, you must include the greater context where it is non obvious. In particular, every feature request that does not consist of adding support for a new site should contain a use case scenario that explains in what situation the missing feature would be useful. Does the issue involve one problem, and one problem only? Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones. In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). 
Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service. Is anyone going to need the feature? Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them. Is your question about youtube dl? It may sound strange, but some bug reports we receive are completely unrelated to youtube dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube dl. If you are using a UI for youtube dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube dl fails in some way you believe is related to youtube dl, by all means, go ahead and report the bug. COPYRIGHT youtube dl is released into the public domain by the copyright holders. This README file was originally written by Daniel Bolton and is likewise released into the public domain.",Unknown,Unknown 295,Unknown,Unknown,Unknown,"fabric_deploy Overview Capistrano like deploy recipe for Fabric. Requirements Fabric Usage This recipe is just a template for basic deploy procedures. You may need to override these tasks with your own in your fabfile.py. Initialize the directory structure for the development stage. % fab development deploy.setup Deploy the application to the development stage. % fab development deploy Roll back to the previously deployed application. % fab development deploy.rollback Clean up old applications. % fab development deploy.cleanup Privilege configurations This recipe assumes that you can ssh as users named deploy and app by default. deploy (user) Used for application deployment. Belongs to the same group as app. sudo(8) should be granted without a password. app (runner) Used for running applications. Belongs to the same group as deploy. No sudo(8) required. You can change these names by overriding the user and runner options. Examples Following is a sample set of tasks for multistage deployment ( development and production ). Uses supervisord for service management. This example consists of 2 files.
./fabfile/__init__.py Basic configuration for deployment ./fabfile/deploy.py Overridden tasks for your deployment ./fabfile/__init__.py from fabric.api import from fabric_deploy import options import deploy options.set('scm', 'git') options.set('application', 'myapp') options.set('repository', 'git@githum.com:yyuu/myapp.git') options.set('supervisord_pid', (lambda: '%(dir)s/tmp/pids/supervisord.pid' % dict(dir options.fetch('current_path')))) options.set('supervisord_conf', (lambda: '%(dir)s/supervisord.conf' % dict(dir options.fetch('current_path')))) @task def development(): options.set('current_stage', 'development') env.roledefs.update({'app': 'alpha' }) @task def production(): options.set('current_stage', 'production') env.roledefs.update({ 'app': 'zulu' }) ./fabfile/deploy.py from fabric_deploy.deploy import @task @roles('app') def restart(): with cd(fetch('current_path')): result sudo( (test f %(supervisord_pid)s && kill HUP cat %(supervisord_pid)s ) %(virtualenv)s/bin/supervisord c %(supervisord_conf)s % var('virtualenv', 'supervisord_pid', 'supervisord_conf'), user fetch('runner')) Author Yamashita, Yuu",Unknown,Unknown 296,Unknown,Unknown,Unknown,"Introduction This is a command line version of 750 Words . I really like the idea of writing 750 words daily, but I don't like the idea of having all of my words on someone else's server. I also wanted to write my 750 words in Vim, with Git version control. Therefore, I wrote this script. You get all the benefits of writing 750 words daily, within the comfort of your preferred text editor. You also get the peace of mind that comes with having your words as tangible plain text files on your on machine, tracked with Git. You don't (yet) get the cool analysis features that are on the 750 words website, but this is something that I would like to improve. Installation Clone the repository, and execute: $ sudo python setup.py install Or whatever you do to install Python scripts on your machine. If you want version control with Git, you're going to want to install that as well. Configuration There are three things to configure: which editor to use, where to store your words, and what file extension to use. Edit the configuration file /.750words/config to suit your needs: 750words editor vim extension .md directory /home/zach/docs/journal Usage Every day, type 750words and type at least 750 words. You'll experience major improvements in creativity and a clear mind. Here's the full output of 750words h : usage: 750words h p config CONFIG dates dates ... positional arguments: dates the date of the text optional arguments: h, help show this help message and exit p, path print out path for use with external scripts config CONFIG the location of the configuration file Have fun! Examples Write the day's words: bash $ 750words To see how much you wrote in August 2011: bash $ wc $(dirname $(750words p))/2011 08 750 Words :",Unknown,Unknown 297,Unknown,Unknown,Unknown,"django qmethod django qmethod is a library for easily defining operations on collections of Django models (that is, QuerySets and Managers). One day, I hope something like this is included in Django core. 
Usage Basic usage is as follows: python import cPickle as pickle from django.db import models from djqmethod import Manager, querymethod class Group(models.Model): pass class Person(models.Model): GENDERS dict(m 'Male', f 'Female', u 'Unspecified').items() group models.ForeignKey(Group, related_name 'people') gender models.CharField(max_length 1, choices GENDERS) age models.PositiveIntegerField() Note: you need to create an explicit manager here. objects Manager() @querymethod def minors(query): return query.filter(age__lt 18) @querymethod def adults(query): return query.filter(age__gte 18) The minors() and adults() methods will be available on the manager: assert isinstance(Person.objects.minors(), models.query.QuerySet) They'll be available on subsequent querysets: assert isinstance(Person.objects.filter(gender 'm').minors(), models.query.QuerySet) They'll also be available on relations, if you use the djqmethod.Manager as the default manager for the related model. group Group.objects.all() 0 assert isinstance(group.people.minors(), models.query.QuerySet) The QuerySets produced are totally pickle safe: assert isinstance(pickle.loads(pickle.dumps(Person.objects.minors())), models.query.QuerySet) A test project is located in test/example/ ; consult this for a more comprehensive example. Installation pip install django qmethod (Un)license This is free and unencumbered software released into the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non commercial, and by any means. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. For more information, please refer to",Unknown,Unknown 298,Unknown,Unknown,Unknown,"EchoNestPy A python based API for integrating the Amazon Echo to the Nest Thermostat EchoNestPy is a python based API server running on flask. It allows the Amazon Echo to talk to the server and then the server will talk to your Nest allowing you to control it. This version is setup to allow multiple Amazon Echo users to share the same server to control their own Nest without interfering with the other users Nests. This means that each Amazon Echo UserID is tied to their own Nest Token and it remembers which Echo UserID is tied to each Nest Token. It uses pickle to write the datastore to a file on the disk so that when the server is restarted it does not require each user to re authenticate the nest again. Right now it will control multiple Nests in the same house however when setting the temperature to a set value, it will set all Nests to the same temperature. 
But when changing the temperature with warmer or cooler, it will adjust each Nest by +/- 2 degrees F, even if they are set to different values.
Sample Interactions:
"Alexa, Talk to Nest" / "What can I say?"
"Alexa, Tell Nest to set the temperature to 76 degrees."
"Alexa, Tell Nest that I am too warm."
"Alexa, Tell Nest to turn the temperature up."
More details and videos at:
Requirements and setup
Local development environment
Your computer or virtual environment needs the following installed before you go any further: Python, pip.
To run EchoNestPy, you'll need the Python packages specified in requirements.txt (./requirements.txt). Once you have the above requirements installed on your computer, clone this repository and run the following from the project root to set up the environment for running EchoNestPy:
1. pip install -r requirements.txt
Setting Up the Server
The Alexa Skills Kit (ASK) requires that the server has an open connection to the internet on port 443 (HTTPS) with an SSL certificate (self-signed is okay). Right now this runs in Flask without HTTPS, but I am looking into changing this. One way to work around this is to use stunnel4 or nginx forwarding to accept connections on 443 and connect to the app on port 5000.
Setting Up the Alexa Skills Kit on Amazon
The ASK is available at:
1. Sign in or create an account.
2. Go to Apps & Services at the top of the page.
3. Click on Alexa.
4. Click Add New Skill.
5. Fill out the first form. Name: anything you want it to be (I use "Nest Control"). Invocation Name: the hotword used to call the app (I have gotten it working with "Nest"). Version: 1.0. Endpoint: \<your domain or IP address\>/alexa/EchoPyAPI.
6. Go to the next page and copy intentSchema.json into the Intent Schema and sampleUtterances.txt into the Sample Utterances.
7. Go to the next page, upload the self-signed SSL certificate you have, and hit next.
Setting Up the Nest Developer Token
The Nest developer portal is available at:
1. Sign in or create an account.
2. Click on Clients.
3. Click Register New Client.
4. Fill out the form. Name: anything (I use "EchoPy"). Support URL: your own domain, or GitHub, if you have one. OAuth Redirect URL: \<your domain or IP address\>/alexa/oauth2. Make sure that you click read/write on all options that you can (some are unavailable) and give a short description (you have to).
5. Click Update Client.
6. From the clients page copy the Authorization URL and put it in nestpy_settings.py as nest_auth_uri_1 (sample at SAMPLE_nestpy_settings.py).
7. From the clients page copy the Access Token URL and put it in nestpy_settings.py as nest_auth_uri_2.
Test
At this point you should be able to go to \<your domain or IP address\>/alexa/ and see a basic page. If this works, you're good to go!
Usage
Run: python echopy.py
At this point you will have to go to your Echo and say 'Alexa, Talk to Nest' (replace Nest with whatever Invocation Name you set). It should say that you are an unauthorized Nest user and ask you to check the card in your Echo app. Open the Echo app and look at the card there; it should show your User ID (a bunch of random text). Go to \<your domain or IP address\>/alexa/auth/\<your User ID\>. This should allow you to authorize it to control your Nest. Log in to your Nest account and authorize it. It should bring you back to the root Alexa page. You should be good to start using EchoNestPy.
Notes
NestPy is another project that I have been working on and am about to post to GitHub; it is a standalone Python based API for Nest.
To Do:
Add change mode to Nest and Alexa for Away / Home.
Add better support for multi-Nest households.
Add check-in time of inbound requests for security.
Improve sample utterances Add better help.",Unknown,Unknown 299,Unknown,Unknown,Unknown,"Zulip overview Zulip is a powerful, open source group chat application that combines the immediacy of real time chat with the productivity benefits of threaded conversations. Zulip is used by open source projects, Fortune 500 companies, large standards bodies, and others who need a real time chat system that allows users to easily process hundreds or thousands of messages a day. With over 500 contributors merging over 500 commits a month, Zulip is also the largest and fastest growing open source group chat project. CircleCI branch Travis Build Status Coverage Status Mypy coverage mypy coverage GitHub release docs Zulip chat Twitter mypy coverage : Getting started Click on the appropriate link below. If nothing seems to apply, join us on the Zulip community server and tell us what's up! You might be interested in: Contributing code . Check out our guide for new contributors to get started. Zulip prides itself on maintaining a clean and well tested codebase, and a stock of hundreds of beginner friendly issues beginner friendly . Contributing non code . Report an issue , translate Zulip into your language, write for the Zulip blog, or give us feedback . We would love to hear from you, even if you're just trying the product out. Supporting Zulip . Advocate for your organization to use Zulip, write a review in the mobile app stores, or upvote Zulip on product comparison sites. Checking Zulip out . The best way to see Zulip in action is to drop by the Zulip community server . We also recommend reading Zulip for open source , Zulip for companies , or Zulip for working groups and part time communities . Running a Zulip server . Setting up a server takes just a couple of minutes. Zulip runs on Ubuntu 18.04 Bionic, Ubuntu 16.04 Xenial, Ubuntu 14.04 Trusty, and Debian 9 Stretch. The installation process is documented here . Commercial support is available; see for details. Using Zulip without setting up a server . offers free and commercial hosting. Applying for a Zulip internship . Zulip runs internship programs with Outreachy , Google Summer of Code , and the MIT Externship program . Zulip also participates in Google Code In . More information is available here . You may also be interested in reading our blog or following us on twitter . Zulip is distributed under the Apache 2.0 license. beginner friendly :",Unknown,Unknown 2007,Natural Language Processing,Natural Language Processing,Natural Language Processing,Neural Machine Translation English translation to French using Tensorflow seq2seq model Use Tensorflow to build a seq2seq model and train on a small dataset of English French sentence pairs. Use the whole dataset for training and inference due to data limitaion. NMT1: Build encoder decoder model with attention mechanism for machine translation. Use the default graph and reuse the weights between training and inference. NMT2: Build 2 different graphs and sessions for training and inference. Share the weights by saver. NMT3: Add visualization of training loss and two graphs by tensorboard. 
Use multilayer LSTM and beam search and dropout Reference:,Machine Translation,Machine Translation 2011,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? 
BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. 
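To make the masked-LM objective described a few paragraphs above more concrete, here is a minimal, hypothetical sketch of the masking step. The special tokens and the 15% rate follow the description above; the repository's actual pre-training pipeline (create_pretraining_data.py, shown later) is more involved, for example in how it decides what to substitute for the selected tokens, so treat this purely as an illustration.

```python
# Illustrative only: pick roughly 15% of non-special tokens, replace them with
# [MASK], and remember the originals as prediction targets.
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=12345):
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}  # position -> original token the model must predict
    # Never mask the special [CLS]/[SEP] markers.
    candidates = [i for i, t in enumerate(tokens) if t not in ("[CLS]", "[SEP]")]
    num_to_mask = max(1, int(round(len(candidates) * mask_prob)))
    for i in rng.sample(candidates, num_to_mask):
        labels[i] = masked[i]
        masked[i] = mask_token
    return masked, labels

tokens = "[CLS] the man went to the store [SEP] he bought a gallon of milk [SEP]".split()
masked, labels = mask_tokens(tokens)
print(masked)   # the sequence with a few positions replaced by [MASK]
print(labels)   # positions of masked tokens and the words to predict there
```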
The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. 
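Before launching any of the fine-tuning commands below, it can help to sanity-check a downloaded model. The sketch below only inspects the three files listed above (checkpoint, vocab.txt, bert_config.json); the directory path is a placeholder for wherever you unzipped the model, and it assumes a TensorFlow 1.x installation as stated earlier in this README.

```python
# Quick sanity check of an unzipped pre-trained model (path is a placeholder).
import json
import os
import tensorflow as tf

BERT_BASE_DIR = "/path/to/bert/uncased_L-12_H-768_A-12"  # adjust to your unzip location

with open(os.path.join(BERT_BASE_DIR, "bert_config.json")) as f:
    config = json.load(f)
print("hidden size:", config.get("hidden_size"),
      "layers:", config.get("num_hidden_layers"))

with open(os.path.join(BERT_BASE_DIR, "vocab.txt"), encoding="utf-8") as f:
    vocab_size = sum(1 for _ in f)
print("vocab entries:", vocab_size)

# List a few of the variables stored in the pre-trained checkpoint.
ckpt = os.path.join(BERT_BASE_DIR, "bert_model.ckpt")
for name, shape in tf.train.list_variables(ckpt)[:5]:
    print(name, shape)
```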
Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. 
BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . 
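As a small optional helper, the sketch below just confirms that the three SQuAD 2.0 files listed above ended up where $SQUAD_DIR points before you launch training. The file names follow the list above; the fallback directory is a placeholder.

```python
# Hypothetical helper: verify the SQuAD 2.0 files are present in $SQUAD_DIR.
import os

squad_dir = os.environ.get("SQUAD_DIR", "/path/to/squad2")  # placeholder path
for name in ("train-v2.0.json", "dev-v2.0.json", "evaluate-v2.0.py"):
    path = os.path.join(squad_dir, name)
    print(path, "OK" if os.path.exists(path) else "MISSING")
```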
On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. 
The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. 
See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . 
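For example, here is a minimal sketch of producing the one-sentence-per-line, blank-line-between-documents input format described above using spaCy. The model name en_core_web_sm, the example documents, and the output path are assumptions for illustration; any spaCy pipeline that provides sentence boundaries would work the same way.

```python
# Minimal sketch: write raw documents as one sentence per line, with a blank
# line between documents, matching the pre-training input format above.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English model

documents = [
    "The man went to the store. He bought a gallon of milk.",
    "Penguins are flightless birds. They live in the Southern Hemisphere.",
]

with open("/tmp/pretraining_input.txt", "w") as out:
    for doc_text in documents:
        doc = nlp(doc_text)
        for sent in doc.sents:
            out.write(sent.text.strip() + "\n")
        out.write("\n")  # blank line delimits documents
```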
The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. 
The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. 
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2018,Natural Language Processing,Natural Language Processing,Natural Language Processing,"This repository hosts the implementation of the paper Augmenting Neural Response Generation with Context Aware Topical Attention . Topical Hierarchical Recurrent Encoder Decoder (THRED) THRED is a multi turn response generation system intended to produce contextual and topic aware responses. The codebase is evolved from the Tensorflow NMT repository. Dependencies Python > 3.5 Tensorflow > 1.4.0 Tensorflow Hub SpaCy Gensim PyYAML tqdm redis 1 mistune 1 emot 1 prompt toolkit 2 1 packages required only for parsing and cleaning the Reddit data. 2 used only for testing dialogue models in command line interactive mode To install the dependencies using pip , run pip install r requirements . And for Anaconda, run conda env create f thred_env.yml (recommended). Data Our Reddit dataset is collected from 95 selected subreddits (listed here (corpora/reddit/subreddit_whitelist.txt)). We processed Reddit for a 12 month period ranging from December 2016 until December 2017 (excluding June and July; we utilized these two months to train an LDA model). Please see here (corpora/reddit) for the details of how the Reddit dataset is built including pre processing and cleaning the raw Reddit files. In the data files, each line corresponds to a single conversation where utterances are tab separated. Topic words appear after the last utterance by a delimiter ' ' (a vertical bar preceding and trailing two whitespaces). 
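As a concrete illustration of the conversation layout just described (tab-separated utterances, with topic words after the final utterance behind a " | " delimiter), here is a small hypothetical parser. The helper name and the example line are made up for illustration and are not part of the repository.

```python
# Hypothetical example of the data format described above: utterances are
# tab-separated, and topic words follow the last utterance after " | "
# (a vertical bar with a surrounding whitespace on each side).
def parse_conversation(line):
    line = line.rstrip("\n")
    utterances = line.split("\t")
    topics = []
    if " | " in utterances[-1]:
        last, topic_str = utterances[-1].split(" | ", 1)
        utterances[-1] = last
        topics = topic_str.split()
    return utterances, topics

example = "how is the weather today ?\tpretty cold , bring a jacket . | weather jacket cold"
utterances, topics = parse_conversation(example)
print(utterances)  # ['how is the weather today ?', 'pretty cold , bring a jacket .']
print(topics)      # ['weather', 'jacket', 'cold']
```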
Embeddings First, pre trained word embedding models should be downloaded by running the following Python script: bash export PYTHONPATH . ; python util/get_embed.py The script downloads and extracts the GloVe embeddings file. The output is stored in the direcctory workspace/embeddings . Additionally, the following options are available: e, embedding_type glove or word2vec (default: glove) d, dimensions dimensions of embedding vectors (default: 300) f, embedding_file In case of using a non default embedding, you can provide an embedding file loadable by Gensim (default: None) In the model config files (explained below), the default embedding types can be either of the following: glove , word2vec , and tf_word2vec . Note that tf_word2vec refers to the pre trained word2vec provided in Tensorflow Hub Wiki words . If you intend to use the embeddings from Tensorflow Hub, there is no need to run the above command. Train The training configuration should be defined in a YAML file similar to Tensorflow NMT. Sample configurations for THRED and other baselines are provided here (conf). The implemented models are Seq2Seq , HRED , Topic Aware Seq2Seq , and THRED. Note that while most of the parameters are common among the different models, some models may have additional parameters (e.g., topical models have topic_words_per_utterance and boost_topic_gen_prob parameters). To train a model, run the following command: bash python main.py mode train config \ train_data dev_data test_data \ model_dir In , vocabulary files and Tensorflow model files are stored. Training can be resumed by executing: bash python main.py mode train model_dir Test With the following command, the model can be tested against the test dataset. bash python main.py mode test model_dir test_data It is possible to override test parameters during testing. These parameters are: beam width beam_width , length penalty weight length_penalty_weight , and sampling temperature sampling_temperature . A simple command line interface is implemented that allows you to converse with the learned model (Similar to test mode, the test parameters can be overrided too): bash python main.py mode interactive model_dir In the interactive mode, a pre trained LDA model is required to feed the inferred topic words into the model. We trained an LDA model using Gensim on a Reddit corpus, collected for this purpose. It can be downloaded from here . The downloaded file should be uncompressed and passed to the program via lda_model_dir . Citation Please cite the following paper if you used our work in your research: @article{dziri2018augmenting, title {Augmenting Neural Response Generation with Context Aware Topical Attention}, author {Dziri, Nouha and Kamalloo, Ehsan and Mathewson, Kory W and Zaiane, Osmar R}, journal {arXiv preprint arXiv:1811.01063}, year {2018} }",Machine Translation,Machine Translation 2036,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Fully Character Level Neural Machine Translation Theano implementation of the models described in the paper Fully Character Level Neural Machine Translation without Explicit Segmentation . We present code for training and decoding four different models: 1. bilingual bpe2char (from Chung et al., 2016 ). 2. bilingual char2char 3. multilingual bpe2char 4. multilingual char2char Dependencies Python Theano Numpy NLTK GPU CUDA (we recommend using the latest version. The version 8.0 was used in all our experiments.) 
Related code For preprocessing and evaluation, we used scripts from MOSES . This code is based on Subword NMT and dl4mt cdec . Downloading Datasets & Pre trained Models The original WMT'15 corpora can be downloaded from here . For the preprocessed corpora used in our experiments, see below. WMT'15 preprocessed corpora Standard version (for bilingual models, 3.5GB) Cyrillic converted to Latin (for multilingual models, 2.6GB) To obtain the pre trained top performing models, see below. Pre trained models (6.0GB) : Tarball updated on Nov 21st 2016. The CS EN bi char2char model in the previous tarball was not the best performing model. Training Details Using GPUs Do the following before executing train .py . bash $ export THEANO_FLAGS device gpu,floatX float32 With space permitting on your GPU, it may speed up training to use cnmem : bash $ export THEANO_FLAGS device gpu,floatX float32,lib.cnmem 0.95,allow_gc False On a pre 2016 Titan X GPU with 12GB RAM, our bpe2char models were trained with cnmem . Our char2char models (both bilingual and multilingual) were trained without cnmem (due to lack of RAM). Training models Before executing the following, modify train .py such that the correct directory containing WMT15 corpora is referenced. Bilingual bpe2char bash $ python bpe2char/train_bi_bpe2char.py translate Bilingual char2char bash $ python char2char/train_bi_char2char.py translate Multilingual bpe2char bash $ python bpe2char/train_multi_bpe2char.py Multilingual char2char bash $ python char2char/train_multi_char2char.py Checkpoint To resume training a model from a checkpoint, simply append re_load and re_load_old_setting above. Make sure the checkpoint resides in the correct directory ( .../dl4mt c2c/models ). Using Custom Datasets To train your models using your own dataset (and not the WMT'15 corpus), you first need to learn your vocabulary using build_dictionary_char.py or build_dictionary_word.py for char2char or bpe2char model, respectively. For the bpe2char model, you additionally need to learn your BPE segmentation rules on the source corpus using the Subword NMT repository (see below). Decoding Decoding WMT'15 validation / test files Before executing the following, modify translate .py such that the correct directory containing WMT15 corpora is referenced. bash $ export THEANO_FLAGS device gpu,floatX float32,lib.cnmem 0.95,allow_gc False $ python translate/translate_bpe2char.py model translate saveto which for bpe2char models $ python translate/translate_char2char.py model translate saveto which for char2char models When choosing which pre trained model to give to model , make sure to choose e.g. .grads.123000.npz . The models with .grads in their names are the optimal models and you should be decoding from those. Decoding an arbitrary file Remove which and append source . If you choose to decode your own source file, make sure it is: 1. properly tokenized (using preprocess/preprocess.sh ). 2. bpe tokenized for bpe2char models. 3. Cyrillic characters should be converted to Latin for multilingual models. Decoding multilingual models Append many (of course, provide a path to a multilingual model for model ). Evaluation We use the script from MOSES to compute the bleu score. The reference translations can be found in .../wmt15 . perl preprocess/multi bleu.perl reference.txt {codes_file} ./apply_bpe.py c {codes_file} < {test_file} Converting Cyrillic to Latin bash $ python preprocess/iso.py russian_source.txt will produce an output at russian_source.txt.iso9 . 
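As a rough sanity check on the Evaluation step above, corpus BLEU can also be computed with NLTK, which is already listed as a dependency. This is only an approximate cross-check of the MOSES multi-bleu script, not a replacement for it: tokenization and smoothing differences mean the numbers will not match exactly, and the file names below are placeholders for your reference and (de-BPE'd) model output files.

```python
# Approximate BLEU cross-check with NLTK (file names are placeholders).
from nltk.translate.bleu_score import corpus_bleu

with open("reference.txt") as f:
    references = [[line.split()] for line in f]   # one reference per sentence
with open("model_output.txt") as f:
    hypotheses = [line.split() for line in f]

print("BLEU: %.2f" % (100 * corpus_bleu(references, hypotheses)))
```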
Citation @article{Lee:16, author {Jason Lee and Kyunghyun Cho and Thomas Hofmann}, title {Fully Character Level Neural Machine Translation without Explicit Segmentation}, year {2016}, journal {arXiv preprint arXiv:1610.03017}, }",Machine Translation,Machine Translation 2037,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Character Level Neural Machine Translation This is an implementation of the models described in the paper A Character Level Decoder without Explicit Segmentation for Neural Machine Translation . Dependencies: The majority of the script files are written in pure Theano. In the preprocessing pipeline, there are the following dependencies. Python Libraries: NLTK MOSES: Subword NMT : This code is based on the dl4mt library. link: Be sure to include the path to this library in your PYTHONPATH. We recommend you to use the latest version of Theano. If you want exact reproduction however, please use the following version of Theano. commit hash: fdfbab37146ee475b3fd17d8d104fb09bf3a8d5c Preparing Text Corpora: The original text corpora can be downloaded from Once the downloading is finished, use the 'preprocess.sh' in 'preprocess' directory to preprocess the text files. For the character level decoders, preprocessing is not necessary however, in order to compare the results with subword level decoders and other word level approaches, we apply the same process to all of the target corpora. Finally, use 'build_dictionary_char.py' for character case and 'build_dictionary_word.py' for subword case to build the vocabulary. Updating...",Machine Translation,Machine Translation 2038,Natural Language Processing,Natural Language Processing,Natural Language Processing,"This is the legacy version used for ICML 2018 rejected submission and kept here for reference Please checkout multigpu branch for latest (cleaner) version with newer experiments reported in EMNLP 2018 paper Deterministic Non Autoregressive Neural Sequence Modeling by Iterative Refinement PyTorch implementation of the models described in the paper Deterministic Non Autoregressive Neural Sequence Modeling by Iterative Refinement . We present code for training and decoding both autoregressive and non autoregressive models, as well as preprocessed datasets and pretrained models. Dependencies Python Python 3.6 PyTorch 0.3 Numpy NLTK torchtext torchvision GPU CUDA (we recommend using the latest version. The version 8.0 was used in all our experiments.) Related code For preprocessing, we used the scripts from Moses and Subword NMT . This code is based on NA NMT . Downloading Datasets & Pre trained Models The original translation corpora can be downloaded from ( IWLST'16 En De , WMT'16 En Ro , WMT'15 En De , MS COCO ). For the preprocessed corpora and pre trained models, see below. Dataset Model IWSLT'16 En De Data Models WMT'16 En Ro Data Models WMT'15 En De Data Models MS COCO Data Models Before you run the code Set correct path to data in data_path() function located in data.py : Loading & Decoding from Pre trained Models 1. For vocab_size , use 60000 for WMT'15 En De, 40000 for the other translation datasets and 10000 for MS COCO. 2. For params , use big for WMT'15 En De and small for the other translation datasets. 
Autoregressive bash $ python run.py dataset vocab_size ffw_block highway params lr_schedule anneal mode test debug load_from Non autoregressive bash $ python run.py dataset vocab_size ffw_block highway params lr_schedule anneal fast valid_repeat_dec 20 use_argmax next_dec_input both mode test remove_repeats debug trg_len_option predict use_predicted_trg_len load_from For adaptive decoding, add the flag adaptive_decoding jaccard to the above. Training New Models Autoregressive bash $ python run.py dataset vocab_size ffw_block highway params lr_schedule anneal Non autoregressive bash $ python run.py dataset vocab_size ffw_block highway params lr_schedule anneal fast valid_repeat_dec 8 use_argmax next_dec_input both denoising_prob layerwise_denoising_weight use_distillation Training the Length Prediction Model 1. Take a checkpoint pre trained non autoregressive model 2. Resume training using these in addition to the same flags used in step 1: load_from resume finetune_trg_len trg_len_option predict MS COCO dataset Run pre trained autoregressive model python run.py dataset mscoco params big load_vocab mode test n_layers 4 ffw_block highway debug load_from mscoco_models_final/ar_model batch_size 1024 Run pre trained non autoregressive model python run.py dataset mscoco params big use_argmax load_vocab mode test n_layers 4 fast ffw_block highway debug trg_len_option predict use_predicted_trg_len load_from mscoco_models_final/nar_model batch_size 1024 Train new autoregressive model python run.py dataset mscoco params big batch_size 1024 load_vocab eval_every 1000 drop_ratio 0.5 lr_schedule transformer n_layers 4 Train new non autoregressive model python run.py dataset mscoco params big use_argmax batch_size 1024 load_vocab eval_every 1000 drop_ratio 0.5 lr_schedule transformer n_layers 4 fast use_distillation ffw_block highway denoising_prob 0.5 layerwise_denoising_weight load_encoder_from mscoco_models_final/ar_model After training it, train the length predictor (set correct path in load_from argument) python run.py dataset mscoco params big use_argmax batch_size 1024 load_vocab mode train n_layers 4 fast ffw_block highway eval_every 1000 drop_ratio 0.5 drop_len_pred 0.0 lr_schedule anneal anneal_steps 100000 use_distillation load_from mscoco_models/new_nar_model trg_len_option predict finetune_trg_len max_offset 20 Citation If you find the resources in this repository useful, please consider citing: @article{Lee:18, author {Jason Lee and Elman Mansimov and Kyunghyun Cho}, title {Deterministic Non Autoregressive Neural Sequence Modeling by Iterative Refinement}, year {2018}, journal {arXiv preprint arXiv:1802.06901}, }",Machine Translation,Machine Translation 2047,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Links model distillation (e.g. nasnet & nasnet mobile) > f1_loss: focal loss: Ideas find and extract individual cells and classify those learn from training samples which have only 1 label use thresholds which achieve a similar per class distribution of the different proteins as in the validation set normalize different channels separately any predictions with no label? if not, handle this when predicting split image e.g. into 2x2 subimages, train/predict independently, combine predictions make sure that the number of predictions is within 1,x , where x is the max number of targets in the train data use sklearn's f1_score similar images between train and test set? 
data leak: find correlation between different classes for multi class targets in training data add lr finder mode to lr scheduling save the val/test predictions treat problem as image segmentation with green channel as mask use adap optimizer w/o fiddling around on the lr use the green layer for attention check how many combinations of predictions there are > do some grouping? check model performance when training only on the green channel Challenges class imbalance visualize/explore bce loss weights focal loss stratified train/test split (StratifiedShuffleSplit from scikit learn) stratified mini batch sampling do oversampling of classes with low frequencies (WeightedRandomSampler) overfitting add dropout add dropout2d to early conv layers apply data augmentation use leaky ReLU thresholding use per class threshold which optimizes for that class only use per class threshold which replicates the distribution of that class (and still improves global score) learn the thresholds (per class) reliance on f1score fixed threshold of 0.5 might be misleading use loss metric instead score use stronger model center loss for better discrimination attention use f1 score based loss use f1 score based loss with weights combine f1 loss with bce/focal loss prediction performance (special prize) knowledge distillation",Machine Translation,Machine Translation 2050,Natural Language Processing,Natural Language Processing,Natural Language Processing,conv nmt Implementation of multiple NMT models Based off of Facebook fairseq: see convolutional seq2seq Work in progress,Machine Translation,Machine Translation 2072,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Quick NLP Quick NLP is a deep learning nlp library inspired by the fast.ai library _ It follows the same api as fastai and extends it allowing for quick and easy running of nlp models Features Python 3.6 code Tight knit integration with Fast.ai library: Fast.ai style DataLoader objects for sentence to sentence algorithms Fast.ai style DataLoader objects for dialogue algorithms Fast.ai style DataModel objects for training nlp models Can run a seq2seq model with a few lines of code similar to existing fast.ai examples Easy to expand/train and try different models or use different data Ready made algorithms to try out Seq2Seq Seq2Seq with Attention HRED Attention is all you need Depthwise Separable Convolutions for Neural Machine Translation (TODO) Installation Installation of fast.ai library is required. Please install using the instructions here _ . It is important that the latest version of fast.ai is used and not the pip version which is not up to date. After setting up an environment using the fast.ai instructions please clone the quick nlp repo and use pip install to install the package as follows: .. code block:: bash git clone cd quick nlp pip install . Docker Image A docker image with the latest master is available; to use it please run: .. code block:: bash docker run runtime nvidia it p 8888:8888 mount type bind,source $(pwd) ,target /workspace agispof/quicknlp:latest This will mount your current directory to /workspace and start a jupyter lab session in that directory. Usage Example The main goal of quick nlp is to provide the easy interface of the fast.ai library for seq2seq models. For example, let's assume that we have a dataset_path with folders for training and validation files. Each file is a tsv file where each row is two sentences separated by a tab.
For example a file inside the train folder can be an eng_to_fr.tsv file with the following first few lines:: Go. Va ! Run! Cours ! Run! Courez ! Wow! Ça alors ! Fire! Au feu ! Help! À l'aide ! Jump. Saute. Stop! Ça suffit ! Stop! Stop ! Stop! Arrête toi ! Wait! Attends ! Wait! Attendez ! I see. Je comprends. Loading the data from the directory is as simple as: .. code block:: python from fastai.plots import from torchtext.data import Field from fastai.core import SGD_Momentum from fastai.lm_rnn import seq2seq_reg from quicknlp import SpacyTokenizer, print_batch, S2SModelData INIT_TOKEN EOS_TOKEN DATAPATH dataset_path fields ( english , Field(init_token INIT_TOKEN, eos_token EOS_TOKEN, tokenize SpacyTokenizer('en'), lower True)), ( french , Field(init_token INIT_TOKEN, eos_token EOS_TOKEN, tokenize SpacyTokenizer('fr'), lower True)) batch_size 64 data S2SModelData.from_text_files(path DATAPATH, fields fields, train train , validation validation , source_names english , french , target_names french , bs batch_size ) Finally, to train a seq2seq model with the data we only need to do: .. code block:: python emb_size 300 nh 1024 nl 3 learner data.get_model(opt_fn SGD_Momentum(0.7), emb_sz emb_size, nhid nh, nlayers nl, bidir True, ) clip 0.3 learner.reg_fn reg_fn learner.clip clip learner.fit(2.0, wds 1e 6)",Machine Translation,Machine Translation 2075,Natural Language Processing,Natural Language Processing,Natural Language Processing,"byteNet tensorflow Join the chat at This is a tensorflow implementation of the byte net model from DeepMind's paper Neural Machine Translation in Linear Time 1 . From the abstract >The ByteNet decoder attains state of the art performance on character level language modeling and outperforms the previous best results obtained with recurrent neural networks. The ByteNet also achieves a performance on raw character level machine translation that approaches that of the best neural translation models that run in quadratic time. The implicit structure learnt by the ByteNet mirrors the expected alignments between the sequences. ByteNet Encoder Decoder Model: ! Model architecture Image Source Neural Machine Translation in Linear Time 1 paper The model applies dilated 1d convolutions on the sequential data, layer by layer to obtain the source encoding. The decoder then applies masked 1d convolutions on the target sequence (conditioned by the encoder output) to obtain the next character in the target sequence. The character generation model is just the byteNet decoder, while the machine translation model is the combined encoder and decoder. Implementation Notes 1. The character generation model is defined in ByteNet/generator.py and the translation model is defined in ByteNet/translator.py . ByteNet/ops.py contains the bytenet residual block, dilated conv1d and layer normalization. 2. The model can be configured by editing model_config.py. 5. Number of residual channels 512 (Configurable in model_config.py). Requirements Python 2.7.6 Tensorflow 1.2.0 Datasets The character generation model has been trained on Shakespeare text 4 . I have included the text file in the repository Data/generator_training_data/shakespeare.txt . The machine translation model has been trained for German to English translation. You may download the news commentary dataset from here Training Create the following directories Data/tb_summaries/translator_model , Data/tb_summaries/generator_model , Data/Models/generation_model , Data/Models/translation_model .
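Before moving on to the training commands below, it may help to see what the dilated and masked 1d convolutions described above look like in code. The following is a rough, hypothetical sketch written against the TensorFlow 1.x API that the repository targets; it is not the implementation in ByteNet/ops.py, and the function name and defaults are made up for illustration. The left only padding is what makes the decoder side convolution masked (causal); the encoder variant would pad on both sides instead.

import tensorflow as tf

def causal_dilated_conv1d(x, out_channels, filter_width=3, dilation=1, name='causal_conv'):
    # x has shape [batch, time, channels]; names and defaults here are illustrative only
    with tf.variable_scope(name):
        in_channels = x.get_shape().as_list()[-1]
        w = tf.get_variable('w', [filter_width, in_channels, out_channels])
        # pad only on the left so position t never sees inputs after t (the masked case)
        pad = (filter_width - 1) * dilation
        x_padded = tf.pad(x, [[0, 0], [pad, 0], [0, 0]])
        return tf.nn.convolution(x_padded, w, padding='VALID', dilation_rate=[dilation])

Stacking such layers with exponentially growing dilation rates (1, 2, 4, ...) is what gives the model its large receptive field over the character sequence.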
Text Generation Configure the model by editing model_config.py . Save the text files to train on, in Data/generator_training_data . A sample shakespeare.txt is included in the repo. Train the model by: python train_generator.py text_dir Data/generator_training_data python train_generator.py help for more options. Machine Translation Configure the model by editing model_config.py . Save the source and target sentences in separate files in Data/MachineTranslation . You may download the news commentary training corpus using this link 6 . The model is trained on buckets of sentence pairs of length in multiples of a configurable parameter bucket_quant . The sentences are padded with a special character beyond the actual length. Train the translation model using: python train_translator.py source_file target_file bucket_quant 50 python train_translator.py help for more options. Generating Samples Generate new samples using: python generate.py seed SOME_TEXT_TO_START_WITH sample_size You can test sample translations from the dataset using python translate.py . This will pick random source sentences from the dataset and translate them. Sample Generations ANTONIO: What say you to this part of this to thee? KING PHILIP: What say these faith, madam? First Citizen: The king of England, the will of the state, That thou dost speak to me, and the thing that shall In this the son of this devil to the storm, That thou dost speak to thee to the world, That thou dost see the bear that was the foot, Translation Results to be updated TODO Evaluate the translation model Implement beam search Contributors are welcome. Currently the model samples from the probability distribution from the top k most probable predictions. References Neural Machine Translation in Linear Time 1 paper Tensorflow Wavenet 2 code Sugar Tensor Source Code 7 For implementing some ops. 1 : 2 : 3 : 4 : 5 : 6 : 7 :",Machine Translation,Machine Translation 2104,Natural Language Processing,Natural Language Processing,Natural Language Processing,ImageToLaTeX (Tensorflow implementation) Converting images of mathematical formulas to their corresponding LaTeX code using end to end convolutional networks (architecture inspired from The originally proposed model is evaluated by replacing the convolutional encoder and convolutional decoder by a bidirectional LSTM.,Machine Translation,Machine Translation 2112,Natural Language Processing,Natural Language Processing,Natural Language Processing,"pytorch transformer This repository provides a PyTorch implementation of the Transformer model that has been introduced in the paper Attention Is All You Need (Vaswani et al. 2017). Installation The easiest way to install this package is via pip: bash pip install git+ Usage python import transformer model transformer.Transformer(...) 1. Computing Predictions given a Target Sequence This is the default behaviour of a Transformer (src/main/python/transformer/transformer.py), and is implemented in its forward (src/main/python/transformer/transformer.py L205) method: python predictions model(input_seq, target_seq) 2. Evaluating the Probability of a Target Sequence The probability of an output sequence given an input sequence under an already trained model can be evaluated by means of the function eval_probability (src/main/python/transformer/transformer_tools.py L46): python probabilities transformer.eval_probability(model, input_seq, target_seq, pad_index ...) 3.
Sampling an Output Sequence Sampling a random output given an input sequence under the distribution computed by a model is realized by the function sample_output (src/main/python/transformer/transformer_tools.py L115): python output_seq transformer.sample_output(model, input_seq, eos_index, pad_index, max_len) Pretraining Encoders with BERT For pretraining the encoder part of the transformer (i.e., transformer.Encoder (src/main/python/transformer/encoder.py)) with BERT (Devlin et al., 2018), the class MLMLoss (src/main/python/transformer/bert/mlm_loss.py) provides an implementation of the masked language model loss function. A full example of how to implement pretraining with BERT can be found in examples/bert_pretraining.py (examples/bert_pretraining.py). References > Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. (2017). > Attention Is All You Need. > Preprint at > Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). > BERT: Pre training of Deep Bidirectional Transformers for Language Understanding. > Preprint at",Machine Translation,Machine Translation 2129,Natural Language Processing,Natural Language Processing,Natural Language Processing,"WIP (chainer) Pervasive Attention: 2D Convolutional Neural Networks for Sequence to Sequence Prediction CircleCI A chainer implementation of the paper:",Machine Translation,Machine Translation 2134,Natural Language Processing,Natural Language Processing,Natural Language Processing,"AttentionCluster This code implements attention clusters with shifting operation. It was developed on top of starter code provided by Google AI. A detailed table of contents and descriptions can be found at the original repository . The module was implemented & tested in TensorFlow 1.8.0. Attention Cluster is distributed under the Apache 2 License (see the LICENCE file). Differences with the original paper The repository makes use of the youtube 8m dataset. The original paper uses Flash MNIST. Empirically, I found that a batch normalization layer at the attention mechanism increases the convergence time & GAP. In between the MoE, I used wide context gating developed from 2 . Dropout layers were actively used to prevent overfitting. This was inspired by 3 . Training The training dataset is available in Google Cloud Platform. In order to use the following command, first download the GCP SDK. It is recommended to adopt early stopping. gcloud ml engine local train package path youtube 8m module name youtube 8m.train train_data_pattern 'gs://youtube8m ml us east1/2/frame/train/train .tfrecord' frame_features True base_learning_rate 0.0002 model AttentionClusterModule feature_names 'rgb,audio' feature_sizes '1024,128' batch_size 128 train_dir AttentionClusterModule base_learning_rate 0.0002 runtime version 1.8 video_cluster_size 128 audio_cluster_size 16 shift_operation True filter_size 2 cluster_dropout 0.7 ff_dropout 0.8 hidden_size 512 moe_num_mixtures 2 learning_rate_decay_examples 2000000 learning_rate_decay 0.85 num_epochs 4 moe_l2 1e 6 max_step 400000 Evaluation The validation / test dataset is also available in Google Cloud Platform. With these parameter settings, I was able to achieve 86.8 GAP on the test data.
gcloud ml engine local train package path youtube 8m module name youtube 8m.eval eval_data_pattern 'gs://youtube8m ml us east1/2/frame/validate/validate .tfrecord' frame_features True model AttentionClusterModule feature_names 'rgb,audio' feature_sizes '1024,128' batch_size 128 train_dir AttentionClusterModule base_learning_rate 0.0002 run_once True video_cluster_size 128 audio_cluster_size 16 shift_operation True filter_size 2 cluster_dropout 0.7 ff_dropout 0.8 hidden_size 512 moe_num_mixtures 2 learning_rate_decay_examples 2000000 learning_rate_decay 0.85 num_epochs 4 moe_l2 1e 6 max_step 400000 References Please note that I am not the author of the following references. 1 2 3 Changes 1.00 (05 August 2018) Initial public release Contributors Juhan Bae",Machine Translation,Machine Translation 2135,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Introduction This is NPMT, the source codes of Towards Neural Phrase based Machine Translation and Sequence Modeling via Segmentations from Microsoft Research. It is built on top of the fairseq toolkit in Torch . We present the setup and Neural Machine Translation (NMT) experiments in Towards Neural Phrase based Machine Translation . NPMT Neural Phrase based Machine Translation (NPMT) explicitly models the phrase structures in output sequences using Sleep WAke Networks (SWAN), a recently proposed segmentation based sequence modeling method. To mitigate the monotonic alignment requirement of SWAN, we introduce a new layer to perform (soft) local reordering of input sequences. Different from existing neural machine translation (NMT) approaches, NPMT does not use attention based decoding mechanisms. Instead, it directly outputs phrases in a sequential order and can decode in linear time. Model architecture ! Example (npmt.png) An illustration of using NPMT in German English translation ! Example (de en_example.png) Please refer to the PR for our implementations. Our implementation is based on the lastest version of fairseq. Citation If you use the code in your paper, then please cite it as: @article{pshuang2018NPMT, author {Po{ }Sen Huang and Chong Wang and Sitao Huang and Dengyong Zhou and Li Deng}, title {Towards Neural Phrase based Machine Translation}, journal {CoRR}, volume {abs/1706.05565}, year {2017}, url { archivePrefix {arXiv}, eprint {1706.05565}, } and @inproceedings{wang2017SWAN, author {Chong Wang and Yining Wang and Po{ }Sen Huang and Abdelrahman Mohamed and Dengyong Zhou and Li Deng}, title {Sequence Modeling via Segmentations}, booktitle {Proceedings of the 34th International Conference on Machine Learning, {ICML} 2017, Sydney, NSW, Australia, 6 11 August 2017}, pages {3674 3683}, year {2017}, } Requirements and Installation A computer running macOS or Linux For training new models, you'll also need a NVIDIA GPU and NCCL A Torch installation . For maximum speed, we recommend using LuaJIT and Intel MKL . A recent version nn . The minimum required version is from May 5th, 2017. A simple luarocks install nn is sufficient to update your locally installed version. Install fairseq by cloning the GitHub repository and running luarocks make rocks/fairseq scm 1.rockspec LuaRocks will fetch and build any additional dependencies that may be missing. 
In order to install the CPU only version (which is only useful for translating new data with an existing model), do luarocks make rocks/fairseq cpu scm 1.rockspec The LuaRocks installation provides a command line tool that includes the following functionality: fairseq preprocess : Data pre processing: build vocabularies and binarize training data fairseq train : Train a new model on one or multiple GPUs fairseq generate : Translate pre processed data with a trained model fairseq generate lines : Translate raw text with a trained model fairseq score : BLEU scoring of generated translations against reference translations fairseq tofloat : Convert a trained model to a CPU model fairseq optimize fconv : Optimize a fully convolutional model for generation. This can also be achieved by passing the fconvfast flag to the generation scripts. Quick Start Training a New Model Data Pre processing The fairseq source distribution contains an example pre processing script for the IWSLT14 German English corpus. Pre process and binarize the data as follows: $ cd data/ $ bash prepare iwslt14.sh $ cd .. $ TEXT data/iwslt14.tokenized.de en $ fairseq preprocess sourcelang de targetlang en \ trainpref $TEXT/train validpref $TEXT/valid testpref $TEXT/test \ thresholdsrc 3 thresholdtgt 3 destdir data bin/iwslt14.tokenized.de en This will write binarized data that can be used for model training to data bin/iwslt14.tokenized.de en. We also provide an example of pre processing script for the IWSLT15 English Vietnamese corpus. Pre process and binarize the data as follows: $ cd data/ $ bash prepare iwslt15.sh $ cd .. $ TEXT data/iwslt15 $ fairseq preprocess sourcelang en targetlang vi \ trainpref $TEXT/train validpref $TEXT/tst2012 testpref $TEXT/tst2013 \ thresholdsrc 5 thresholdtgt 5 destdir data bin/iwslt15.tokenized.en vi Training Use fairseq train to train a new model. 
Here a few example settings that work well for the IWSLT14, IWSLT15 datasets: NPMT model (IWSLT DE EN) $ mkdir p trainings/iwslt_de_en $ fairseq train sourcelang de targetlang en datadir data bin/iwslt14.tokenized.de en \ model npmt nhid 256 dec_unit_size 512 dropout .5 dropout_hid 0 npmt_dropout .5 \ optim adam lr 0.001 batchsize 32 log_interval 100 nlayer 2 nenclayer 2 kwidth 7 \ max_segment_len 6 rnn_mode GRU group_size 500 use_resnet_enc use_resnet_dec log momentum 0.99 clip 10 maxbatch 600 bptt 0 maxepoch 100 ndatathreads 4 seed 1002 maxsourcelen 75 num_lower_win_layers 1 save_interval 250 use_accel noearlystop \ validbleu lrshrink 1.25 minepochtoanneal 18 annealing_type slow \ savedir trainings/iwslt_de_en NPMT model (IWSLT EN DE) $ mkdir p trainings/iwslt_en_de $ fairseq train sourcelang en targetlang de datadir data bin/iwslt14.tokenized.en de \ model npmt nhid 256 dec_unit_size 512 dropout .5 dropout_hid 0 npmt_dropout .5 \ optim adam lr 0.001 batchsize 32 log_interval 100 nlayer 2 nenclayer 2 kwidth 7 \ max_segment_len 6 rnn_mode GRU group_size 500 use_resnet_enc use_resnet_dec \ log momentum 0.99 clip 10 maxbatch 800 bptt 0 maxepoch 100 ndatathreads 4 \ seed 1002 maxsourcelen 75 num_lower_win_layers 1 save_interval 250 use_accel \ noearlystop validbleu lrshrink 1.25 minepochtoanneal 15 \ annealing_type slow savedir trainings/iwslt_en_de NPMT model (IWSLT EN VI) $ mkdir p trainings/iwslt_en_vi $ fairseq train sourcelang en targetlang vi datadir data bin/iwslt15.tokenized.en vi \ model npmt nhid 512 dec_unit_size 512 dropout .4 dropout_hid 0 npmt_dropout .4 \ optim adam lr 0.001 batchsize 48 log_interval 100 nlayer 3 nenclayer 2 kwidth 7 \ max_segment_len 7 rnn_mode LSTM group_size 800 use_resnet_enc use_resnet_dec log \ momentum 0.99 clip 500 maxbatch 800 bptt 0 maxepoch 50 ndatathreads 4 seed 1002 \ maxsourcelen 75 num_lower_win_layers 1 save_interval 250 use_accel noearlystop \ validbleu nembed 512 lrshrink 1.25 minepochtoanneal 8 annealing_type slow \ savedir trainings/iwslt_en_vi By default, fairseq train will use all available GPUs on your machine. Use the CUDA_VISIBLE_DEVICES environment variable to select specific GPUs or ngpus to change the number of GPU devices that will be used. Generation Once your model is trained, you can translate with it using fairseq generate (for binarized data) or fairseq generate lines (for text). Here, we'll do it for a NPMT model: Translate some text $ DATA data bin/iwslt14.tokenized.de en $ fairseq generate lines sourcedict $DATA/dict.de.th7 targetdict $DATA/dict.en.th7 \ path trainings/iwslt_de_en/model_bestbleu.th7 beam 1 model npmt target Dictionary: 22823 types source Dictionary: 32010 types > danke , aber das beste kommt noch . max decoding: 1:184 1:15 2:4 3:28 4:6 4:282 6:16 6:201 6:311 8:5 avg. phrase size 1.666667 S danke , aber das beste kommt noch . O danke , aber das beste kommt noch . H 0.10934638977051 thank you , but the best is still coming . A 1 where the max decoding suggests the output segments are thank you , but the best is still coming . , and avg. phrase size represents the average phrase length 10/6 1.666667 . Generation with the binarized test sets can be run as follows (not in batched mode), e.g. for German English: $ fairseq generate sourcelang de targetlang en datadir data bin/iwslt14.tokenized.de en \ path trainings/iwslt_de_en/model_bestbleu.th7 beam 10 lenpen 1 dataset test model npmt tee /tmp/gen.out ... 
Translated 6750 sentences (137891 tokens) in 3013.7s (45.75 tokens/s) Timings: setup 10.7s (0.4%), encoder 28.2s (0.9%), decoder 2747.9s (91.2%), search_results 0.0s (0.0%), search_prune 0.0s (0.0%) BLEU4 29.92, 64.7/37.9/23.8/15.3 (BP 0.973, ratio 1.027, sys_len 127660, ref_len 131141) Word level BLEU scoring: $ grep ^H /tmp/gen.out cut f3 sed 's/@@ //g' > /tmp/gen.out.sys $ grep ^T /tmp/gen.out cut f2 sed 's/@@ //g' > /tmp/gen.out.ref $ fairseq score sys /tmp/gen.out.sys ref /tmp/gen.out.ref BLEU4 29.92, 64.7/37.9/23.8/15.3 (BP 0.973, ratio 1.027, sys_len 127660, ref_len 131141) License fairseq is BSD licensed. The released codes modified the original fairseq are BSD licensed. The rest of the codes are MIT licensed.",Machine Translation,Machine Translation 2162,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Introduction Fairseq( py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence to sequence models, including: Convolutional Neural Networks (CNN) Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks (examples/language_model/conv_lm/README.md) Gehring et al. (2017): Convolutional Sequence to Sequence Learning (examples/conv_seq2seq/README.md) Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning Fan et al. (2018): Hierarchical Neural Story Generation (examples/stories/README.md) LightConv and DynamicConv models _New_ Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions (examples/pay_less_attention_paper/README.md) Long Short Term Memory (LSTM) networks Luong et al. (2015): Effective Approaches to Attention based Neural Machine Translation Wiseman and Rush (2016): Sequence to Sequence Learning as Beam Search Optimization Transformer (self attention) networks Vaswani et al. (2017): Attention Is All You Need Ott et al. (2018): Scaling Neural Machine Translation (examples/scaling_nmt/README.md) Edunov et al. (2018): Understanding Back Translation at Scale (examples/backtranslation/README.md) _New_ Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling (examples/language_model/transformer_lm/README.md) _New_ Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade (examples/translation_moe/README.md) Fairseq features: multi GPU (distributed) training on one machine or across multiple machines fast generation on both CPU and GPU with multiple search algorithms implemented: beam search Diverse Beam Search ( Vijayakumar et al., 2016 ) sampling (unconstrained and top k) large mini batch training even on a single GPU via delayed updates fast half precision floating point (FP16) training extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers We also provide pre trained models ( pre trained models and examples) for several benchmark translation and language modeling datasets. ! Model (fairseq.gif) Requirements and Installation PyTorch version > 1.0.0 Python version > 3.6 For training new models, you'll also need an NVIDIA GPU and NCCL Please follow the instructions here to install PyTorch: If you use Docker make sure to increase the shared memory size either with ipc host or shm size as command line options to nvidia docker run . 
After PyTorch is installed, you can install fairseq with pip : pip install fairseq Installing from source To install fairseq from source and develop locally: git clone cd fairseq pip install editable . Improved training speed Training speed can be further improved by installing NVIDIA's apex library with the cuda_ext option. fairseq will automatically switch to the faster modules provided by apex. Getting Started The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks. Pre trained models and examples We provide pre trained models and pre processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands. Translation (examples/translation/README.md): convolutional and transformer models are available Language Modeling (examples/language_model/README.md): convolutional models are available We also have more detailed READMEs to reproduce results from specific papers: Shen et al. (2019) Mixture Models for Diverse Machine Translation: Tricks of the Trade (examples/translation_moe/README.md) Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions (examples/pay_less_attention_paper/README.md) Edunov et al. (2018): Understanding Back Translation at Scale (examples/backtranslation/README.md) Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning Fan et al. (2018): Hierarchical Neural Story Generation (examples/stories/README.md) Ott et al. (2018): Scaling Neural Machine Translation (examples/scaling_nmt/README.md) Gehring et al. (2017): Convolutional Sequence to Sequence Learning (examples/conv_seq2seq/README.md) Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks (examples/language_model/conv_lm/README.md) Join the fairseq community Facebook page: Google group: License fairseq( py) is BSD licensed. The license applies to the pre trained models as well. We also provide an additional patent grant. Citation Please cite as: bibtex @inproceedings{ott2019fairseq, title {fairseq: A Fast, Extensible Toolkit for Sequence Modeling}, author {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli}, booktitle {Proceedings of NAACL HLT 2019: Demonstrations}, year {2019}, }",Machine Translation,Machine Translation 2182,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Sarcasm Detection it's fault Sarcasm is a form of verbal irony that is intended to express contempt or ridicule. Relying on the shared knowledge between the speaker and his audience, sarcasm requires wit to understand and wit to produce. In our daily interactions, we use gestures and mimics, intonation and prosody to hint the sarcastic intent. Since we do not have access to such paralinguistic cues, detecting sarcasm in written text is a much harder task. I investigated various methods to detect sarcasm in tweets, using both traditional machine learning (SVMs and Logistic Regressors on discrete features) and deep learning models (CNNs, LSTMs, GRUs, Bi directional LSTMs and attention based LSTMs), evaluating them on 4 different Twitter datasets (details in res/ (res)). This research project was completed in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science at the University of Manchester and under the careful supervision of Mr John McNaught, my tutor and mentor. 
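As a rough illustration of the kind of deep learning models listed above, a minimal bidirectional LSTM classifier in Keras could look like the sketch below. It is a hypothetical example with made up hyperparameters, not the code from src/dl_models.py.

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense, Dropout

# toy settings, assumed for illustration only
vocab_size, embedding_dim, max_len = 20000, 100, 40

model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=max_len))
model.add(Bidirectional(LSTM(64, dropout=0.3, recurrent_dropout=0.3)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))  # sarcastic vs. non sarcastic
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

The attention based variants typically add a learned weighting over the LSTM outputs before the final classification layer, which is what the attention visualizations further below draw on.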
The overall project achievements are explained in this video Overview src/ (src) contains all the source code used to process, analyse, train and evaluate the datasets (as described in res/ (res)) in order to investigate sarcasm detection on Twitter data res/ (res) contains both the raw and processed datasets as well as some useful vocabularies, lists or selections of words/emojis that proved very useful in pre processing the data models/ (models) contains all the pre trained models contributing to the achievement of the claimed results as well as all the trained models, saved after training under the described parameters and DL architectures plots/ (plots) contains a collection of interesting plots that should be useful in analysing and sustaining the results obtained stats/ (stats) contains some comparisons between preprocessing phases as well as some raw statistical results collected while training/evaluating images/ (images) contains the visualizations obtained and some pictures of the architectures or models used in the report or screencast Dependencies The code included in this repository has been tested to work with Python 3.5 on an Ubuntu 16.04 machine, using Keras 2.0.8 with Tensorflow as the backend. List of requirements Python 3.5 Keras 2.0 Tensorflow 1.3 gensim 3.0 numpy 1.13 scikit learn h5py emoji tqdm pandas itertools matplotlib Installation and running 1. Clone the repository and make sure that all the dependencies listed above are installed. 2. Download all the resources from here and place them in the res/ directory 3. Download the pre trained models from here and place them in the models/ directory 4. Go to the src/ directory 5. For a thorough feature analysis, run: bash python feature_analysis.py 6. For training and evaluating a traditional machine learning model, run: bash python ml_models.py 7. For training and evaluating the embeddings (word and/or emojis/deepmojis), run: bash python embeddings_model.py 8. For training and evaluating various deep learning models, quickly implemented in Keras, run: bash python dl_models.py 9. For training and evaluating the attention based LSTM model implemented in TensorFlow, run: bash python tf_attention.py By default, the dataset collected by Ghosh and Veale (2016) is used, but this can be easily replaced by changing the dataset parameter in the code (as for all other parameters). Results Here are the results obtained on the considered datasets. ! Results (images/contrast_results_dl_model.png) Visualizations You can obtain a nice visualization of a deep layer by extracting the final weights and colour the hidden units distinctively. Running either of the two files below will produce a .html file in plots/html_visualizations/ . LSTM visualization Visualize the LSTM weights for a selected example in the test set after you have trained the model (here we use a simpler architecture with fewer hidden units and no stacked LSTMs in order to visualize anything sensible). Excitatory units (weight > 0) are coloured in a reddish colour while inhibitory units (weight < 0) in a bluish colour. Colour gradients are used to distinguish the heavy from the weak weights. Run: bash python src/visualize_hidden_units.py In the sample visualization given below, doctor , late and even lame have heavier weights and therefore are contributing more to sarcasm recognition (since they receive more attention). 
Historically, we know that going to the doctor is regarded as an undesirable activity (so it is subject to strong sarcastic remarks) while late and lame are sentiment bearing expressions, confirming previous results about sarcastic cues in written and spoken language. ! LSTM visualization (images/lstm_vis1.png) ! LSTM visualization (images/lstm_vis3.png) Other visualizations are available in images/ Attention visualization Visualize the attention words over the whole (or a selection of the) test set after you have trained the model. The network is paying attention to some specific words (supposedly, those who contribute more towards a sarcasm decision being made). A reddish colour is used to emphasize attention weights while colour gradients are used to distinguish the heavy from the weak weights. Run: bash python src/visualize_tf_attention.py In the sample visualization given below, strong sentiment bearing words, stereotypical topics, emojis, punctuation, numerals and sometimes slang or ungrammatical words are receiving more attention from the network and therefore are contributing more to sarcasm recognition. ! Attention visualization (images/attention_vis0.png) Disclaimer The purpose of this project was not to produce the most optimally efficient code, but to draw some useful conclusions about sarcasm detection in written text (specifically, for Twitter data). However, it is not disastrously inefficient actually, it should be fast enough for most purposes. Although the code has been verified and reviewed, I cannot guarantee that there are absolutely no bugs or faults so use the code on your own responsibility. License The source code and all my pre trained models are licensed under the MIT license. References 1 Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In EMNLP, volume 13, pages 704–714. 2 Aniruddha Ghosh and Tony Veale. 2016. Fracking Sarcasm using Neural Network. 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2016). NAACL HLT. 3 Tomas Ptacek, Ivan Habernal, and Jun Hong. 2014. Sarcasm detection on Czech and English Twitter. In COLING, pages 213–223. 4 Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 5 Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global Vectors for Word Representation, in Proceedings of the 2014 Conference on Empirical Methods In Natural Language Processing (EMNLP 2014), October 2014. 6 Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. “emoji2vec: Learning Emoji Representations from their Description,” in Proceedings of the 4th International Workshop on Natural Language Processing for Social Media at EMNLP 2016 (SocialNLP at EMNLP 2016), November 2016. 7 Sepp Hochreiter and Jurgen Schmidhuber. Long short term memory. 1997. In Neural Computation 9(8):1735 80. 8 Dzmitry Bahdanau, KyungHyun Cho and Yoshua Bengio. 2016. Neural Machine Translation by Jointly Learning to Align and Translate. 
arXiv:1409.0473v7",Machine Translation,Machine Translation 2191,Natural Language Processing,Natural Language Processing,Natural Language Processing,sru naive A naive implementation of the Simple Recurrent Unit from the paper Training RNNs as Fast as CNNs,Machine Translation,Machine Translation 2219,Natural Language Processing,Natural Language Processing,Natural Language Processing,"LSTMs This repository includes tests with the Blocksworld environment and LSTMs. Based on the code of DQN from dennybritz Blocksworld environment on OpenAI Gym I implemented the OpenAI Gym environment interface for the BlocksWorld environment based on the work of Slaney and Thiébaux. Follow these steps in order to configure the environment: Identify your gym installation by running python import gym print (gym.__file__) Copy the file /Blocksworld/classic_control/blocksworld.py to the folder gym/envs/classic_control Edit the file gym/envs/__init__.py and add the lines python register( id 'BlocksWorld v0', entry_point 'gym.envs.classic_control:BlocksWorldEnv', max_episode_steps 20000, ) Edit the file gym/envs/classic_control/__init__.py and add the following line: python from gym.envs.classic_control.blocksworld import BlocksWorldEnv Go to the folder Blocksworld/GENERATOR/bwstates.1 and compile the binary 'bwstates'. Alternatively download the file bwstates.1.tar.gz and compile for your platform. Edit the file blocksworld.py and configure the path to the binary file 'bwstates' Visualize tensorflow learning tensorboard logdir ./BlocksWorld v0/ About cell states, hidden states and outputs The naming conventions are quite confusing in the LSTM implementations I found. As per tensorflow reference of the BasicLSTMCell I've followed the conventions in Basically: The cell state is the same as the hidden state (named c) The outputs are named with h How to use the LSTMVis module See Once the file DQN_LSTM_BlocksWorld.py has completed all the episodes, a folder called /lstmvis will be created inside the experiment folder. This folder contains the files lstm.yml states.hdf5 train.dict train.hdf5 Guide to the files in this repository blocksworld.py : Implementation of the BlocksWorld environment based on OpenAI Gym. DQN_BlocksWorld.ipynb : Implementation of DQN solving the BlocksWorld environment (toy problem with 2 blocks). DQN_LSTM_BlocksWorld.py : Implementation of DQN using LSTM. TODO",Machine Translation,Machine Translation 2237,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Actor Critic for Sequence Prediction The reference implementation for the paper _An Actor Critic Algorithm for Sequence Prediction_ ( openreview , submitted to ICLR 2017) by Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio Note that it is in fact a heavily modified speech recognizer, so please do not be surprised by the presence of speech related terms in the code. The code is provided only for replication purposes; further development is not planned. If you have questions, please contact Dzmitry Bahdanau or just create an issue here.
How to use install all the dependencies (see the list below) set your environment variables by calling source env.sh for training use $LVSR/bin/run.py train for testing use $LVSR/bin/run.py search Please proceed to exp/ted (exp/ted/README.md) for the instructions how to replicate our machine translation results on TED data, or to exp/billion_words (exp/billion_words/README.md) in order to run our spelling correction experiments. Dependencies Python packages: pykwalify, toposort, pyyaml, numpy, pandas, picklable itertools Theano 0.9, the old gpu Backend (a.k.a. device gpu) blocks , commit d8b7ffbdda68b4e2ca3c1a2984964285cb1cb709 blocks extras , commit 0cefaa3a8a372c551fc3b0df02d5d4f105767d9f fuel , commit 42e21a25ed248739e5fe75b9e4193c749979ba57 License MIT",Machine Translation,Machine Translation 2253,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. 
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. 
A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. 
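Because small sets like MRPC give noisy Dev numbers, one way to report a more stable figure is to average eval_accuracy over several re runs that each use a different output_dir. The helper below is a hypothetical sketch, not part of this repository: the run directory naming is made up, and it assumes each run writes its metrics as key value lines to an eval_results.txt file inside its output directory.

import glob
import statistics

accuracies = []
for path in glob.glob('/tmp/mrpc_output_run*/eval_results.txt'):  # assumed run naming
    with open(path) as f:
        for line in f:
            if line.startswith('eval_accuracy'):
                # assumes lines of the form 'eval_accuracy = 0.845588'
                accuracies.append(float(line.strip().split()[-1]))

print('runs: %d  mean: %.4f  std: %.4f' % (
    len(accuracies), statistics.mean(accuracies), statistics.stdev(accuracies)))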
Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
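The JSON lines written by extract_features.py can be loaded back with a few lines of Python. The sketch below is only a reading example, not part of the repository; the field names (features, token, layers, index, values) reflect the script's output format at the time of writing and should be treated as assumptions if you are on a different version.
python
import json
import numpy as np

def load_token_vectors(path="/tmp/output.jsonl", layer_index=-1):
    """For each input line, return a list of (token, vector) pairs taken from
    one of the Transformer layers dumped by extract_features.py."""
    all_lines = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            pairs = []
            for feature in record["features"]:
                layer = next(l for l in feature["layers"] if l["index"] == layer_index)
                pairs.append((feature["token"], np.array(layer["values"])))
            all_lines.append(pairs)
    return all_lines

# Example: a crude sentence vector from the final hidden layer of the first line.
sentences = load_token_vectors()
sentence_vector = np.mean([vec for _, vec in sentences[0]], axis=0)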
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
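As a quick sanity check of that rule of thumb: 128 multiplied by 0.15 is 19.2, which rounds up to the max_predictions_per_seq of 20 used in these commands. The tiny helper below simply encodes this arithmetic (it is illustrative, not part of the repository); the run_pretraining.py invocation that follows keeps the same two values.
python
import math

def suggested_max_predictions(max_seq_length, masked_lm_prob):
    # Rule of thumb from the text above: roughly max_seq_length * masked_lm_prob,
    # rounded up so that no sampled masked position has to be dropped.
    return math.ceil(max_seq_length * masked_lm_prob)

print(suggested_max_predictions(128, 0.15))  # 20, as used in the example commands
print(suggested_max_predictions(512, 0.15))  # 77 (76.8 rounded up) for 512-length data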
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
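If you build your own corpus, the raw input needs to end up in the format described above: one sentence per line, with blank lines between documents. Below is a minimal sketch of that step using spaCy for sentence segmentation; it is not part of the repository, and it assumes the small English spaCy model (en_core_web_sm) is installed and that corpus.txt is an acceptable output path.
python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model has been downloaded

def write_pretraining_text(documents, out_path="corpus.txt"):
    """Write raw documents in the format create_pretraining_data.py expects:
    one sentence per line, with a blank line between documents."""
    with open(out_path, "w") as out:
        for doc_text in documents:
            for sent in nlp(doc_text).sents:
                text = " ".join(sent.text.split())  # collapse stray whitespace
                if text:
                    out.write(text + "\n")
            out.write("\n")  # blank line delimits documents

write_pretraining_text([
    "This is the first document. It has two sentences.",
    "A second, unrelated document follows after a blank line.",
])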
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2255,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Subword Neural Machine Translation This repository contains preprocessing scripts to segment text into subword units. The primary purpose is to facilitate the reproduction of our experiments on Neural Machine Translation with subword units (see below for reference). INSTALLATION install via pip (from PyPI): pip install subword nmt install via pip (from Github): pip install alternatively, clone this repository; the scripts are executable stand alone. USAGE INSTRUCTIONS Check the individual files for usage instructions. To apply byte pair encoding to word segmentation, invoke these commands: subword nmt learn bpe s {num_operations} {codes_file} subword nmt apply bpe c {codes_file} {out_file} To segment rare words into character n grams, do the following: subword nmt get vocab train_file {train_file} vocab_file {vocab_file} subword nmt segment char ngrams vocab {vocab_file} n {order} shortlist {size} {out_file} The original segmentation can be restored with a simple replacement: sed r 's/(@@ ) (@@ ?$)//g' If you cloned the repository and did not install a package, you can also run the individual commands as scripts: ./subword_nmt/learn_bpe.py s {num_operations} {codes_file} BEST PRACTICE ADVICE FOR BYTE PAIR ENCODING IN NMT We found that for languages that share an alphabet, learning BPE on the concatenation of the (two or more) involved languages increases the consistency of segmentation, and reduces the problem of inserting/deleting characters when copying/transliterating names. However, this introduces undesirable edge cases in that a word may be segmented in a way that has only been observed in the other language, and is thus unknown at test time. To prevent this, apply_bpe.py accepts a vocabulary and a vocabulary threshold option so that the script will only produce symbols which also appear in the vocabulary (with at least some frequency). To use this functionality, we recommend the following recipe (assuming L1 and L2 are the two languages): Learn byte pair encoding on the concatenation of the training text, and get resulting vocabulary for each: cat {train_file}.L1 {train_file}.L2 subword nmt learn bpe s {num_operations} o {codes_file} subword nmt apply bpe c {codes_file} {vocab_file}.L1 subword nmt apply bpe c {codes_file} {vocab_file}.L2 more conventiently, you can do the same with with this command: subword nmt learn joint bpe and vocab input {train_file}.L1 {train_file}.L2 s {num_operations} o {codes_file} write vocabulary {vocab_file}.L1 {vocab_file}.L2 re apply byte pair encoding with vocabulary filter: subword nmt apply bpe c {codes_file} vocabulary {vocab_file}.L1 vocabulary threshold 50 {train_file}.BPE.L1 subword nmt apply bpe c {codes_file} vocabulary {vocab_file}.L2 vocabulary threshold 50 {train_file}.BPE.L2 as a last step, extract the vocabulary to be used by the neural network. 
Example with Nematus: nematus/data/build_dictionary.py {train_file}.BPE.L1 {train_file}.BPE.L2 you may want to take the union of all vocabularies to support multilingual systems for test/dev data, re use the same options for consistency: subword nmt apply bpe c {codes_file} vocabulary {vocab_file}.L1 vocabulary threshold 50 {test_file}.BPE.L1 PUBLICATIONS The segmentation methods are described in: Rico Sennrich, Barry Haddow and Alexandra Birch (2016): Neural Machine Translation of Rare Words with Subword Units Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany. HOW IMPLEMENTATION DIFFERS FROM Sennrich et al. (2016) This repository implements the subword segmentation as described in Sennrich et al. (2016), but since version 0.2, there is one core difference related to end of word tokens. In Sennrich et al. (2016), the end of word token is initially represented as a separate token, which can be merged with other subwords over time: u n d f u n d Since 0.2, end of word tokens are initially concatenated with the word final character: u n d f u n d The new representation ensures that when BPE codes are learned from the above examples and then applied to new text, it is clear that a subword unit und is unambiguously word final, and un is unambiguously word internal, preventing the production of up to two different subword units from each BPE merge operation. apply_bpe.py is backward compatible and continues to accept old style BPE files. New style BPE files are identified by having the following first line: version: 0.2 ACKNOWLEDGMENTS This project has received funding from Samsung Electronics Polska sp. z o.o. Samsung R&D Institute Poland, and from the European Union’s Horizon 2020 research and innovation programme under grant agreement 645452 (QT21).",Machine Translation,Machine Translation 2256,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Scripts for Edinburgh Neural MT systems for WMT 16 This repository contains scripts and an example config used for the Edinburgh Neural MT submission (UEDIN NMT) for the shared translation task at the 2016 Workshops on Statistical Machine Translation , and for the paper Linguistic Input Features Improve Neural Machine Translation . The scripts will facilitate the reproduction of our results, and serve as additional documentation (along with the system description paper) NOTE: for newer instructions (from WMT 17), see OVERVIEW We built translation models with Nematus ( ) We used BPE as subword segmentation to achieve open vocabulary translation ( ) We automatically back translated in domain monolingual data into the source language to create additional training data. More details about our system are available in the system description paper (see below for reference) MODELS and DATA pre trained models on WMT data are released at they currently correspond to the best single model for 8 translation directions: EN {CS,DE,RO,RU} automatically back translated monolingual data, which we used for our WMT submissions, is available at linguistically annotated corpora which we used for our factored models are available at SCRIPTS preprocessing : preprocessing scripts for Romanian that we found helpful for translation quality. we used the Moses tokenizer and truecaser for all language pairs. sample : sample scripts that we used for preprocessing, training and decoding. 
We used mostly the same settings for all translation directions, with small differences in vocabulary size. Dropout was enabled for EN RO, but disabled otherwise. factored_sample: sample scripts for preprocessing and training with linguistic input features. This was not used in shared task submissions, but in (Sennrich and Haddow, 2016). r2l : scripts for reranking the output of the (default) left to right decoder with a model that decodes from right to left. EVALUATION WMT reports case sensitive BLEU on detokenized text with the NIST BLEU scorer. Assuming that you have detokenized your output (see sample/postprocess test.sh ) in the file output.detok , here is how we score a system (on the example of EN DE): /path/to/mosesdecoder/scripts/ems/support/wrap xml.perl de newstest2016 ende src.en.sgm tmpfile /path/to/mosesdecoder/scripts/generic/mteval v13a.pl c s newstest2016 ende src.en.sgm r newstest2016 ende ref.de.sgm t tmpfile alternatively, you can use multi bleu detok.perl , which accepts detokenized output in plain text, and gives the same result as the NIST Bleu scorer: /path/to/nematus/data/strip_sgml.py newstest2016 ende ref.de.txt /path/to/nematus/data/multi bleu detok.perl newstest2016 ende ref.de.txt < output.detok Note that multi bleu.perl (or multi bleu detok.perl ) on tokenized text will give different scores (usually higher), because of tokenization differences. Also, comparing different systems with tokenized BLEU is unreliable unless tokenization is identical. Even when using standard Moses tokenization, command line options like ' penn' and ' a' will cause inconsistencies. LICENSE The scripts are available under the MIT License. PUBLICATIONS The Edinburgh Neural MT submission to WMT 2016 is described in: Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Edinburgh Neural Machine Translation Systems for WMT 16, Proc. of the First Conference on Machine Translation (WMT16). Berlin, Germany It is based on work described in the following publications: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the International Conference on Learning Representations (ICLR). Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Neural Machine Translation of Rare Words with Subword Units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany. Rico Sennrich, Barry Haddow, Alexandra Birch (2016): Improving Neural Machine Translation Models with Monolingual Data. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin, Germany. The use of linguistic input features (factored_sample) is described in: Rico Sennrich, Barry Haddow (2016): Linguistic Input Features Improve Neural Machine Translation, Proc. of the First Conference on Machine Translation (WMT16). Berlin, Germany",Machine Translation,Machine Translation 2287,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Non Autoregressive Transformer Code release for Non Autoregressive Neural Machine Translation by Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. Requires PyTorch 0.3, torchtext 0.2.1, and SpaCy. The pipeline for training a NAT model for a given language pair includes: 1. run_alignment_wmt_LANG.sh (runs fast_align for alignment supervision) 2. run_LANG.sh (trains an autoregressive model) 3. run_LANG_decode.sh (produces the distillation corpus for training the NAT) 4. 
run_LANG_fast.sh (trains the NAT model) 5. run_LANG_fine.sh (fine tunes the NAT model)",Machine Translation,Machine Translation 2288,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Quasi Recurrent Neural Network (QRNN) for PyTorch Updated to support multi GPU environments via DataParallel see the the multigpu_dataparallel.py example. This repository contains a PyTorch implementation of Salesforce Research 's Quasi Recurrent Neural Networks paper. The QRNN provides similar accuracy to the LSTM but can be betwen 2 and 17 times faster than the highly optimized NVIDIA cuDNN LSTM implementation depending on the use case. To install, simply run: pip install cupy pynvrtc git+ If you use this code or our results in your research, please cite: @article{bradbury2016quasi, title {{Quasi Recurrent Neural Networks}}, author {Bradbury, James and Merity, Stephen and Xiong, Caiming and Socher, Richard}, journal {International Conference on Learning Representations (ICLR 2017)}, year {2017} } Software Requirements This codebase requires Python 3, PyTorch , pynvrtc (NVIDIA's Python Bindings to NVRTC), and CuPy . While the codebase contains a CPU implementation of the QRNN, the GPU QRNN implementation is used by default if possible. Requirements are provided in requirements.txt . Example Usage We've updated the previously released Salesforce Research AWD LSTM language modeling codebase to support use of the AWD QRNN . With the same number of parameters as the LSTM and less well tuned hyper parameters, the QRNN model trains over twice as quickly and achieves nearly equivalent state of the art language modeling results. For full details, refer to the AWD LSTM LM repository . Usage The QRNN API is meant to be drop in compatible with the LSTM for many standard use cases. As such, the easiest thing to do is replace any GRU or LSTM module with the QRNN . Note: bidirectional QRNN is not yet supported though will be in the near future. python import torch from torchqrnn import QRNN seq_len, batch_size, hidden_size 7, 20, 256 size (seq_len, batch_size, hidden_size) X torch.autograd.Variable(torch.rand(size), requires_grad True).cuda() qrnn QRNN(hidden_size, hidden_size, num_layers 2, dropout 0.4) qrnn.cuda() output, hidden qrnn(X) print(output.size(), hidden.size()) The full documentation for the QRNN is listed below: QRNN(input_size, hidden_size, num_layers, dropout 0): Applies a multiple layer Quasi Recurrent Neural Network (QRNN) to an input sequence. Args: input_size: The number of expected features in the input x. hidden_size: The number of features in the hidden state h. If not specified, the input size is used. num_layers: The number of QRNN layers to produce. layers: List of preconstructed QRNN layers to use for the QRNN module (optional). save_prev_x: Whether to store previous inputs for use in future convolutional windows (i.e. for a continuing sequence such as in language modeling). If true, you must call reset to remove cached previous values of x. Default: False. window: Defines the size of the convolutional window (how many previous tokens to look when computing the QRNN values). Supports 1 and 2. Default: 1. zoneout: Whether to apply zoneout (i.e. failing to update elements in the hidden state) to the hidden state updates. Default: 0. output_gate: If True, performs QRNN fo (applying an output gate to the output). If False, performs QRNN f. Default: True. use_cuda: If True, uses fast custom CUDA kernel. If False, uses naive for loop. Default: True. 
Inputs: X, hidden X (seq_len, batch, input_size): tensor containing the features of the input sequence. hidden (layers, batch, hidden_size): tensor containing the initial hidden state for the QRNN. Outputs: output, h_n output (seq_len, batch, hidden_size): tensor containing the output of the QRNN for each timestep. h_n (layers, batch, hidden_size): tensor containing the hidden state for t seq_len The included QRNN layer supports convolutional windows of size 1 or 2 but will be extended in the future to support arbitrary convolutions. If you are using convolutional windows of size 2 (i.e. looking at the inputs from two previous timesteps to compute the input) and want to run over a long sequence in batches, such as when using BPTT, you can set save_prev_x True and call reset when you wish to reset the cached previous inputs. If you want flexibility in the definition of each QRNN layer, you can construct individual QRNNLayer modules and pass them to the QRNN module using the layer argument. Speed Speeds are between 2 and 17 times faster than NVIDIA's cuDNN LSTM, with the difference as a result of varying batch size and sequence length. The largest gains are for small batch sizes or long sequence lengths, both highlighting the LSTMs parallelization difficulty due to forced sequentiality. For full information, refer to the Quasi Recurrent Neural Networks paper. ! Figure 4 from QRNN paper (images/qrnn_speed.png) Pictured above is Figure 4 from the QRNN paper: Left: Training speed for two layer 640 unit PTB LM on a batch of 20 examples of 105 timesteps. “RNN” and “softmax” include the forward and backward times, while “optimization overhead” includes gradient clipping, L2 regularization, and SGD computations. Right: Inference speed advantage of a 320 unit QRNN layer alone over an equal sized cuDNN LSTM layer for data with the given batch size and sequence length. Training results are similar. Extending the QRNN speed advantage to other recurrent architectures with ForgetMult The QRNN architecture's speed advantage comes from two primary sources: the ability to batch all computations into a few large matrix multiplications and the use of a fast element wise recurrence function. This recurrence function, named ForgetMult , is general and can be used in other scenarios. The ForgetMult takes two arguments the candidate input x and forget gates f and computes h f x + (1 f) hm1 where hm1 is the previous hidden state output. The QRNN class is a thin wrapper around this that performs the large matrix multiplications for the candidate x , the forget gates f , and the output gates o . Any other operation which requires recurrence and can have precomputed values for the candidate x and forget gates f can use this fast form of recurrence. Example usage of the ForgetMult module: output ForgetMult()(f, x, hidden) . ForgetMult computes a simple recurrent equation: h_t f_t x_t + (1 f_t) h_{t 1} This equation is equivalent to dynamic weighted averaging. Inputs: X, hidden X (seq_len, batch, input_size): tensor containing the features of the input sequence. F (seq_len, batch, input_size): tensor containing the forget gate values, assumed in range 0, 1 . hidden_init (batch, input_size): tensor containing the initial hidden state for the recurrence (h_{t 1}). cuda: If True, use the fast element wise CUDA kernel for recurrence. If False, uses naive for loop. Default: True. Want to help out? First, thanks! :) Open tasks that are interesting: + Modify the ForgetMult CUDA kernel to produce a BackwardForgetMult . 
This will enable a bidirectional QRNN. The input should be the same f and x but the kernel should walk backwards through the inputs. + Bidirectional QRNN support (requires the modification above) + Support PyTorch's PackedSequence such that variable length sequences are correctly masked + Show how to use the underlying fast recurrence operator ForgetMult in other generic ways",Machine Translation,Machine Translation 2293,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Deep Text Corrector Deep Text Corrector uses TensorFlow to train sequence to sequence models that are capable of automatically correcting small grammatical errors in conversational written English (e.g. SMS messages). It does this by taking English text samples that are known to be mostly grammatically correct and randomly introducing a handful of small grammatical errors (e.g. removing articles) to each sentence to produce input output pairs (where the output is the original sample), which are then used to train a sequence to sequence model. See this blog post for a more thorough write up of this work. Motivation While context sensitive spell check systems are able to automatically correct a large number of input errors in instant messaging, email, and SMS messages, they are unable to correct even simple grammatical errors. For example, the message I'm going to store would be unaffected by typical autocorrection systems, when the user most likely intended to write I'm going to _the_ store . These kinds of simple grammatical mistakes are common in so called learner English , and constructing systems capable of detecting and correcting these mistakes has been the subject of multiple CoNLL shared tasks . The goal of this project is to train sequence to sequence models that are capable of automatically correcting such errors. Specifically, the models are trained to provide a function mapping a potentially errant input sequence to a sequence with all (small) grammatical errors corrected. Given these models, it would be possible to construct tools to help correct these simple errors in written communications, such as emails, instant messaging, etc. Correcting Grammatical Errors with Deep Learning The basic idea behind this project is that we can generate large training datasets for the task of grammar correction by starting with grammatically correct samples and introducing small errors to produce input output pairs, which can then be used to train sequence to sequence models. The details of how we construct these datasets, train models using them, and produce predictions for this task are described below. Datasets To create a dataset for Deep Text Corrector models, we start with a large collection of mostly grammatically correct samples of conversational written English. The primary dataset considered in this project is the Cornell Movie Dialogs Corpus , which contains over 300k lines from movie scripts. This was the largest collection of conversational written English I could find that was mostly grammatically correct. Given a sample of text like this, the next step is to generate input output pairs to be used during training. This is done by: 1. Drawing a sample sentence from the dataset. 2. Setting the input sequence to this sentence after randomly applying certain perturbations. 3. Setting the output sequence to the unperturbed sentence. where the perturbations applied in step (2) are intended to introduce small grammatical errors which we would like the model to learn to correct.
Thus far, these perturbations are limited to the: subtraction of articles (a, an, the) subtraction of the second part of a verb contraction (e.g. 've , 'll , 's , 'm ) replacement of a few common homophones with one of their counterparts (e.g. replacing their with there , then with than ) The rates with which these perturbations are introduced are loosely based on figures taken from the CoNLL 2014 Shared Task on Grammatical Error Correction . In this project, each perturbation is applied in 25% of cases where it could potentially be applied. Training To artificially increase the dataset when training a sequence model, we perform the sampling strategy described above multiple times to arrive at 2 3x the number of input output pairs. Given this augmented dataset, training proceeds in a very similar manner to TensorFlow's sequence to sequence tutorial . That is, we train a sequence to sequence model using LSTM encoders and decoders with an attention mechanism as described in Bahdanau et al., 2014 using stochastic gradient descent. Decoding Instead of using the most probable decoding according to the seq2seq model, this project takes advantage of the unique structure of the problem to impose the prior that all tokens in a decoded sequence should either exist in the input sequence or belong to a set of corrective tokens. The corrective token set is constructed during training and contains all tokens seen in the target, but not the source, for at least one sample in the training set. The intuition here is that the errors seen during training involve the misuse of a relatively small vocabulary of common words (e.g. the , an , their ) and that the model should only be allowed to perform corrections in this domain. This prior is carried out through a modification to the seq2seq model's decoding loop in addition to a post processing step that resolves out of vocabulary (OOV) tokens: Biased Decoding To restrict the decoding such that it only ever chooses tokens from the input sequence or corrective token set, this project applies a binary mask to the model's logits prior to extracting the prediction to be fed into the next time step. This mask is constructed such that mask[i] = 1.0 if (i in input or corrective_tokens) else 0.0 . Since this mask is applied to the result of a softmax transformation (which guarantees all outputs are non negative), we can be sure that only input or corrective tokens are ever selected. Note that this logic is not used during training, as this would only serve to eliminate potentially useful signal from the model. Handling OOV Tokens Since the decoding bias described above is applied within the truncated vocabulary used by the model, we will still see the unknown token in its output for any OOV tokens. The more generic problem of resolving these OOV tokens is non trivial (e.g. see Addressing the Rare Word Problem in NMT ), but in this project we can again take advantage of its unique structure to create a fairly straightforward OOV token resolution scheme. That is, if we assume the sequence of OOV tokens in the input is equal to the sequence of OOV tokens in the output sequence, then we can trivially assign the appropriate token to each unknown token encountered in the decoding. Empirically, and intuitively, this appears to be an appropriate assumption, as the relatively simple class of errors these models are being trained to address should never include mistakes that warrant the insertion or removal of a rare token.
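To make the biased decoding step concrete, here is a small NumPy sketch of the logit masking described above. It is illustrative only, not the code in this repository, and the token ids and vocabulary are made up.
python
import numpy as np

def biased_argmax(logits, input_token_ids, corrective_token_ids):
    """Pick the next token, restricted to tokens that appear in the input
    sequence or in the corrective-token set (the prior described above)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax, so every value is >= 0
    allowed = set(input_token_ids) | set(corrective_token_ids)
    mask = np.array([1.0 if i in allowed else 0.0 for i in range(len(probs))])
    return int(np.argmax(probs * mask))       # only allowed tokens can be chosen

# Toy vocabulary of six ids: ids 1, 3 and 4 occurred in the input,
# id 5 is a corrective token such as "the".
logits = np.array([0.1, 2.0, 0.3, 1.5, 0.0, 1.9])
print(biased_argmax(logits, input_token_ids=[1, 3, 4], corrective_token_ids=[5]))  # 1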
Experiments and Results Below are some anecdotal and aggregate results from experiments using the Deep Text Corrector model with the Cornell Movie Dialogs Corpus . The dataset consists of 304,713 lines from movie scripts, of which 243,768 lines were used to train the model and 30,474 lines each were used for the validation and testing sets. The sets were selected such that no lines from the same movie were present in both the training and testing sets. The model being evaluated below is a sequence to sequence model, with attention, where the encoder and decoder were both 2 layer, 512 hidden unit LSTMs. The model was trained with a vocabulary of the 2k most common words seen in the training set. Aggregate Performance Below are reported the BLEU scores and accuracy numbers over the test dataset for both a trained model and a baseline, where the baseline is the identity function (which assumes no errors exist in the input). You'll notice that the model outperforms this baseline for all bucket sizes in terms of accuracy, and outperforms all but one in terms of BLEU score. This tells us that applying the Deep Text Corrector model to a potentially errant writing sample would, on average, result in a more grammatically correct writing sample. Anyone who tends to make errors similar to those the model has been trained on could therefore benefit from passing their messages through this model. Bucket 0: (10, 10) Baseline BLEU 0.8341 Model BLEU 0.8516 Baseline Accuracy: 0.9083 Model Accuracy: 0.9384 Bucket 1: (15, 15) Baseline BLEU 0.8850 Model BLEU 0.8860 Baseline Accuracy: 0.8156 Model Accuracy: 0.8491 Bucket 2: (20, 20) Baseline BLEU 0.8876 Model BLEU 0.8880 Baseline Accuracy: 0.7291 Model Accuracy: 0.7817 Bucket 3: (40, 40) Baseline BLEU 0.9099 Model BLEU 0.9045 Baseline Accuracy: 0.6073 Model Accuracy: 0.6425 Examples Decoding a sentence with a missing article: In 31 : decode( Kvothe went to market ) Out 31 : 'Kvothe went to the market' Decoding a sentence with then/than confusion: In 30 : decode( the Cardinals did better then the Cubs in the offseason ) Out 30 : 'the Cardinals did better than the Cubs in the offseason' Implementation Details This project reuses and slightly extends TensorFlow's Seq2SeqModel , which itself implements a sequence to sequence model with an attention mechanism as described in The primary contributions of this project are: data_reader.py : an abstract class that defines the interface for classes which are capable of reading a source dataset and producing input output pairs, where the input is a grammatically incorrect variant of a source sentence and the output is the original sentence. text_corrector_data_readers.py : contains a few implementations of DataReader , one over the Penn Treebank dataset and one over the Cornell Movie Dialogs Corpus . text_corrector_models.py : contains a version of Seq2SeqModel modified such that it implements the logic described in Biased Decoding ( biased decoding) correct_text.py : a collection of helper functions that together allow for the training of a model and the usage of it to decode errant input sequences (at test time). The decode method defined here implements the OOV token resolution logic ( handling oov tokens). This also defines a main method, and can be invoked from the command line. It was largely derived from TensorFlow's translate.py . TextCorrector.ipynb : an IPython notebook which ties together all of the above pieces to allow for the training and evaluation of the model in an interactive fashion. 
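As a rough illustration of the kind of perturbation sampling a DataReader implementation performs, the sketch below drops articles with 25% probability. It is a hypothetical example, not the logic in text_corrector_data_readers.py, and it only covers the article-removal perturbation.
python
import random

ARTICLES = {"a", "an", "the"}
DROP_PROB = 0.25  # each applicable perturbation fires in 25% of cases

def perturb(sentence, rng=random):
    """Return an (input, target) pair: the target is the original sentence and
    the input is a copy with some articles randomly removed."""
    corrupted = [t for t in sentence.split()
                 if not (t.lower() in ARTICLES and rng.random() < DROP_PROB)]
    return " ".join(corrupted), sentence

random.seed(1)
print(perturb("i am going to the store"))  # ('i am going to store', 'i am going to the store')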
Example Usage Note: this project requires TensorFlow version > 0.11. See this page for setup instructions. Preprocess Movie Dialog Data python preprocessors/preprocess_movie_dialogs.py raw_data movie_lines.txt \ out_file preprocessed_movie_lines.txt This preprocessed file can then be split up however you like to create training, validation, and testing sets. Training: python correct_text.py train_path /movie_dialog_train.txt \ val_path /movie_dialog_val.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model Testing: python correct_text.py test_path /movie_dialog_test.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model \ decode",Machine Translation,Machine Translation 2295,Natural Language Processing,Natural Language Processing,Natural Language Processing,"MUSE: Multilingual Unsupervised and Supervised Embeddings ! Model MUSE is a Python library for multilingual word embeddings , whose goal is to provide the community with: state of the art multilingual word embeddings ( fastText embeddings aligned in a common space) large scale high quality bilingual dictionaries for training and evaluation We include two methods, one supervised that uses a bilingual dictionary or identical character strings, and one unsupervised that does not use any parallel data (see Word Translation without Parallel Data for more details). Dependencies Python 2/3 with NumPy / SciPy PyTorch Faiss (recommended) for fast nearest neighbor search (CPU or GPU). MUSE is available on CPU or GPU, in Python 2 or 3. Faiss is optional for GPU users though Faiss GPU will greatly speed up nearest neighbor search and highly recommended for CPU users. Faiss can be installed using conda install faiss cpu c pytorch or conda install faiss gpu c pytorch . Get evaluation datasets Get monolingual and cross lingual word embeddings evaluation datasets: Our 110 bilingual dictionaries 28 monolingual word similarity tasks for 6 languages, and the English word analogy task Cross lingual word similarity tasks from SemEval2017 Sentence translation retrieval with Europarl corpora by simply running (in data/): bash ./get_evaluation.sh Note: Requires bash 4. The download of Europarl is disabled by default (slow), you can enable it here . Get monolingual word embeddings For pre trained monolingual word embeddings, we highly recommend fastText Wikipedia embeddings , or using fastText to train your own word embeddings from your corpus. You can download the English (en) and Spanish (es) embeddings this way: bash English fastText Wikipedia embeddings curl Lo data/wiki.en.vec Spanish fastText Wikipedia embeddings curl Lo data/wiki.es.vec Align monolingual word embeddings This project includes two ways to obtain cross lingual word embeddings: Supervised : using a train bilingual dictionary (or identical character strings as anchor points), learn a mapping from the source to the target space using (iterative) Procrustes alignment. Unsupervised : without any parallel data or anchor point, learn a mapping from the source to the target space using adversarial training and (iterative) Procrustes refinement. For more details on these approaches, please check here . 
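To give a concrete picture of the supervised case just described, the Procrustes problem has a closed-form solution via an SVD. The sketch below is illustrative rather than MUSE's actual implementation; it assumes you already have two embedding matrices whose rows are paired by a seed dictionary.
python
import numpy as np

def procrustes(X_src, Y_tgt):
    """Orthogonal mapping W minimizing ||X_src @ W - Y_tgt||_F, where row i of
    X_src and Y_tgt are the embeddings of the i-th seed translation pair."""
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Toy check with random "embeddings"; in practice these would be fastText
# vectors indexed by a bilingual training dictionary.
rng = np.random.RandomState(0)
X = rng.randn(1000, 300)
W_true = np.linalg.qr(rng.randn(300, 300))[0]   # some orthogonal ground-truth map
Y = X @ W_true
W = procrustes(X, Y)
print(np.allclose(X @ W, Y, atol=1e-6))         # True: the mapping is recovered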
The supervised way: iterative Procrustes (CPU GPU) To learn a mapping between the source and the target space, simply run: bash python supervised.py src_lang en tgt_lang es src_emb data/wiki.en.vec tgt_emb data/wiki.es.vec n_refinement 5 dico_train default By default, dico_train will point to our ground truth dictionaries (downloaded above); when set to identical_char it will use identical character strings between source and target languages to form a vocabulary. Logs and embeddings will be saved in the dumped/ directory. The unsupervised way: adversarial training and refinement (CPU GPU) To learn a mapping using adversarial training and iterative Procrustes refinement, run: bash python unsupervised.py src_lang en tgt_lang es src_emb data/wiki.en.vec tgt_emb data/wiki.es.vec n_refinement 5 By default, the validation metric is the mean cosine of word pairs from a synthetic dictionary built with CSLS (Cross domain similarity local scaling). For some language pairs (e.g. En Zh), we recommend to center the embeddings using normalize_embeddings center . Evaluate monolingual or cross lingual embeddings (CPU GPU) We also include a simple script to evaluate the quality of monolingual or cross lingual word embeddings on several tasks: Monolingual bash python evaluate.py src_lang en src_emb data/wiki.en.vec max_vocab 200000 Cross lingual bash python evaluate.py src_lang en tgt_lang es src_emb data/wiki.en es.en.vec tgt_emb data/wiki.en es.es.vec max_vocab 200000 Word embedding format By default, the aligned embeddings are exported to a text format at the end of experiments: export txt . Exporting embeddings to a text file can take a while if you have a lot of embeddings. For a very fast export, you can set export pth to export the embeddings in a PyTorch binary file, or simply disable the export ( export ). When loading embeddings, the model can load: PyTorch binary files previously generated by MUSE (.pth files) fastText binary files previously generated by fastText (.bin files) text files (text file with one word embedding per line) The two first options are very fast and can load 1 million embeddings in a few seconds, while loading text files can take a while. Download We provide multilingual embeddings and ground truth bilingual dictionaries. These embeddings are fastText embeddings that have been aligned in a common space. Multilingual word Embeddings We release fastText Wikipedia supervised word embeddings for 30 languages, aligned in a single vector space . Arabic: text Bulgarian: text Catalan: text Croatian: text Czech: text Danish: text Dutch: text English: text Estonian: text Finnish: text French: text German: text Greek: text Hebrew: text Hungarian: text Indonesian: text Italian: text Macedonian: text Norwegian: text Polish: text Portuguese: text Romanian: text Russian: text Slovak: text Slovenian: text Spanish: text Swedish: text Turkish: text Ukrainian: text Vietnamese: text You can visualize crosslingual nearest neighbors using demo.ipynb . Ground truth bilingual dictionaries We created 110 large scale ground truth bilingual dictionaries using an internal translation tool. The dictionaries handle well the polysemy of words. We provide a train and test split of 5000 and 1500 unique source words, as well as a larger set of up to 100k pairs. Our goal is to ease the development and the evaluation of cross lingual word embeddings and multilingual NLP . 
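Once embeddings live in a shared space, word translation retrieval reduces to a nearest neighbour search, and the ground truth dictionaries listed below provide the evaluation pairs. The sketch here uses plain cosine similarity and made-up data for brevity; the actual evaluation in this repository uses CSLS and, when available, Faiss.
python
import numpy as np

def translate(word, src_words, src_vecs, tgt_words, tgt_vecs, k=5):
    """Return the k nearest target words by cosine similarity. Both matrices are
    assumed to be row-wise L2-normalized and already aligned in one space."""
    v = src_vecs[src_words.index(word)]
    scores = tgt_vecs @ v                       # cosine, since rows are normalized
    best = np.argsort(-scores)[:k]
    return [(tgt_words[i], float(scores[i])) for i in best]

# Tiny made-up example with 2-dimensional "embeddings".
src_words, tgt_words = ["cat", "dog"], ["gato", "perro"]
src_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_vecs = np.array([[0.9, 0.1], [0.1, 0.9]])
tgt_vecs /= np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
print(translate("cat", src_words, src_vecs, tgt_words, tgt_vecs, k=1))  # [('gato', ...)]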
European languages in every direction src tgt German English Spanish French Italian Portuguese : : : : : : : : : : : : : : German full train test full train test full train test full train test full train test English full train test full train test full train test full train test full train test Spanish full train test full train test full train test full train test full train test French full train test full train test full train test full train test full train test Italian full train test full train test full train test full train test full train test Portuguese full train test full train test full train test full train test full train test Other languages to English (e.g. {fr,es} en) Afrikaans: full train test Albanian: full train test Arabic: full train test Bengali: full train test Bosnian: full train test Bulgarian: full train test Catalan: full train test Chinese: full train test Croatian: full train test Czech: full train test Danish: full train test Dutch: full train test English: full train test Estonian: full train test Filipino: full train test Finnish: full train test French: full train test German: full train test Greek: full train test Hebrew: full train test Hindi: full train test Hungarian: full train test Indonesian: full train test Italian: full train test Japanese: full train test Korean: full train test Latvian: full train test Littuanian: full train test Macedonian: full train test Malay: full train test Norwegian: full train test Persian: full train test Polish: full train test Portuguese: full train test Romanian: full train test Russian: full train test Slovak: full train test Slovenian: full train test Spanish: full train test Swedish: full train test Tamil: full train test Thai: full train test Turkish: full train test Ukrainian: full train test Vietnamese: full train test English to other languages (e.g. en {fr,es}) Afrikaans: full train test Albanian: full train test Arabic: full train test Bengali: full train test Bosnian: full train test Bulgarian: full train test Catalan: full train test Chinese: full train test Croatian: full train test Czech: full train test Danish: full train test Dutch: full train test English: full train test Estonian: full train test Filipino: full train test Finnish: full train test French: full train test German: full train test Greek: full train test Hebrew: full train test Hindi: full train test Hungarian: full train test Indonesian: full train test Italian: full train test Japanese: full train test Korean: full train test Latvian: full train test Littuanian: full train test Macedonian: full train test Malay: full train test Norwegian: full train test Persian: full train test Polish: full train test Portuguese: full train test Romanian: full train test Russian: full train test Slovak: full train test Slovenian: full train test Spanish: full train test Swedish: full train test Tamil: full train test Thai: full train test Turkish: full train test Ukrainian: full train test Vietnamese: full train test References Please cite 1 if you found the resources in this repository useful. Word Translation Without Parallel Data 1 A. Conneau\ , G. Lample\ , L. Denoyer, MA. Ranzato, H. Jégou, Word Translation Without Parallel Data \ Equal contribution. Order has been determined with a coin flip. 
@article{conneau2017word, title {Word Translation Without Parallel Data}, author {Conneau, Alexis and Lample, Guillaume and Ranzato, Marc'Aurelio and Denoyer, Ludovic and J{\'e}gou, Herv{\'e}}, journal {arXiv preprint arXiv:1710.04087}, year {2017} } MUSE is the project at the origin of the work on unsupervised machine translation with monolingual data only 2 . Unsupervised Machine Translation With Monolingual Data Only 2 G. Lample, A. Conneau, L. Denoyer, MA. Ranzato Unsupervised Machine Translation With Monolingual Data Only @article{lample2017unsupervised, title {Unsupervised Machine Translation Using Monolingual Corpora Only}, author {Lample, Guillaume and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio}, journal {arXiv preprint arXiv:1711.00043}, year {2017} } Related work T. Mikolov, Q. V Le, I. Sutskever Exploiting similarities among languages for machine translation, 2013 G. Dinu, A. Lazaridou, M. Baroni Improving zero shot learning by mitigating the hubness problem, 2015 S. L Smith, D. HP Turban, S. Hamblin, N. Y Hammerla Offline bilingual word vectors, orthogonal transformations and the inverted softmax, 2017 M. Artetxe, G. Labaka, E. Agirre Learning bilingual word embeddings with (almost) no bilingual data, 2017 M. Zhang, Y. Liu, H. Luan, and M. Sun Adversarial training for unsupervised bilingual lexicon induction, 2017 Y. Hoshen, L. Wolf An Iterative Closest Point Method for Unsupervised Word Translation, 2018 A. Joulin, P. Bojanowski, T. Mikolov, E. Grave Improving supervised bilingual mapping of word embeddings, 2018 E. Grave, A. Joulin, Q. Berthet Unsupervised Alignment of Embeddings with Wasserstein Procrustes, 2018 Contact: gl@fb.com (mailto:gl@fb.com) aconneau@fb.com (mailto:aconneau@fb.com)",Machine Translation,Machine Translation 2300,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Neural Semantic Encoders This project contains Chainer implementation of Neural Semantic Encoders , a memory augmented neural network. Paper: Slides (pdf): www.tsendeemts.com ! nse_demo (./assets/nse_demo.png) Prerequisites Python 2.7 chainer (tested on chainer 1.19.0) Other data utils: sklearn, pandas, numpy etc. Usage To train a NSE model on SNLI dataset: $ python snli/train_nse.py snli path/to/snli_1.0 glove path/to/glove.840B.300d.txt To train a shared memory model (MMA NSE): $ python snli/train_nse_mma.py snli path/to/snli_1.0 glove path/to/glove.840B.300d.txt Results The plain NSE model obtains an accuracy around 84 85% on SNLI test set. Author Tsendsuren Munkhdalai / @tsendeemts Other 3rd party implementations Keras NSE implementation by @pdasigi",Machine Translation,Machine Translation 2302,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Long short term memory Introduction This project shows how the BasicLSTMCell is implemented internally in Tensorflow. BasicLSTMCell The implementation of Tensorflow's BasicLSTMCell is based on: The LSTM architecture is defined by the following equations: Python forget_gate sigmoid(matmul(input, w_f) + matmul(u_f, hidden_state) + b_f) input_gate sigmoid(matmul(input, w_i) + matmul(u_i, hidden_state) + b_i) new_input tanh(matmul(input, w_j) + matmul(u_j, hidden_state) + b_j) output_gate sigmoid(matmul(input, w_o) + matmul(u_o, hidden_state) + b_o) new_cell_state cell_state forget_gate + input_gate new_input new_hidden_state tanh(new_cell_state) output_gate The forget gate outputs a number between 0 and 1. It decides how much of the old cell state we forget. 
The input gate decides how much new input is part of the new state. The output gate decides which parts we output. Example In this project the BasicLSTMCell is used to predict the next symbol of the sequence 0, 0, 0, 1, 1, 1, 0 . The sequence is divided into inputs of 3 time steps: Input Label 0, 0, 0 1 0, 0, 1 1 0, 1, 1 1 1, 1, 1 0 Run LSTM shell $ python m run_lstm",Machine Translation,Machine Translation 2306,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). 
BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. 
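As a rough illustration of the masking step described above (select 15% of the tokens and predict only those positions), here is a small hypothetical sketch; the real data pipeline is more involved, for example in how it chooses replacements for the selected tokens, so treat this only as a toy version.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels) where labels maps masked position -> original token."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    n_to_mask = max(1, int(round(len(tokens) * mask_prob)))
    for pos in rng.sample(range(len(tokens)), n_to_mask):
        labels[pos] = masked[pos]
        masked[pos] = "[MASK]"
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(masked)   # e.g. ['the', 'man', 'went', 'to', 'the', '[MASK]', ...]
print(labels)   # e.g. {5: 'store', ...}
```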
In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters (Not available yet. Needs to be re generated). BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. 
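Whichever route you take (local GPU or Cloud TPU), it can be worth a quick sanity check that the unzipped model directory contains the three expected files and that the vocab matches the config; a minimal sketch, with the directory name as a placeholder for wherever you unzipped the checkpoint:

```python
import json
from pathlib import Path

base_dir = Path("uncased_L-12_H-768_A-12")  # placeholder: wherever you unzipped the model

# The config file specifies the model hyperparameters.
config = json.loads((base_dir / "bert_config.json").read_text())
print(json.dumps(config, indent=2))

# The vocab file has one WordPiece per line; its length should match the configured vocab size.
vocab = (base_dir / "vocab.txt").read_text(encoding="utf-8").splitlines()
print("vocab entries:", len(vocab))
print("vocab_size in config:", config.get("vocab_size"))
```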
Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. 
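Stepping back to the classifier prediction mode described a little earlier: since each line of test_results.tsv holds one probability per class, a small post-processing sketch can turn the file into hard labels. The label ordering below is a placeholder for whatever your task uses (for MRPC the two columns correspond to its 0/1 labels).

```python
import csv

labels = ["0", "1"]  # placeholder label order; adjust to your task

predictions = []
with open("/tmp/mrpc_output/test_results.tsv", newline="") as f:
    for row in csv.reader(f, delimiter="\t"):
        probs = [float(p) for p in row]
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((labels[best], probs[best]))

for label, prob in predictions[:5]:
    print(f"{label}\t{prob:.4f}")
```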
This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. 
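As a quick sanity check on the SQuAD 2.0 data that version_2_with_negative is meant to handle, the sketch below counts answerable and unanswerable questions in the dev file, assuming the standard SQuAD 2.0 JSON layout (data, then paragraphs, then qas entries with an is_impossible flag); adjust the path to your $SQUAD_DIR.

```python
import json

with open("dev-v2.0.json") as f:   # placeholder for $SQUAD_DIR/dev-v2.0.json
    squad = json.load(f)

answerable = unanswerable = 0
for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            if qa.get("is_impossible", False):
                unanswerable += 1
            else:
                answerable += 1

print("answerable questions:", answerable)
print("unanswerable questions:", unanswerable)
```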
The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. 
Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). 
Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) 
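Since the script keeps all examples for one input file in memory, one way to shard a large corpus before invoking it several times is sketched below; it splits on the blank lines that delimit documents so no document straddles two shards (file names and shard count are illustrative).

```python
from contextlib import ExitStack

def shard_corpus(corpus_path, out_prefix, num_shards=10):
    """Round-robin documents (blank-line delimited) from one corpus file into shard files."""
    with ExitStack() as stack:
        outs = [stack.enter_context(open(f"{out_prefix}-{i:05d}.txt", "w", encoding="utf-8"))
                for i in range(num_shards)]
        src = stack.enter_context(open(corpus_path, encoding="utf-8"))
        doc, doc_index = [], 0
        for line in src:
            if line.strip():
                doc.append(line)
            elif doc:  # a blank line ends the current document
                outs[doc_index % num_shards].writelines(doc + ["\n"])
                doc, doc_index = [], doc_index + 1
        if doc:  # flush a trailing document that has no final blank line
            outs[doc_index % num_shards].writelines(doc + ["\n"])

shard_corpus("corpus.txt", "corpus_shard")  # then run create_pretraining_data.py once per shard
```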
The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). 
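Because the 128-then-512 recipe above requires generating the pre-training data twice with different values of max_seq_length, a small driver like the following can keep the paired flags consistent, including max_predictions_per_seq computed as roughly max_seq_length times masked_lm_prob. Paths are placeholders, and the flag names mirror the create_pretraining_data.py example shown above.

```python
import math
import subprocess

masked_lm_prob = 0.15
for max_seq_length in (128, 512):
    # e.g. 128 * 0.15 = 19.2 -> 20, matching the example command above; 512 * 0.15 = 76.8 -> 77.
    max_predictions = math.ceil(max_seq_length * masked_lm_prob)
    subprocess.run(
        [
            "python", "create_pretraining_data.py",
            "--input_file=./sample_text.txt",                         # placeholder corpus
            f"--output_file=/tmp/tf_examples_{max_seq_length}.tfrecord",
            "--vocab_file=vocab.txt",                                 # placeholder vocab path
            "--do_lower_case=True",
            f"--max_seq_length={max_seq_length}",
            f"--max_predictions_per_seq={max_predictions}",
            f"--masked_lm_prob={masked_lm_prob}",
            "--random_seed=12345",
            "--dupe_factor=5",
        ],
        check=True,
    )
# Remember to pass the same max_seq_length / max_predictions_per_seq pair to run_pretraining.py.
```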
You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Gutenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may no longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released?
So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2307,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. 
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower_case False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended; use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files).
A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . 
This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
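To consume those JSON lines downstream, a reader along the following lines can work; the field names used here (features, layers, values) reflect the released script's output format as I understand it, but verify them against your own output file since they are not spelled out above.

```python
import json
import numpy as np

def load_layer(jsonl_path, layer_index=-1):
    """Yield (tokens, [num_tokens, hidden_size] array) per input line, for one requested layer."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            tokens, vectors = [], []
            for feature in record["features"]:
                tokens.append(feature["token"])
                layer = next(l for l in feature["layers"] if l["index"] == layer_index)
                vectors.append(layer["values"])
            yield tokens, np.array(vectors, dtype=np.float32)

for tokens, embeddings in load_layer("/tmp/output.jsonl", layer_index=-1):
    print(len(tokens), "tokens ->", embeddings.shape)
```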
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
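As a quick sanity check on where the value 20 in these commands comes from (plain arithmetic, nothing the scripts compute for you):
python
import math

max_seq_length = 128
masked_lm_prob = 0.15
# 128 * 0.15 = 19.2 masked positions per sequence; rounding up gives 20.
max_predictions_per_seq = int(math.ceil(max_seq_length * masked_lm_prob))
print(max_predictions_per_seq)  # 20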
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
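Whichever corpus you end up using, it has to be converted into the input format described above (one sentence per line, documents separated by blank lines) before running create_pretraining_data.py . A minimal, illustrative sketch using spaCy follows; it is not part of this repository, and it assumes the en_core_web_sm model has already been downloaded with python m spacy download en_core_web_sm.
python
import spacy

# Write documents as one sentence per line, with a blank line between documents,
# which is the input format create_pretraining_data.py expects.
nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

documents = [
    "The man went to the store. He bought a gallon of milk.",
    "Penguins are flightless birds. They are found almost exclusively in the Southern Hemisphere.",
]

with open("pretraining_corpus.txt", "w") as out:
    for doc_text in documents:
        doc = nlp(doc_text)
        for sent in doc.sents:
            out.write(sent.text.strip() + "\n")
        out.write("\n")  # empty line delimits documents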
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2308,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. 
If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. 
All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. 
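In the meantime, the basic idea behind a larger effective batch size on a single GPU is gradient accumulation: sum the gradients of several small batches and apply them as one update. The following is a generic TensorFlow 1.x sketch with a toy model, not part of the released scripts; the model, sizes, and names are made up purely for illustration.
python
import numpy as np
import tensorflow as tf

# Toy regression model; the only point here is the accumulation mechanics.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.reduce_mean(tf.square(pred - y))

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

accum_steps = 4  # 4 micro-batches of 8 act like one batch of 32
accum_vars = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
              for v in tvars]
zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]
accum_ops = [av.assign_add(g) for av, g in zip(accum_vars, grads)]
train_op = optimizer.apply_gradients(
    [(av / accum_steps, v) for av, v in zip(accum_vars, tvars)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(10):
        sess.run(zero_ops)
        for _ in range(accum_steps):  # several small forward/backward passes
            xb = np.random.randn(8, 10).astype(np.float32)
            yb = np.random.randn(8, 1).astype(np.float32)
            sess.run(accum_ops, feed_dict={x: xb, y: yb})
        sess.run(train_op)  # one weight update with the averaged gradients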
This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. 
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. 
The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. 
We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. 
SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. 
You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. 
In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. 
However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ). bert",Machine Translation,Machine Translation 2325,Natural Language Processing,Natural Language Processing,Natural Language Processing,"tensorflow statereader This Repository provides a simple LSTM implementation including a state extractor for tensorflow 1.1.0. A model is first trained and states then extracted and stored in a hdf5 file. This makes it possible to train custom language models for LSTMVis . This code is heavily based on tutorial implementation in the official documentation which can be found here . A standard model using the Penn Treebank can be trained by simply running python lstm.py data/ptb word . In case you want to train the model with custom parameters, or your own data, we provide the following options. Parameters data_path The folder where your training/validation data is stored. save_path The code saves the trained model to this directory. load_path If you want to load a trained model, enter its folder here. use_fp16 Train using 16 bit floats instead of 32bit floats. False init_scale Scale of the uniform parameter initialization. 0.1 learning_rate The initial learning rate. 1.0 max_grad_norm Max norm of the gradient. 5 num_layers Layers of the LSTM. 2 num_steps Steps to unroll the LSTM for. 30 hidden_size Number of cell states. 200 max_epoch How many epochs with max learning rate before decay begins. 4 max_max_epoch How many epochs to train for. 10 dropout Dropout probability. 
1.0 lr_decay Decay multiplier for the learning rate. 0.5 batch_size Batch size. 20 vocab_size Size of Vocabulary 6500 The standard parameters lead to a very small model that is quickly trained. For parameter configurations for large models, have a look at the paper by Zaremba et al. How to Extract States from Your Model? You might be interested in analyzing your own models in LSTMVis. This is easily possible, as long as you can extract some vector that evolves over time, such as hidden states or cell states in an LSTM (or a GRU). To extract your own states, you actually need to add very little code to your model, all of which is documented here. 1. Make sure that you can access the states It is recommended to build your model in a class and use the @property annotation for a get function. This makes it possible to access the states when you call session.run . In our case, we found it easier to unroll the RNN for n timesteps instead of using the rnn cell built into tensorflow ( Note: This is only recommended for the analysis; for best performance, keep using the rnn cell). Before: inputs = tf.unstack(inputs, num=num_steps, axis=1) outputs, state = tf.nn.rnn(cell, inputs, initial_state=self._initial_state) After: self._states = [] state = self._initial_state with tf.variable_scope("RNN"): for time_step in range(num_steps): if time_step > 0: tf.get_variable_scope().reuse_variables() (cell_output, state) = cell(inputs[:, time_step, :], state) outputs.append(cell_output) self._states.append(state) With the code above, you have a variable that stores an array of length $num_steps$ that contains LSTMStateTuple, a tensorflow internal data type that stores c , the cell states, and h , the hidden states. The states are of dimension batch_size state_size . You get this array by executing your existing session.run code, with the following addition: Old: vals = session.run(fetches, feed_dict) New: vals, stat = session.run([fetches, model.states], feed_dict) This gives you the states for one batch, stored in a variable stat . 2. Get all states sequentially and store them To store all the cell states for a data set, use the numpy.vstack function to add them all to an array, which we call all_states . We also have to transform stat into an array to get rid of the LSTMStateTuple overhead. The first line of the following code takes care of this. If you want to access hidden states instead of the cell states, change the s[0][0] to s[0][1] . curr_states = np.array([s[0][0] for s in stat]) if len(all_states) == 0: all_states = curr_states else: all_states = np.vstack((all_states, curr_states)) After the whole epoch, all_states contains a tensor of the form num_steps batch_size state_size . To transform it into an array of the dimensions data_length state_size , use stat = np.reshape(stat, (-1, stat.shape[2])) . The last step is storing the states in an hdf5 file, which can be achieved with the following code: f = h5py.File("states.h5", "w") f["states1"] = stat f.close() Congrats, you now have stored your cell states in a LSTMVis compatible format! Credits LSTMVis and all its parts are a collaborative project of Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, and Alexander M.
Rush at Harvard SEAS.",Machine Translation,Machine Translation 2332,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? 
BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. 
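To make the masked LM objective described above concrete, here is a minimal, self contained sketch. It is not the repository's actual create_pretraining_data.py logic (which also replaces some selected positions with random or unchanged tokens); it only hides roughly 15% of the tokens and keeps the originals as prediction targets.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Hide ~15% of the tokens and remember the originals as labels."""
    masked = list(tokens)
    labels = {}  # position -> original token to predict
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok
            masked[i] = mask_token
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(" ".join(masked))  # e.g. "the man went to the [MASK] . he bought a [MASK] of milk ."
print(labels)            # e.g. {5: 'store', 10: 'gallon'}
```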
The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. 
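After downloading and unzipping a checkpoint, a quick sanity check is to open the three files it contains. The sketch below is illustrative only: the directory path is a placeholder, it assumes you run it from the repository root so that tokenization.py is importable, and the config field names follow the released bert_config.json. It builds FullTokenizer with do_lower_case=False, as required for a cased model, and reads the model hyperparameters.

```python
import json
import os

import tokenization  # tokenization.py from this repository

BERT_DIR = "/path/to/cased_L-12_H-768_A-12"  # placeholder checkpoint directory

# Cased checkpoints preserve case and accent markers, so lowercasing must be off.
tokenizer = tokenization.FullTokenizer(
    vocab_file=os.path.join(BERT_DIR, "vocab.txt"),
    do_lower_case=False)
print(tokenizer.tokenize("John Smith lives in Montréal."))

# bert_config.json stores the architecture hyperparameters.
with open(os.path.join(BERT_DIR, "bert_config.json")) as f:
    config = json.load(f)
print(config["num_hidden_layers"], config["hidden_size"], config["num_attention_heads"])
```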
Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. 
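If you have not worked with SQuAD before, it can help to inspect the raw data first. Here is a minimal sketch, assuming the standard SQuAD v1.1 JSON layout and a placeholder path, that prints one question together with its answer annotation:

```python
import json

# Placeholder path; point this at the dev-v1.1.json file you downloaded.
with open("/path/to/squad/dev-v1.1.json") as f:
    squad = json.load(f)

article = squad["data"][0]
paragraph = article["paragraphs"][0]
qa = paragraph["qas"][0]
answer = qa["answers"][0]

print("context length (chars):", len(paragraph["context"]))
print("question:", qa["question"])
# Answers are annotated as character offsets into the context paragraph.
print("answer:", answer["text"], "at char", answer["answer_start"])
```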
BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . 
On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. 
The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. 
See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . 
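For example, the following sketch (assuming spaCy and its small English model en_core_web_sm are installed; both names are just one possible choice) writes a list of raw documents into that format: one sentence per line, with a blank line between documents.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any spaCy pipeline with a sentencizer/parser works

documents = [
    "The man went to the store. He bought a gallon of milk.",
    "Penguins are flightless birds. They live mostly in the Southern Hemisphere.",
]

with open("corpus.txt", "w") as out:
    for text in documents:
        for sent in nlp(text).sents:
            out.write(sent.text.strip() + "\n")  # one sentence per line
        out.write("\n")  # empty line delimits documents
```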
The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. 
The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. 
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2334,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Papers 6/5/2019 GEC(Classfication) A simple but Effetctive Classfication Model for Grammaticla Error Correction (2018), Z. Kaili et al. PDF 5/5/2019 Domain Adaptation A Survey of Domain Adaptation for Neural Machine Translation PDF Domain Adaptation forMultilingual Neural MachineTranslation PDF 4/3/2019 LEARNING DEEP REPRESENTATIONS BY MUTUAL INFORMATION ESTIMATION AND MAXIMIZATION unsupervised learning of representations by maximizing mutual information between an input and the output of a deep neural network encoder(Deep InfoMax) 27/2/2019 Model Agnostic Meta Learning for Fast Adaptation of Deep Networks solve new learning tasks using only a small number of training samples 26/2/2019 Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq 1. Automatically scale loss to prevent gradients from underflow and overflow during backpropagation. The optimizer inspects gradients at each iteration and scales the loss for the next iteration to ensure that the values stay within the FP16 range. 2. Maintain a FP32 copy of weights to accumulate the gradients after each optimizer step. 
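A framework agnostic NumPy sketch of those two ideas, dynamic loss scaling plus an FP32 master copy of the weights, is below. The toy gradient, learning rate, and scale adjustment schedule are made up for illustration and are not OpenSeq2Seq's actual implementation.

```python
import numpy as np

master_w = np.random.randn(1000).astype(np.float32)  # FP32 master weights
loss_scale = 2.0 ** 15                                # adjusted dynamically

def fp16_backward(w_fp16, scale):
    """Stand-in for an FP16 backward pass on a loss multiplied by `scale`."""
    return (0.01 * w_fp16 * scale).astype(np.float16)  # toy gradient

for step in range(100):
    w_fp16 = master_w.astype(np.float16)      # forward/backward run in FP16
    grads = fp16_backward(w_fp16, loss_scale)

    if not np.all(np.isfinite(grads)):
        loss_scale /= 2.0                     # overflow: skip step, lower the scale
        continue

    # Unscale in FP32 and accumulate into the master weights.
    master_w -= 0.001 * (grads.astype(np.float32) / loss_scale)
    if step % 20 == 0:
        loss_scale = min(loss_scale * 2.0, 2.0 ** 24)  # occasionally raise the scale

print("final loss scale:", loss_scale)
```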
25/2/ 2019 Simple Recurrent Units for Highly Parallelizable Recurrence 24/2/2019 Squeeze and Excitation Networks “Squeeze and Excitation” (SE) block, that adaptively recalibrates channel wise feature responses by explicitly modelling interdependencies between channels Squeeze: global distribution of channel responses Excitation: gating mechanism to produce chanel wise weights Scale: reweighting feature maps 23/2/2019 deepQuest: A Framework for Neural based Quality Estimation, 2018, Sheffield Improvement point of Neural MT 1. Rare word problem Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2014). On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007. Luong, M. T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2014). Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206. 2. Monolingual data usage Sennrich, R., Haddow, B., & Birch, A. (2015). Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709. Cheng, Y., Xu, W., He, Z., He, W., Wu, H., Sun, M., & Liu, Y. (2016). Semi supervised learning for neural machine translation. arXiv preprint arXiv:1606.04596. 3. Multiple language translation/multilingual NMT Dong, D., Wu, H., He, W., Yu, D., & Wang, H. (2015). Multi Task Learning for Multiple Language Translation. In ACL (1) (pp. 1723–1732). 4. Memory mechanism Wang, M., Lu, Z., Li, H., & Liu, Q. (2016). Memory enhanced decoder for neural machine translation. arXiv preprint arXiv:1606.02003 5. Linguistic integration Sennrich, R., & Haddow, B. (2016). Linguistic input features improve neural machine translation. arXiv preprint arXiv:1606.02892. 6. Coverage problem Tu, Z., Lu, Z., Liu, Y., Liu, X., & Li, H. (2016). Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811. 7. Training process Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., & Liu, Y. (2015). Minimum risk training for neural machine translation. arXiv preprint arXiv:1512.02433. 8. Priori knowledge integration Cohn, T., Hoang, C. D. V., Vymolova, E., Yao, K., Dyer, C., & Haffari, G. (2016). Incorporating structural alignment biases into an attentional neural translation model. arXiv preprint arXiv:1601.01085. 9. Multimodal translations Hitschler, J., Schamoni, S., & Riezler, S. (2016). Multimodal pivots for image caption translation. arXiv preprint arXiv:1601.03916. 
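To make the Squeeze and Excitation note above (24/2/2019) concrete, here is a minimal NumPy sketch of one SE block forward pass. The bottleneck weights are random (untrained) and the reduction ratio is illustrative; a real SE block learns these parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, reduction=4):
    """feature_map: (H, W, C). Returns the channel-recalibrated map."""
    h, w, c = feature_map.shape
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = feature_map.mean(axis=(0, 1))                  # (C,)
    # Excitation: bottleneck MLP with a sigmoid gate -> channel weights in (0, 1).
    w1 = np.random.randn(c, c // reduction) * 0.1
    w2 = np.random.randn(c // reduction, c) * 0.1
    weights = sigmoid(np.maximum(squeezed @ w1, 0.0) @ w2)    # (C,)
    # Scale: reweight each channel of the original feature map.
    return feature_map * weights                              # broadcasts over H and W

out = se_block(np.random.randn(8, 8, 16))
print(out.shape)  # (8, 8, 16)
```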
NLP Awesome NLP : Deep Learing Paper Awesome Deeplearning : Contents Understanding / Generalization / Transfer ( understanding generalization transfer) Optimization / Training Techniques ( optimization training techniques) Unsupervised / Generative Models ( unsupervised generative models) Convolutional Network Models ( convolutional neural network models) Image Segmentation / Object Detection ( image segmentation object detection) Image / Video / Etc ( image video etc) Natural Language Processing / RNNs ( natural language processing rnns) Speech / Other Domain ( speech other domain) Reinforcement Learning / Robotics ( reinforcement learning robotics) More Papers from 2016 ( more papers from 2016) (More than Top 100) New Papers ( new papers) : Less than 6 months Old Papers ( old papers) : Before 2012 HW / SW / Dataset ( hw sw dataset) : Technical reports Book / Survey / Review ( book survey review) Video Lectures / Tutorials / Blogs ( video lectures tutorials blogs) Appendix: More than Top 100 ( appendix more than top 100) : More papers not in the list Understanding / Generalization / Transfer Distilling the knowledge in a neural network (2015), G. Hinton et al. pdf Deep neural networks are easily fooled: High confidence predictions for unrecognizable images (2015), A. Nguyen et al. pdf How transferable are features in deep neural networks? (2014), J. Yosinski et al. pdf CNN features off the Shelf: An astounding baseline for recognition (2014), A. Razavian et al. pdf Learning and transferring mid Level image representations using convolutional neural networks (2014), M. Oquab et al. pdf Visualizing and understanding convolutional networks (2014), M. Zeiler and R. Fergus pdf Decaf: A deep convolutional activation feature for generic visual recognition (2014), J. Donahue et al. pdf Optimization / Training Techniques Training very deep networks (2015), R. Srivastava et al. pdf Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Loffe and C. Szegedy pdf Delving deep into rectifiers: Surpassing human level performance on imagenet classification (2015), K. He et al. pdf Dropout: A simple way to prevent neural networks from overfitting (2014), N. Srivastava et al. pdf Adam: A method for stochastic optimization (2014), D. Kingma and J. Ba pdf Improving neural networks by preventing co adaptation of feature detectors (2012), G. Hinton et al. pdf Random search for hyper parameter optimization (2012) J. Bergstra and Y. Bengio pdf Unsupervised / Generative Models Pixel recurrent neural networks (2016), A. Oord et al. pdf Improved techniques for training GANs (2016), T. Salimans et al. pdf Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. pdf DRAW: A recurrent neural network for image generation (2015), K. Gregor et al. pdf Generative adversarial nets (2014), I. Goodfellow et al. pdf Auto encoding variational Bayes (2013), D. Kingma and M. Welling pdf Building high level features using large scale unsupervised learning (2013), Q. Le et al. pdf Convolutional Neural Network Models Rethinking the inception architecture for computer vision (2016), C. Szegedy et al. pdf Inception v4, inception resnet and the impact of residual connections on learning (2016), C. Szegedy et al. pdf Identity Mappings in Deep Residual Networks (2016), K. He et al. pdf Deep residual learning for image recognition (2016), K. He et al. pdf Spatial transformer network (2015), M. 
Jaderberg et al., pdf Going deeper with convolutions (2015), C. Szegedy et al. pdf Very deep convolutional networks for large scale image recognition (2014), K. Simonyan and A. Zisserman pdf Return of the devil in the details: delving deep into convolutional nets (2014), K. Chatfield et al. pdf OverFeat: Integrated recognition, localization and detection using convolutional networks (2013), P. Sermanet et al. pdf Maxout networks (2013), I. Goodfellow et al. pdf Network in network (2013), M. Lin et al. pdf ImageNet classification with deep convolutional neural networks (2012), A. Krizhevsky et al. pdf Image: Segmentation / Object Detection You only look once: Unified, real time object detection (2016), J. Redmon et al. pdf Fully convolutional networks for semantic segmentation (2015), J. Long et al. pdf Faster R CNN: Towards Real Time Object Detection with Region Proposal Networks (2015), S. Ren et al. pdf Fast R CNN (2015), R. Girshick pdf Rich feature hierarchies for accurate object detection and semantic segmentation (2014), R. Girshick et al. pdf Spatial pyramid pooling in deep convolutional networks for visual recognition (2014), K. He et al. pdf Semantic image segmentation with deep convolutional nets and fully connected CRFs , L. Chen et al. pdf Learning hierarchical features for scene labeling (2013), C. Farabet et al. pdf Image / Video / Etc Image Super Resolution Using Deep Convolutional Networks (2016), C. Dong et al. pdf A neural algorithm of artistic style (2015), L. Gatys et al. pdf Deep visual semantic alignments for generating image descriptions (2015), A. Karpathy and L. Fei Fei pdf Show, attend and tell: Neural image caption generation with visual attention (2015), K. Xu et al. pdf Show and tell: A neural image caption generator (2015), O. Vinyals et al. pdf Long term recurrent convolutional networks for visual recognition and description (2015), J. Donahue et al. pdf VQA: Visual question answering (2015), S. Antol et al. pdf DeepFace: Closing the gap to human level performance in face verification (2014), Y. Taigman et al. pdf : Large scale video classification with convolutional neural networks (2014), A. Karpathy et al. pdf Two stream convolutional networks for action recognition in videos (2014), K. Simonyan et al. pdf 3D convolutional neural networks for human action recognition (2013), S. Ji et al. pdf Natural Language Processing / RNNs Neural Architectures for Named Entity Recognition (2016), G. Lample et al. pdf Exploring the limits of language modeling (2016), R. Jozefowicz et al. pdf Teaching machines to read and comprehend (2015), K. Hermann et al. pdf Effective approaches to attention based neural machine translation (2015), M. Luong et al. pdf Conditional random fields as recurrent neural networks (2015), S. Zheng and S. Jayasumana. pdf Memory networks (2014), J. Weston et al. pdf Neural turing machines (2014), A. Graves et al. pdf Neural machine translation by jointly learning to align and translate (2014), D. Bahdanau et al. pdf Sequence to sequence learning with neural networks (2014), I. Sutskever et al. pdf Learning phrase representations using RNN encoder decoder for statistical machine translation (2014), K. Cho et al. pdf A convolutional neural network for modeling sentences (2014), N. Kalchbrenner et al. pdf Convolutional neural networks for sentence classification (2014), Y. Kim pdf Glove: Global vectors for word representation (2014), J. Pennington et al. pdf Distributed representations of sentences and documents (2014), Q. Le and T. 
Mikolov pdf Distributed representations of words and phrases and their compositionality (2013), T. Mikolov et al. pdf Efficient estimation of word representations in vector space (2013), T. Mikolov et al. pdf Recursive deep models for semantic compositionality over a sentiment treebank (2013), R. Socher et al. pdf Generating sequences with recurrent neural networks (2013), A. Graves. pdf Speech / Other Domain End to end attention based large vocabulary speech recognition (2016), D. Bahdanau et al. pdf Deep speech 2: End to end speech recognition in English and Mandarin (2015), D. Amodei et al. pdf Speech recognition with deep recurrent neural networks (2013), A. Graves pdf Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups (2012), G. Hinton et al. pdf Context dependent pre trained deep neural networks for large vocabulary speech recognition (2012) G. Dahl et al. pdf Acoustic modeling using deep belief networks (2012), A. Mohamed et al. pdf Reinforcement Learning / Robotics End to end training of deep visuomotor policies (2016), S. Levine et al. pdf Learning Hand Eye Coordination for Robotic Grasping with Deep Learning and Large Scale Data Collection (2016), S. Levine et al. pdf Asynchronous methods for deep reinforcement learning (2016), V. Mnih et al. pdf Deep Reinforcement Learning with Double Q Learning (2016), H. Hasselt et al. pdf Mastering the game of Go with deep neural networks and tree search (2016), D. Silver et al. pdf Continuous control with deep reinforcement learning (2015), T. Lillicrap et al. pdf Human level control through deep reinforcement learning (2015), V. Mnih et al. pdf Deep learning for detecting robotic grasps (2015), I. Lenz et al. pdf Playing atari with deep reinforcement learning (2013), V. Mnih et al. pdf ) More Papers from 2016 Layer Normalization (2016), J. Ba et al. pdf Learning to learn by gradient descent by gradient descent (2016), M. Andrychowicz et al. pdf Domain adversarial training of neural networks (2016), Y. Ganin et al. pdf WaveNet: A Generative Model for Raw Audio (2016), A. Oord et al. pdf web Colorful image colorization (2016), R. Zhang et al. pdf Generative visual manipulation on the natural image manifold (2016), J. Zhu et al. pdf Texture networks: Feed forward synthesis of textures and stylized images (2016), D Ulyanov et al. pdf SSD: Single shot multibox detector (2016), W. Liu et al. pdf SqueezeNet: AlexNet level accuracy with 50x fewer parameters and< 1MB model size (2016), F. Iandola et al. pdf Eie: Efficient inference engine on compressed deep neural network (2016), S. Han et al. pdf Binarized neural networks: Training deep neural networks with weights and activations constrained to+ 1 or 1 (2016), M. Courbariaux et al. pdf Dynamic memory networks for visual and textual question answering (2016), C. Xiong et al. pdf Stacked attention networks for image question answering (2016), Z. Yang et al. pdf Hybrid computing using a neural network with dynamic external memory (2016), A. Graves et al. pdf Google's neural machine translation system: Bridging the gap between human and machine translation (2016), Y. Wu et al. pdf New papers Newly published papers (< 6 months) which are worth reading MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017), Andrew G. Howard et al. pdf Convolutional Sequence to Sequence Learning (2017), Jonas Gehring et al. pdf A Knowledge Grounded Neural Conversation Model (2017), Marjan Ghazvininejad et al. 
pdf Accurate, Large Minibatch SGD:Training ImageNet in 1 Hour (2017), Priya Goyal et al. pdf TACOTRON: Towards end to end speech synthesis (2017), Y. Wang et al. pdf Deep Photo Style Transfer (2017), F. Luan et al. pdf Evolution Strategies as a Scalable Alternative to Reinforcement Learning (2017), T. Salimans et al. pdf Deformable Convolutional Networks (2017), J. Dai et al. pdf Mask R CNN (2017), K. He et al. pdf Learning to discover cross domain relations with generative adversarial networks (2017), T. Kim et al. pdf Deep voice: Real time neural text to speech (2017), S. Arik et al., pdf PixelNet: Representation of the pixels, by the pixels, and for the pixels (2017), A. Bansal et al. pdf Batch renormalization: Towards reducing minibatch dependence in batch normalized models (2017), S. Ioffe. pdf Wasserstein GAN (2017), M. Arjovsky et al. pdf Understanding deep learning requires rethinking generalization (2017), C. Zhang et al. pdf Least squares generative adversarial networks (2016), X. Mao et al. pdf Old Papers Classic papers published before 2012 An analysis of single layer networks in unsupervised feature learning (2011), A. Coates et al. pdf Deep sparse rectifier neural networks (2011), X. Glorot et al. pdf Natural language processing (almost) from scratch (2011), R. Collobert et al. pdf Recurrent neural network based language model (2010), T. Mikolov et al. pdf Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion (2010), P. Vincent et al. pdf Learning mid level features for recognition (2010), Y. Boureau pdf A practical guide to training restricted boltzmann machines (2010), G. Hinton pdf Understanding the difficulty of training deep feedforward neural networks (2010), X. Glorot and Y. Bengio pdf Why does unsupervised pre training help deep learning (2010), D. Erhan et al. pdf Learning deep architectures for AI (2009), Y. Bengio. pdf .pdf) Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations (2009), H. Lee et al. pdf Greedy layer wise training of deep networks (2007), Y. Bengio et al. pdf Reducing the dimensionality of data with neural networks, G. Hinton and R. Salakhutdinov. pdf A fast learning algorithm for deep belief nets (2006), G. Hinton et al. pdf Gradient based learning applied to document recognition (1998), Y. LeCun et al. pdf Long short term memory (1997), S. Hochreiter and J. Schmidhuber. pdf HW / SW / Dataset SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016), Rajpurkar et al. pdf OpenAI gym (2016), G. Brockman et al. pdf TensorFlow: Large scale machine learning on heterogeneous distributed systems (2016), M. Abadi et al. pdf Theano: A Python framework for fast computation of mathematical expressions, R. Al Rfou et al. Torch7: A matlab like environment for machine learning, R. Collobert et al. pdf MatConvNet: Convolutional neural networks for matlab (2015), A. Vedaldi and K. Lenc pdf Imagenet large scale visual recognition challenge (2015), O. Russakovsky et al. pdf Caffe: Convolutional architecture for fast feature embedding (2014), Y. Jia et al. pdf Book / Survey / Review On the Origin of Deep Learning (2017), H. Wang and Bhiksha Raj. pdf Deep Reinforcement Learning: An Overview (2017), Y. Li, pdf Neural Machine Translation and Sequence to sequence Models(2017): A Tutorial, G. Neubig. pdf Neural Network and Deep Learning (Book, Jan 2017), Michael Nielsen. html Deep learning (Book, 2016), Goodfellow et al. 
html LSTM: A search space odyssey (2016), K. Greff et al. pdf Tutorial on Variational Autoencoders (2016), C. Doersch. pdf Deep learning (2015), Y. LeCun, Y. Bengio and G. Hinton pdf Deep learning in neural networks: An overview (2015), J. Schmidhuber pdf Representation learning: A review and new perspectives (2013), Y. Bengio et al. pdf Video Lectures / Tutorials / Blogs (Lectures) CS231n, Convolutional Neural Networks for Visual Recognition, Stanford University web CS224d, Deep Learning for Natural Language Processing, Stanford University web Oxford Deep NLP 2017, Deep Learning for Natural Language Processing, University of Oxford web (Tutorials) NIPS 2016 Tutorials, Long Beach web ICML 2016 Tutorials, New York City web ICLR 2016 Videos, San Juan web Deep Learning Summer School 2016, Montreal web Bay Area Deep Learning School 2016, Stanford web (Blogs) OpenAI web Distill web Andrej Karpathy Blog web Colah's Blog Web WildML Web FastML web TheMorningPaper web Appendix: More than Top 100 (2016) A character level decoder without explicit segmentation for neural machine translation (2016), J. Chung et al. pdf Dermatologist level classification of skin cancer with deep neural networks (2017), A. Esteva et al. html Weakly supervised object localization with multi fold multiple instance learning (2017), R. Gokberk et al. pdf Brain tumor segmentation with deep neural networks (2017), M. Havaei et al. pdf Professor Forcing: A New Algorithm for Training Recurrent Networks (2016), A. Lamb et al. pdf Adversarially learned inference (2016), V. Dumoulin et al. web pdf Understanding convolutional neural networks (2016), J. Koushik pdf Taking the human out of the loop: A review of bayesian optimization (2016), B. Shahriari et al. pdf Adaptive computation time for recurrent neural networks (2016), A. Graves pdf Densely connected convolutional networks (2016), G. Huang et al. pdf Region based convolutional networks for accurate object detection and segmentation (2016), R. Girshick et al. Continuous deep q learning with model based acceleration (2016), S. Gu et al. pdf A thorough examination of the cnn/daily mail reading comprehension task (2016), D. Chen et al. pdf Achieving open vocabulary neural machine translation with hybrid word character models, M. Luong and C. Manning. pdf Very Deep Convolutional Networks for Natural Language Processing (2016), A. Conneau et al. pdf Bag of tricks for efficient text classification (2016), A. Joulin et al. pdf Efficient piecewise training of deep structured models for semantic segmentation (2016), G. Lin et al. pdf Learning to compose neural networks for question answering (2016), J. Andreas et al. pdf Perceptual losses for real time style transfer and super resolution (2016), J. Johnson et al. pdf Reading text in the wild with convolutional neural networks (2016), M. Jaderberg et al. pdf What makes for effective detection proposals? (2016), J. Hosang et al. pdf Inside outside net: Detecting objects in context with skip pooling and recurrent neural networks (2016), S. Bell et al. pdf . Instance aware semantic segmentation via multi task network cascades (2016), J. Dai et al. pdf Conditional image generation with pixelcnn decoders (2016), A. van den Oord et al. pdf Deep networks with stochastic depth (2016), G. Huang et al., pdf Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics (2016), Yee Whye Teh et al. pdf (2015) Ask your neurons: A neural based approach to answering questions about images (2015), M. Malinowski et al. 
pdf Exploring models and data for image question answering (2015), M. Ren et al. pdf Are you talking to a machine? dataset and methods for multilingual image question (2015), H. Gao et al. pdf Mind's eye: A recurrent visual representation for image caption generation (2015), X. Chen and C. Zitnick. pdf From captions to visual concepts and back (2015), H. Fang et al. pdf . Towards AI complete question answering: A set of prerequisite toy tasks (2015), J. Weston et al. pdf Ask me anything: Dynamic memory networks for natural language processing (2015), A. Kumar et al. pdf Unsupervised learning of video representations using LSTMs (2015), N. Srivastava et al. pdf Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding (2015), S. Han et al. pdf Improved semantic representations from tree structured long short term memory networks (2015), K. Tai et al. pdf Character aware neural language models (2015), Y. Kim et al. pdf Grammar as a foreign language (2015), O. Vinyals et al. pdf Trust Region Policy Optimization (2015), J. Schulman et al. pdf Beyond short snippents: Deep networks for video classification (2015) pdf Learning Deconvolution Network for Semantic Segmentation (2015), H. Noh et al. pdf Learning spatiotemporal features with 3d convolutional networks (2015), D. Tran et al. pdf Understanding neural networks through deep visualization (2015), J. Yosinski et al. pdf An Empirical Exploration of Recurrent Network Architectures (2015), R. Jozefowicz et al. pdf Deep generative image models using a laplacian pyramid of adversarial networks (2015), E.Denton et al. pdf Gated Feedback Recurrent Neural Networks (2015), J. Chung et al. pdf Fast and accurate deep network learning by exponential linear units (ELUS) (2015), D. Clevert et al. pdf Pointer networks (2015), O. Vinyals et al. pdf Visualizing and Understanding Recurrent Networks (2015), A. Karpathy et al. pdf Attention based models for speech recognition (2015), J. Chorowski et al. pdf End to end memory networks (2015), S. Sukbaatar et al. pdf Describing videos by exploiting temporal structure (2015), L. Yao et al. pdf A neural conversational model (2015), O. Vinyals and Q. Le. pdf Improving distributional similarity with lessons learned from word embeddings, O. Levy et al. pdf Transition Based Dependency Parsing with Stack Long Short Term Memory (2015), C. Dyer et al. pdf Improved Transition Based Parsing by Modeling Characters instead of Words with LSTMs (2015), M. Ballesteros et al. pdf Finding function in form: Compositional character models for open vocabulary word representation (2015), W. Ling et al. pdf (2014) DeepPose: Human pose estimation via deep neural networks (2014), A. Toshev and C. Szegedy pdf Learning a Deep Convolutional Network for Image Super Resolution (2014, C. Dong et al. pdf Recurrent models of visual attention (2014), V. Mnih et al. pdf Empirical evaluation of gated recurrent neural networks on sequence modeling (2014), J. Chung et al. pdf Addressing the rare word problem in neural machine translation (2014), M. Luong et al. pdf On the properties of neural machine translation: Encoder decoder approaches (2014), K. Cho et. al. Recurrent neural network regularization (2014), W. Zaremba et al. pdf Intriguing properties of neural networks (2014), C. Szegedy et al. pdf Towards end to end speech recognition with recurrent neural networks (2014), A. Graves and N. Jaitly. pdf Scalable object detection using deep neural networks (2014), D. Erhan et al. 
pdf On the importance of initialization and momentum in deep learning (2013), I. Sutskever et al. pdf Regularization of neural networks using dropconnect (2013), L. Wan et al. pdf Learning Hierarchical Features for Scene Labeling (2013), C. Farabet et al. pdf Linguistic Regularities in Continuous Space Word Representations (2013), T. Mikolov et al. pdf Large scale distributed deep networks (2012), J. Dean et al. pdf A Fast and Accurate Dependency Parser using Neural Networks. Chen and Manning. pdf",Machine Translation,Machine Translation 2345,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New February 7th, 2019: TfHub Module \ \ \ \ \ BERT has been uploaded to TensorFlow Hub . See run_classifier_with_tfhub.py for an example of how to use the TF Hub module. \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. 
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). 
A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . 
This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
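To consume the extracted features downstream you have to parse this JSONL file yourself. Below is a minimal sketch, not part of this repository; the field names (features, token, layers, values) are assumptions, so check them against one line of your actual output before relying on them. python
import json
import numpy as np

def load_last_layer_vectors(jsonl_path):
    # Returns one (num_wordpiece_tokens, hidden_size) array per input line.
    per_line = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            vectors = []
            for feat in record["features"]:  # assumed: one entry per WordPiece token
                last = next(layer for layer in feat["layers"] if layer["index"] == -1)
                vectors.append(last["values"])
            per_line.append(np.array(vectors))
    return per_line

embeddings = load_last_layer_vectors("/tmp/output.jsonl")
print(embeddings[0].shape)  # e.g. (num_tokens, 768) for BERT-Base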
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
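For example, with the values used in these commands ( max_seq_length 128 and masked_lm_prob 0.15), max_seq_length * masked_lm_prob is 128 * 0.15 = 19.2, which is why both commands round this up to max_predictions_per_seq 20.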
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
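Whichever corpus you use, it ultimately has to be converted into the plain text format described above: one sentence per line, with blank lines between documents. Here is a minimal sketch using spaCy, as suggested earlier; it is not code from this repository, and the model name en_core_web_sm and the file paths are assumptions. python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed model; any pipeline with sentence splitting works

def write_pretraining_text(documents, path):
    # documents: iterable of raw document strings
    with open(path, "w", encoding="utf-8") as f:
        for doc_text in documents:
            for sent in nlp(doc_text).sents:
                text = " ".join(sent.text.split())
                if text:
                    f.write(text + "\n")
            f.write("\n")  # blank line delimits documents

write_pretraining_text(["First document. It has two sentences.", "Second document."], "/tmp/corpus.txt")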
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2352,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Extreme Summarization This repository contains data and code for our EMNLP 2018 paper Don't Give Me the Details, Just the Summary! Topic Aware Convolutional Neural Networks for Extreme Summarization . Please contact me at shashi.narayan@gmail.com for any question. Please cite this paper if you use our code or data. @InProceedings{xsum emnlp, author Shashi Narayan and Shay B. Cohen and Mirella Lapata , title Don't Give Me the Details, Just the Summary! {T}opic Aware Convolutional Neural Networks for Extreme Summarization , booktitle Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , year 2018 , address Brussels, Belgium , } Extreme Summarization (XSum) dataset Instructions to download and preprocess the extreme summarization dataset are here (./XSum Dataset). Looking for a Running Demo of Our System? A running demo of our abstractive system can be found here . Pretrained models and Test Predictions (Narayan et al., EMNLP 2018) Pretrained ConvS2S model and dictionary files (1.1GB) Pretrained Topic ConvS2S model and dictionary files (1.2GB) Pretrained Gensim LDA model (200MB) Our model Predictions (xsum model predictions.tar.gz) Human Evaluation Data (xsum human evaluation data.tar.gz) Topic Aware Convolutional Model for Extreme Summarization This repository contains PyTorch code for our Topic ConvS2S model. Our code builds on an earlier copy of Facebook AI Research Sequence to Sequence Toolkit . We also release the code for the ConvS2S model . It uses optimized hyperparameters for extreme summarization. Our release facilitates the replication of our experiments, such as training from scratch or predicting with released pretrained models, as reported in the paper. Installation Our code requires PyTorch version 0.4.0 or 0.4.1. Please follow the instructions here: After PyTorch is installed, you can install ConvS2S and Topic ConvS2S: Install ConvS2S cd ./XSum ConvS2S pip install r requirements.txt python setup.py build python setup.py develop Install Topic ConvS2S cd ../XSum Topic ConvS2S pip install r requirements.txt python setup.py build python setup.py develop Training a New Model Data Preprocessing We partition the extracted datset into training, development and test sets. The input document is truncated to 400 tokens and the length of the summary is limited to 90 tokens. Both document and summary files are lowercased. ConvS2S python scripts/xsum preprocessing convs2s.py It generates the following files in the data convs2s directory: train.document and train.summary validation.document and validation.summary test.document and test.summary Lines in document and summary files are paired as (input document, corresponding output summary). TEXT ./data convs2s python XSum ConvS2S/preprocess.py source lang document target lang summary trainpref $TEXT/train validpref $TEXT/validation testpref $TEXT/test destdir ./data convs2s bin joined dictionary nwordstgt 50000 nwordssrc 50000 This will create binarized data that will be used for model training. It also generates source and target dictionary files. 
In this case, both files are identical (due to joined dictionary ) and have 50000 tokens. Topic ConvS2S python scripts/xsum preprocessing topic convs2s.py It generates the following files in the data topic convs2s directory: train.document, train.summary, train.document lemma and train.doc topics validation.document, validation.summary, validation.document lemma and validation.doc topics test.document, test.summary, test.document lemma and test.doc topics Lines in document, summary, document lemma and doc topics files are paired as (input document, output summary, input lemmatized document, document topic vector). TEXT ./data topic convs2s python XSum Topic ConvS2S/preprocess.py source lang document target lang summary trainpref $TEXT/train validpref $TEXT/validation testpref $TEXT/test destdir ./data topic convs2s joined dictionary nwordstgt 50000 nwordssrc 50000 output format raw This will generate source and target dictionary files. In this case, both files are identical (due to joined dictionary ) and have 50000 tokens. It operates on the raw format data. Model Training By default, the code will use all available GPUs on your machine. We have used CUDA_VISIBLE_DEVICES environment variable to select specific GPU(s). ConvS2S CUDA_VISIBLE_DEVICES 1 python XSum ConvS2S/train.py ./data convs2s bin source lang document target lang summary max sentences 32 arch fconv criterion label_smoothed_cross_entropy max epoch 200 clip norm 0.1 lr 0.10 dropout 0.2 save dir ./checkpoints convs2s no progress bar log interval 10 Topic ConvS2S CUDA_VISIBLE_DEVICES 1 python XSum Topic ConvS2S/train.py ./data topic convs2s source lang document target lang summary doctopics doc topics max sentences 32 arch fconv criterion label_smoothed_cross_entropy max epoch 200 clip norm 0.1 lr 0.10 dropout 0.2 save dir ./checkpoints topic convs2s no progress bar log interval 10 Generation with Pre trained Models ConvS2S CUDA_VISIBLE_DEVICES 1 python XSum ConvS2S/generate.py ./data convs2s path ./checkpoints convs2s/checkpoint best.pt batch size 1 beam 10 replace unk source lang document target lang summary > test output convs2s checkpoint best.pt Make sure that ./data convs2s also has the source and target dictionary files. Topic ConvS2S CUDA_VISIBLE_DEVICES 1 python XSum Topic ConvS2S/generate.py ./data topic convs2s path ./checkpoints topic convs2s/checkpoint_best.pt batch size 1 beam 10 replace unk source lang document target lang summary doctopics doc topics encoder embed dim 512 > test output topic convs2s checkpoint best.pt Make sure that ./data topic convs2s has the test files to decode, the source and target dictionary files. Extract final hypothesis python scripts/extract hypothesis fairseq.py o test output convs2s checkpoint best.pt f final test output convs2s checkpoint best.pt python scripts/extract hypothesis fairseq.py o test output topic convs2s checkpoint best.pt f final test output topic convs2s checkpoint best.pt",Machine Translation,Machine Translation 2373,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NMT Practice 1: Pytorch Implementation of the Transformer model in 《Attention is all you need》 Hi, there, this is just a personal practice project of implementing SOTA NMT model Transformer in Attention is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). 
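As a quick reminder of the core computation the project implements, here is a minimal sketch of scaled dot-product attention from the paper; it is illustrative only and is not taken from this repository's Transformer package. python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); returns attended values and attention weights
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights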
Requirement python 3.6+ pytorch 0.4.1+ tqdm numpy Usage script description: parameters.py contains the parameter environment setting Transformer package comtains the main architecture of Transformer Encoder Decoder Model to be continued Acknowledgement some scripts and the dataset preprocessing steps of the project are borrowed from OpenNMT/OpenNMT py . main structure and useful functions are borrowed from jadore801120/attention is all you need pytorch",Machine Translation,Machine Translation 2375,Natural Language Processing,Natural Language Processing,Natural Language Processing,"THUMT: An Open Source Toolkit for Neural Machine Translation Contents Introduction ( introduction) Implementations ( implementations) License ( license) Citation ( citation) Development Team ( development team) Contributors ( Contributors) Contact ( contact) Introduction Machine translation is a natural language processing task that aims to translate natural languages using computers automatically. Recent several years have witnessed the rapid development of end to end neural machine translation, which has become the new mainstream method in practical MT systems. THUMT is an open source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University . Implementations THUMT has currently two main implementations: THUMT TensorFlow : a new implementation developed with TensorFlow . It implements the sequence to sequence model ( Seq2Seq ) ( Sutskever et al., 2014 ), the standard attention based model ( RNNsearch ) ( Bahdanau et al., 2014 ), and the Transformer model ( Transformer ) ( Vaswani et al., 2017 ). THUMT Theano : the original project developed with Theano , which is no longer updated because MLA put an end to Theano . It implements the standard attention based model ( RNNsearch ) ( Bahdanau et al., 2014 ), minimum risk training ( MRT ) ( Shen et al., 2016 ) for optimizing model parameters with respect to evaluation metrics, semi supervised training ( SST ) ( Cheng et al., 2016 ) for exploiting monolingual corpora to learn bi directional translation models, and layer wise relevance propagation ( LRP ) ( Ding et al., 2017 ) for visualizing and anlayzing RNNsearch. The following table summarizes the features of two implementations: Implementation Model Criterion Optimizer LRP : : : : : : : : : : Theano RNNsearch MLE, MRT, SST SGD, AdaDelta, Adam RNNsearch TensorFlow Seq2Seq, RNNsearch, Transformer MLE Adam RNNsearch, Transformer We recommend using THUMT TensorFlow , which delivers better translation performance than THUMT Theano . We will keep adding new features to THUMT TensorFlow . License The source code is dual licensed. Open source licensing is under the BSD 3 Clause , which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com (mailto:thumt17@gmail.com). Citation Please cite the following paper: > Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation . arXiv:1706.06415. 
Development Team Project leaders: Maosong Sun , Yang Liu , Huanbo Luan Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng Contributors Zhixing Tan (mailto:playinf@stu.xmu.edu.cn) (Xiamen University) Contact If you have questions, suggestions and bug reports, please email thumt17@gmail.com (mailto:thumt17@gmail.com).",Machine Translation,Machine Translation 2387,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Neural Machine Translation Paper Implementation: Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (v7 2016) Getting Started Prerequisites pytorch 0.4.0 argparse 1.1 numpy 1.14.3 matplotlib 2.2.2 Tutorial for NMT Jupyter notebook: link Preparing for demo Result Trained on the IWSLT 2016 dataset. Check torchtext.datasets.IWSLT to download & load the dataset. How to Start For 'HELP', please pass the h argument to main.py , or you can just run $ cd model $ sh runtrain.sh Trainlog I divided training into 3 runs because of limited computing power. After the 1st run, the model is loaded and retrained for the 2nd and 3rd runs. Lowest losses & checkpoints: 1st train: 8/30 (train) loss 2.4335 (valid) loss 5.6971 2nd train: 1/30 (train) loss 2.3545 (valid) loss 5.6575 3rd train: 6/20 (train) loss 1.9401 (valid) loss 5.4970 You can see how I chose the hyperparameters below
Predicted Sentence: ein junger hund schaut im schnee . Google Translated Sentence: ein junger lassie aussehender hund ist im schnee. Showing the 6 th layer of trg src sentence attention with 8 heads. ! dec_enc_attns 6 (figs/dec_enc_attns 6.png) Requirements python > 3.6 pytorch > 1.0.0 torchtext numpy TODO 1. Train bigger datas and make a demo server 2. Beam Search 3. Calculate BLEU Scores for Translation Task references I checked a lot of references. Please visit them and learn it! paper : reference blog: reference code: https://github.com/jadore801120/attention is all you need pytorch",Machine Translation,Machine Translation 2400,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Lyrics generator Table of Contents Introduction ( introduction) 1. Motivation ( 1 motivation) 2. Exploratory data analysis ( 2 exploratory data analysis) Bag of words ( bag of words) Character level recurrent neural network(RNN) ( character level recurrent neural networkrnn) 1. How to prepare the data and train the model? ( 1 how to prepare the data and train the model) 2. How to generate lyrics? Teacher forcing ( 2 how to generate lyrics teacher forcing) 3. Eminem's lyrics generator ( 3 eminems lyrics generator) Word level RNN ( word level rnn) 1. Word embedding and word2vec ( 1 word embedding and word2vec) 2. Michael Jackson's lyrics generator based on word level RNN ( 2 michael jacksons lyrics generator based on word level rnn) 3. How to improve word level model? ( 3 how to improve word level model) a. Using pretrained word embedding ( a using pretrained word embedding) b. Word level seq2seq model ( b word level seq2seq model) Concluding remarks ( concluding remarks) Reference ( reference) Acknowledgement ( acknowledgement) Introduction 1. Motivation Natural language processing is among the most attractive and difficult field in machine learning. Different from computer vision and other machine learning tasks, NLP does not convey meaning through any physical manifestation. By the virtue of deep learning, NLP achieved tremendous progress in keyword search, machine translation, semantic analysis and etc. In this project, I would like to make a lyrics generator by using both character level and word level RNN(recurrent neural network). Neural network Lyric generator : : : : The dataset is from kaggle with 3.8 million song lyrics from various artist. index song year artist genre lyrics 0 ego remix 2009 beyonce Pop Oh baby, how you doing?\nYou know I'm gonna cu... 1 then tell me 2009 beyonce Pop playin' everything so easy,\nit's like you see... 2 honesty 2009 beyonce Pop If you search\nFor tenderness\nIt isn't hard t... 3 you are my rock 2009 beyonce Pop Oh oh oh I, oh oh oh I\n Verse 1: \nIf I wrote... 4 black culture 2009 beyonce Pop Party the people, the people the party it's po... But I will only use lyrics from Eminem and Michael Jackson. Because they have around 400 songs, it is easier to extract regular patterns from them. 2. Exploratory data analysis Unlike other machine learning tasks, there is not much visualization we can do with NLP. And data cleaning will be based on the model that we want to explore. For character based model, it is necessay to keep punctuations as they are part of the characters. However, for word based model, punctuations are supposed to be removed. To briefly explore the dataset, we will count most frequently used words for different artists (bag of words or unigram). Consequently, very crude sentiment score can be obtained from it. 
Bag of words Remove stop words and punctuations from lyrics python from sklearn.feature_extraction.text import CountVectorizer vectorizer CountVectorizer(stop_words 'english') X vectorizer.fit_transform( eminem_lyrics ) Sentiment score base on bag of words python from nltk.sentiment.vader import SentimentIntensityAnalyzer def sentiment(eminem_lyrics): sentiment SentimentIntensityAnalyzer() score sentiment.polarity_scores(eminem_lyrics) return score Michael Jackson Eminem : : : : sentiment score: 1.0 sentiment score: 1.0 Based on bag of words, one can do naive bayes to predict the genre of songs, but we will not cover it here. Let us go deeper to deep learning. Character level recurrent neural network(RNN) Why recurrent? Different from vanilla neural network, RNN (see below, pic from wikipedia) is able to process sequences of inputs (such as words and sentences) by utilizing the internel state (memory state). Hence, it is regarded as a very promising candidate to solve NLP tasks. Inspired by the minimal character level Vanilla RNN model from Andrej Karpathy, we decided to build a more complicated RNN model to generate lyrics. Below is the summary of my model: 2 LSTM layers and 1 dense layer. Layer (type) Output Shape Param lstm_1 (LSTM) (None, 10, 200) 230400 _________________________________________________________________ dropout_1 (Dropout) (None, 10, 200) 0 _________________________________________________________________ lstm_2 (LSTM) (None, 200) 320800 _________________________________________________________________ dropout_2 (Dropout) (None, 200) 0 _________________________________________________________________ dense_1 (Dense) (None, 87) 17487 Total params: 568,687 Trainable params: 568,687 Non trainable params: 0 _________________________________________________________________ After 600 epochs, the model achieved 64% accuracy for validation set. Epoch 600/600 loss: 0.7529 acc: 0.7858 val_loss: 1.6630 val_acc: 0.6402 1. How to prepare the data and train the model? One hot encode all characters ( a z, 0 9 and punctuations ! $%&() +, ./:; ?@ \\ ^_ { }\t\n ) python from keras.utils import to_categorical 1,2,3 > 1,0,0 , 0,1,0 , 0,0,1 one hot_X to_categorical(X, num_classes vocab_size) Make a sliding window that collects 10 characters as input. Model only generates one character. input: 'hello, world' > output: 'n', true output: 'p' Calculate cross entropy and backpropagate the neural network to update 568,687 parameters. Slide the window to the right by one character. Extract new input and iterate above process until cost function reaches the minimum. 2. How to generate lyrics? Teacher forcing Make seed lyrics. Feed it to the neural network to generate one character after the seed lyrics, 'b'. input: 'I want to ' > output 'b' Append new character to the seed lyrics and remove the very first character. new input : ' want to b' Feed the new seed lyrics into the neural network and iterate the above process as many as you want. 'I want to ' > ' want to b' > 'want to be' > .... > 'ing of pop' In the end, you might get something like 'I want to be king of pop' This process is known as teacher forcing: training neural network that uses model output from a prior time step as an input. 3. Eminem's lyrics generator After one epoch, generated lyrics: input: 'not afraid' output: 'the me the me the me the me the me the me the me the me the' Not smart, it repeats the same words over and over. 
After 600 epochs, generated lyrics: 20 characters input: 'slim shady' output: '(what you was a mom' 40 characters input: i'm rap go output: ing, you should see her, she won't even 60 characters input: 'the way i' output: lounding off, they can try to take your heart and I don't kn 80 characters input: 'lose myself' output: ' in the music, the moment You own it, you better never let it go You only get o' 100 characters input: 'not afraid' output: ') To take a stand) Maybe a dontte the back of the way that I know what the fuck I say to the motherf' It is impressive that the generator can spell words correctly, and it is not hard to tell that the outputs have Eminem's style. Word level RNN 1. Word embedding and word2vec Instead of letting the model learn how to spell words, one can upgrade the model from character level to word level. Correspondingly, this gives the model the ability to learn semantics from the corpus. Since the number of unique words is much larger than the number of characters, it is necessary to introduce a new representation: word embedding. This is basically the only difference from the character based model. However, there is much more to it than just dimension reduction. The notion was first introduced by Mikolov et al., 2013 . Word embedding rotates the word vector from a one hot representation to a word2vec representation. _________________________________________________________________ Layer (type) Output Shape Param embedding_1 (Embedding) (None, 10, 100) 600500 _________________________________________________________________ lstm_1 (LSTM) (None, 10, 300) 481200 _________________________________________________________________ dropout_1 (Dropout) (None, 10, 300) 0 _________________________________________________________________ lstm_2 (LSTM) (None, 300) 721200 _________________________________________________________________ dropout_2 (Dropout) (None, 300) 0 _________________________________________________________________ dense_1 (Dense) (None, 300) 90300 _________________________________________________________________ dense_2 (Dense) (None, 6005) 1807505 Total params: 3,700,705 Trainable params: 3,700,705 Non trainable params: 0 _________________________________________________________________ Training is much harder than for the character based model. Only 32% accuracy is obtained from this model. Epoch 300/300 loss: 2.1268 acc: 0.7121 val_loss: 8.1230 val_acc: 0.3291 2.
Michael Jackson's lyrics generator based on word level RNN After 300 epochs: Generate 20 word: here to change the world hee were afraid that away when we change my hand verse ill can never change my heart in pieces lost my heart on the carousel to Generate 40 word: a circus girl who left my heart in pieces lost my heart on the carousel to a circus girl who left my heart in pieces lost my heart on the carousel to a circus girl who left my heart in just a little bit baby thats all i need thats all Generate 60 word: night need you dont understand you need what about hard let me change you comes your truth out and to guess down together together then be her day every thing out the carpet were gonna see this one understand by much then ever so to see you there are long so game if let me get away verse verse cant gonna tell you right just show your face in broad daylight Generate 80 word: im create your crescendo how they dance well theres a reason what its a door when here all never door brother its yes we yeah yeah when you start to say if you truth its whole past and they start that can do tell me girl ive wont you have it tell me no little when the door now im cries chorus bridge cause its door made your game chorus ive never stop up and tell me no true oh her dreams left behind everything for the movie scene nothing more Generate 100 word: little start i go a i am away im dreams of this life girl girl tell me fall im together so much i feel you all i say that you say into into i lost my heart i can and you game on me baby ill see i can be far away today i love you from your truth cause youre across the bitch baby does it feel it needs me from a door start that not here there was ghost of moon aint a you better he made the you she win your dreams off the madness i never It generates something with correct grammar at some parts but it is hard to understand the meaning. 3. How to improve word level model? a. Using pretrained word embedding Word level RNN is essentially concatenation of two neural networks. If we train two parts separately, we should achieve better accuracy. However, after using a pretrained word embedding , the accuracy of validation set decreases to 20%. Such a counterintuitive result! Epoch 50/50 loss: 4.7769 acc: 0.3353 val_loss: 6.1902 val_acc: 0.2160 python from keras.layers import Embedding model Sequential() model.add(Embedding(num_of_tokens, latent_dim, input_length seq_length, weights pretrained_embedding , train False)) _________________________________________________________________ Layer (type) Output Shape Param embedding_1 (Embedding) (None, 10, 100) 600500 _________________________________________________________________ lstm_1 (LSTM) (None, 10, 250) 351000 _________________________________________________________________ dropout_1 (Dropout) (None, 10, 250) 0 _________________________________________________________________ lstm_2 (LSTM) (None, 250) 501000 _________________________________________________________________ dropout_2 (Dropout) (None, 250) 0 _________________________________________________________________ dense_1 (Dense) (None, 6005) 1507255 Total params: 2,959,755 Trainable params: 2,359,255 Non trainable params: 600,500 _________________________________________________________________ There are 600,500 non trainable parameters which are from pretrained word embedding. Using non trainable embedding seems not working well. Maybe it would be better to train the word embedding particularlly for the dataset by using CBOW or skip gram. b. 
Word level Seq2Seq model The Seq2Seq model is widely used in neural machine translation . But there is nothing wrong with applying it to a lyrics generator. The basic idea is to process the input in the encoder and generate a memory state (a vector) that represents the whole input message. The decoder takes the memory state and the SOS token to generate one token. The generated token becomes the next input for the decoder to predict the following token. The process iterates for many cycles until EOS is generated or the max length of output is reached. Accordingly, unlike the many to one model, the seq2seq model has the advantage of generating more than one token. However, the performance does not change by much. But according to recent research, implementing an attention mechanism could significantly improve the performance. The attention mechanism not only resolves the long term dependency problem of the vanilla Seq2Seq language model but also speeds up the training process, as it discards the RNN, which disfavors parallel computation. Concluding remarks Takeaways for tuning hyperparameters: 1. It is easy to overfit the model. It is necessary to add a Dropout layer after each LSTM layer. 2. Sometimes GRU is better than LSTM and computationally cheaper. 3. Initialization and luck are very important. Try restarting the kernel if the model is stuck at a local minimum. 4. Try importance sampling, which randomly takes samples from the distribution instead of feeding datapoints in order. Character based models perform better than word based ones. Even though word embedding is a very innovative method in NLP, word vectors by themselves hardly convey semantics efficiently. Future work: 1. Try negative sampling to boost the training and improve the metrics (better than softmax). 2. Implement an attention mechanism in the seq2seq model. Reference 1. Distributed Representations of Sentences and Documents 2. Distributed Representations of Words and Phrases and their Compositionality 3. Neural machine translation by jointly learning to align and translate 4. Attention Is All You Need 5. Importance Sampling 6. Sequence to Sequence Learning with Neural Networks Acknowledgement I want to thank Frank, Kayla and Danny for the guidance and support. Dataset is from Kaggle . Thanks for the tutorial GPU accelerated Deep Learning on Windows 10 .",Machine Translation,Machine Translation 2413,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!)
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). 
For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). 
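As a toy illustration of the difference (this is not the repository's tokenizer code), the Uncased preprocessing roughly amounts to lowercasing plus accent stripping:

python
import unicodedata

def uncase(text):
    # Rough sketch of what the Uncased models expect: lowercase the text and drop
    # accent markers; the real logic lives in the repository's tokenization.py.
    text = unicodedata.normalize('NFD', text.lower())
    return ''.join(ch for ch in text if unicodedata.category(ch) != 'Mn')

print(uncase('John Johánson'))  # -> 'john johanson'

The Cased models skip this step entirely and keep the original casing and accents.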
These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . 
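As an optional sanity check before fine tuning, you can verify that the unzipped directory contains the three items described above and peek at the model hyperparameters; this is just a convenience sketch, and the exact keys printed depend on the shipped config file.

python
import glob, json, os

bert_base_dir = os.environ.get('BERT_BASE_DIR', '/path/to/bert/uncased_L-12_H-768_A-12')

# The unzipped checkpoint directory should contain bert_model.ckpt*, vocab.txt and bert_config.json
print(glob.glob(os.path.join(bert_base_dir, 'bert_model.ckpt*')))
print(os.path.exists(os.path.join(bert_base_dir, 'vocab.txt')))

with open(os.path.join(bert_base_dir, 'bert_config.json')) as f:
    print(json.load(f))  # hyperparameters such as hidden size and number of layers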
This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
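Putting the sentence level recipe above (steps 1 through 4) into code, a minimal sketch looks like the following; the vocab path is a placeholder, max_seq_length is an example value, and the final convert_tokens_to_ids call assumes the usual interface of the repository's FullTokenizer.

python
import tokenization  # tokenization.py from this repository

vocab_file = '/path/to/bert/vocab.txt'   # placeholder path
max_seq_length = 128

tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

tokens = tokenizer.tokenize('Who was Jim Henson ?')   # step 2: WordPiece tokenization
tokens = tokens[:max_seq_length - 2]                  # step 3: truncate, leaving room for the special tokens
tokens = ['[CLS]'] + tokens + ['[SEP]']               # step 4: add the CLS and SEP tokens
input_ids = tokenizer.convert_tokens_to_ids(tokens)   # map WordPieces to vocabulary ids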
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2418,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Introduction This is fairseq, a sequence to sequence learning toolkit for Torch from Facebook AI Research tailored to Neural Machine Translation (NMT). It implements the convolutional NMT models proposed in Convolutional Sequence to Sequence Learning and A Convolutional Encoder Model for Neural Machine Translation as well as a standard LSTM based model. It features multi GPU training on a single machine as well as fast beam search generation on both CPU and GPU. We provide pre trained models for English to French, English to German and English to Romanian translation. Note, there is now a PyTorch version fairseq py of this toolkit and new development efforts will focus on it. ! Model (fairseq.gif) Citation If you use the code in your paper, then please cite it as: @article{gehring2017convs2s, author {Gehring, Jonas, and Auli, Michael and Grangier, David and Yarats, Denis and Dauphin, Yann N}, title {Convolutional Sequence to Sequence Learning} , journal {ArXiv e prints}, archivePrefix arXiv , eprinttype {arxiv}, eprint {1705.03122}, primaryClass cs.CL , keywords {Computer Science Computation and Language}, year 2017, month May, } and @article{gehring2016convenc, author {Gehring, Jonas, and Auli, Michael and Grangier, David and Dauphin, Yann N}, title {A Convolutional Encoder Model for Neural Machine Translation} , journal {ArXiv e prints}, archivePrefix arXiv , eprinttype {arxiv}, eprint {1611.02344}, primaryClass cs.CL , keywords {Computer Science Computation and Language}, year 2016, month Nov, } Requirements and Installation A computer running macOS or Linux For training new models, you'll also need a NVIDIA GPU and NCCL A Torch installation . For maximum speed, we recommend using LuaJIT and Intel MKL . A recent version nn . The minimum required version is from May 5th, 2017. A simple luarocks install nn is sufficient to update your locally installed version. Install fairseq by cloning the GitHub repository and running luarocks make rocks/fairseq scm 1.rockspec LuaRocks will fetch and build any additional dependencies that may be missing. 
In order to install the CPU only version (which is only useful for translating new data with an existing model), do luarocks make rocks/fairseq cpu scm 1.rockspec The LuaRocks installation provides a command line tool that includes the following functionality: fairseq preprocess : Data pre processing: build vocabularies and binarize training data fairseq train : Train a new model on one or multiple GPUs fairseq generate : Translate pre processed data with a trained model fairseq generate lines : Translate raw text with a trained model fairseq score : BLEU scoring of generated translations against reference translations fairseq tofloat : Convert a trained model to a CPU model fairseq optimize fconv : Optimize a fully convolutional model for generation. This can also be achieved by passing the fconvfast flag to the generation scripts. Quick Start Evaluating Pre trained Models First, download a pre trained model along with its vocabularies: $ curl tar xvjf This will unpack vocabulary files and a serialized model for English to French translation to wmt14.en fr.fconv cuda/ . Alternatively, use a CPU based model: $ curl tar xvjf Let's use fairseq generate lines to translate some text. This model uses a Byte Pair Encoding (BPE) vocabulary , so we'll have to apply the encoding to the source text. This can be done with apply_bpe.py using the bpecodes file in within wmt14.en fr.fconv cuda/ . @@ is used as a continuation marker and the original text can be easily recovered with e.g. sed s/@@ //g . Prior to BPE, input text needs to be tokenized using tokenizer.perl from mosesdecoder . Here, we use a beam size of 5: $ fairseq generate lines path wmt14.en fr.fconv cuda/model.th7 sourcedict wmt14.en fr.fconv cuda/dict.en.th7 \ targetdict wmt14.en fr.fconv cuda/dict.fr.th7 beam 5 target Dictionary: 44666 types source Dictionary: 44409 types > Why is it rare to discover new marine mam@@ mal species ? S Why is it rare to discover new marine mam@@ mal species ? O Why is it rare to discover new marine mam@@ mal species ? H 0.068684287369251 Pourquoi est il rare de découvrir de nouvelles espèces de mammifères marins ? A 1 1 4 4 6 6 7 11 9 9 9 12 13 This generation script produces four types of output: a line prefixed with S shows the supplied source sentence after applying the vocabulary; O is a copy of the original source sentence; H is the hypothesis along with an average log likelihood and A are attention maxima for each word in the hypothesis (including the end of sentence marker which is omitted from the text). Check below ( pre trained models) for a full list of pre trained models available. Training a New Model Data Pre processing The fairseq source distribution contains an example pre processing script for the IWSLT14 German English corpus. Pre process and binarize the data as follows: $ cd data/ $ bash prepare iwslt14.sh $ cd .. $ TEXT data/iwslt14.tokenized.de en $ fairseq preprocess sourcelang de targetlang en \ trainpref $TEXT/train validpref $TEXT/valid testpref $TEXT/test \ thresholdsrc 3 thresholdtgt 3 destdir data bin/iwslt14.tokenized.de en This will write binarized data that can be used for model training to data bin/iwslt14.tokenized.de en. Training Use fairseq train to train a new model. 
Here a few example settings that work well for the IWSLT14 dataset: Standard bi directional LSTM model $ mkdir p trainings/blstm $ fairseq train sourcelang de targetlang en datadir data bin/iwslt14.tokenized.de en \ model blstm nhid 512 dropout 0.2 dropout_hid 0 optim adam lr 0.0003125 savedir trainings/blstm Fully convolutional sequence to sequence model $ mkdir p trainings/fconv $ fairseq train sourcelang de targetlang en datadir data bin/iwslt14.tokenized.de en \ model fconv nenclayer 4 nlayer 3 dropout 0.2 optim nag lr 0.25 clip 0.1 \ momentum 0.99 timeavg bptt 0 savedir trainings/fconv Convolutional encoder, LSTM decoder $ mkdir p trainings/convenc $ fairseq train sourcelang de targetlang en datadir data bin/iwslt14.tokenized.de en \ model conv nenclayer 6 dropout 0.2 dropout_hid 0 savedir trainings/convenc By default, fairseq train will use all available GPUs on your machine. Use the CUDA_VISIBLE_DEVICES environment variable to select specific GPUs or ngpus to change the number of GPU devices that will be used. Generation Once your model is trained, you can translate with it using fairseq generate (for binarized data) or fairseq generate lines (for text). Here, we'll do it for a fully convolutional model: Optional: optimize for generation speed $ fairseq optimize fconv input_model trainings/fconv/model_best.th7 output_model trainings/fconv/model_best_opt.th7 Translate some text $ DATA data bin/iwslt14.tokenized.de en $ fairseq generate lines sourcedict $DATA/dict.de.th7 targetdict $DATA/dict.en.th7 \ path trainings/fconv/model_best_opt.th7 beam 10 nbest 2 target Dictionary: 24738 types source Dictionary: 35474 types > eine sprache ist ausdruck des menschlichen geistes . S eine sprache ist ausdruck des menschlichen geistes . O eine sprache ist ausdruck des menschlichen geistes . H 0.23804219067097 a language is expression of human mind . A 2 2 3 4 5 6 7 8 9 H 0.23861141502857 a language is expression of the human mind . A 2 2 3 4 5 7 6 7 9 9 CPU Generation Use fairseq tofloat to convert a trained model to use CPU only operations (this has to be done on a GPU machine): Optional: optimize for generation speed $ fairseq optimize fconv input_model trainings/fconv/model_best.th7 output_model trainings/fconv/model_best_opt.th7 Convert to float $ fairseq tofloat input_model trainings/fconv/model_best_opt.th7 \ output_model trainings/fconv/model_best_opt float.th7 Translate some text $ fairseq generate lines sourcedict $DATA/dict.de.th7 targetdict $DATA/dict.en.th7 \ path trainings/fconv/model_best_opt float.th7 beam 10 nbest 2 > eine sprache ist ausdruck des menschlichen geistes . S eine sprache ist ausdruck des menschlichen geistes . O eine sprache ist ausdruck des menschlichen geistes . H 0.2380430996418 a language is expression of human mind . A 2 2 3 4 5 6 7 8 9 H 0.23861189186573 a language is expression of the human mind . A 2 2 3 4 5 7 6 7 9 9 Pre trained Models We provide the following pre trained fully convolutional sequence to sequence models: wmt14.en fr.fconv cuda.tar.bz2 : Pre trained model for WMT14 English French including vocabularies wmt14.en fr.fconv float.tar.bz2 : CPU version of the above wmt14.en de.fconv cuda.tar.bz2 : Pre trained model for WMT14 English German including vocabularies wmt14.en de.fconv float.tar.bz2 : CPU version of the above wmt16.en ro.fconv cuda.tar.bz2 : Pre trained model for WMT16 English Romanian including vocabularies. This model was trained on the original WMT bitext as well as back translated data provided by Rico Sennrich. 
wmt16.en ro.fconv float.tar.bz2 : CPU version of the above In addition, we provide pre processed and binarized test sets for the models above: wmt14.en fr.newstest2014.tar.bz2 : newstest2014 test set for WMT14 English French wmt14.en fr.ntst1213.tar.bz2 : newstest2012 and newstest2013 test sets for WMT14 English French wmt14.en de.newstest2014.tar.bz2 : newstest2014 test set for WMT14 English German wmt16.en ro.newstest2014.tar.bz2 : newstest2016 test set for WMT16 English Romanian Generation with the binarized test sets can be run in batch mode as follows, e.g. for English French on a GTX 1080ti: $ curl tar xvjf $ fairseq generate sourcelang en targetlang fr datadir data bin/wmt14.en fr dataset newstest2014 \ path wmt14.en fr.fconv cuda/model.th7 beam 5 batchsize 128 tee /tmp/gen.out ... Translated 3003 sentences (95451 tokens) in 136.3s (700.49 tokens/s) Timings: setup 0.1s (0.1%), encoder 1.9s (1.4%), decoder 108.9s (79.9%), search_results 0.0s (0.0%), search_prune 12.5s (9.2%) BLEU4 43.43, 68.2/49.2/37.4/28.8 (BP 0.996, ratio 1.004, sys_len 92087, ref_len 92448) Word level BLEU scoring: $ grep ^H /tmp/gen.out cut f3 sed 's/@@ //g' > /tmp/gen.out.sys $ grep ^T /tmp/gen.out cut f2 sed 's/@@ //g' > /tmp/gen.out.ref $ fairseq score sys /tmp/gen.out.sys ref /tmp/gen.out.ref BLEU4 40.55, 67.6/46.5/34.0/25.3 (BP 1.000, ratio 0.998, sys_len 81369, ref_len 81194) Join the fairseq community Facebook page: Google group: Contact: jgehring@fb.com (mailto:jgehring@fb.com), michaelauli@fb.com (mailto:michaelauli@fb.com) License fairseq is BSD licensed. The license applies to the pre trained models as well. We also provide an additional patent grant.",Machine Translation,Machine Translation 2423,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Transformer Attention Is All You Need Chainer based Python implementation of Transformer, an attention based seq2seq model without convolution and recurrence. If you want to see the architecture, please see net.py . See Attention Is All You Need , Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017. This repository is partly derived from my convolutional seq2seq repo, which is also derived from Chainer's official seq2seq example . Requirement Python 3.6.0+ Chainer 2.0.0+ numpy 1.12.1+ cupy 1.0.0+ (if using gpu) nltk progressbar (You can install all through pip ) and their dependencies Prepare Dataset You can use any parallel corpus. For example, run sh download_wmt.sh which downloads and decompresses training dataset and development dataset from WMT / europal into your current directory. These files and their paths are set in training script train.py as default. How to Run PYTHONIOENCODING utf 8 python u train.py g 0 i DATA_DIR o SAVE_DIR During training, logs for loss, perplexity, word accuracy and time are printed at a certain internval, in addition to validation tests (perplexity and BLEU for generation) every half epoch. And also, generation test is performed and printed for checking training progress. Arguments Some of them is as follows: g : your gpu id. If cpu, set 1 . i DATA_DIR , s SOURCE , t TARGET , svalid SVALID , tvalid TVALID : DATA_DIR directory needs to include a pair of training dataset SOURCE and TARGET with a pair of validation dataset SVALID and TVALID . Each pair should be parallell corpus with line by line sentence alignment. 
o SAVE_DIR : JSON log report file and a model snapshot will be saved in SAVE_DIR directory (if it does not exist, it will be automatically made). e : max epochs of training corpus. b : minibatch size. u : size of units and word embeddings. l : number of layers in both the encoder and the decoder. source vocab : max size of vocabulary set of source language target vocab : max size of vocabulary set of target language Please see the others by python train.py h . Note This repository does not aim for complete validation of results in the paper, so I have not thoroughly confirmed the validity of its performance. However, I expect my implementation is largely compatible with the model described in the paper. The differences I am aware of are as follows: Optimization/training strategy. Detailed information about batchsize, parameter initialization, etc. is unclear in the paper. Additionally, the learning rate proposed in the paper may work only with a large batchsize (e.g. 4000) for deep layer nets. I changed warmup_step to 32000 from 4000, though there is room for improvement. I also changed relu into leaky relu in feedforward net layers for easy gradient propagation. Vocabulary set, dataset, preprocessing and evaluation. This repo uses a common word based tokenization, although the paper uses byte pair encoding. Size of token set also differs. Evaluation (validation) is a little unfair and not directly comparable with the one in the paper, e.g., even the validation set replaces unknown words with a single unk token. Beam search is unused in BLEU calculation. Model size. The settings of the model in this repo correspond to the base model in the paper, although you can modify some lines to use the big model . This code follows some settings used in tensor2tensor repository , which includes a Transformer model. For example, positional encoding used in the repository seems to differ from the one in the paper. This code follows the former one.",Machine Translation,Machine Translation 2442,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NLP Translation Models Experimenting with different architectures for neural machine translation I'm experimenting with different deep learning architectures for Neural Machine Translation on a small dataset (English to Romanian, from ' ) of roughly 8000 English and Romanian phrases. To train a much better model at scale, I recommend using the WMT 14 dataset used in Gehring et al. from Facebook AI Research, or the EuroParl dataset. Beware that this dataset is huge and training will take a while. The ultimate goal of this repo is to understand more recent literature pointing to convolutional neural networks outperforming RNNs and Bi Directional LSTMs on sequence learning tasks. I use the small dataset for prototyping and the larger dataset for final training of the model. The .py files contain the original code for the models. The code is modular and written using classes which can simply be lifted as is and put into other models; for example, the encoder decoder model class definitions can also be used to train a text generation model (see the Seq_2_Seq repo). Starter Model The benchmark model I use is an encoder decoder model trained with Bahdanau attention, which closely follows the Tensorflow tutorial on the topic. As you can see from Translation.ipynb, the model is trained on a small set of data and for only a few epochs, so it doesn't perform as well as it should. Training this model on a larger dataset is left as future work.
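For readers who want to see what the attention mechanism in such an encoder decoder model computes, below is a minimal NumPy sketch of additive (Bahdanau style) attention for a single decoder step. The array names and sizes are assumptions made for this illustration; the code is not taken from this repository.

import numpy as np

# Additive (Bahdanau style) attention for one decoder step, illustrative only.
rng = np.random.default_rng(0)
src_len, hidden, attn_dim = 6, 8, 8
enc_outputs = rng.normal(size=(src_len, hidden))  # one encoder state per source token
dec_state = rng.normal(size=(hidden,))            # current decoder hidden state

W_enc = rng.normal(size=(hidden, attn_dim))
W_dec = rng.normal(size=(hidden, attn_dim))
v = rng.normal(size=(attn_dim,))

# score_i = v . tanh(W_enc h_i + W_dec s), then softmax over source positions
scores = np.tanh(enc_outputs @ W_enc + dec_state @ W_dec) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()
context = weights @ enc_outputs                   # weighted sum of encoder states
print(weights.round(3), context.shape)

At each decoding step the weights form a distribution over source positions, and the resulting context vector is combined with the decoder state before predicting the next target token.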
Stay tuned for more developments. Currently, I am working on Convolutional Translation models which have appeared in recent literature and been shown to outperform RNN's and LSTMs on long sequences. I am training this model on the much larger EuroParl dataset. An updated notebook containing results will be uploaded soon.",Machine Translation,Machine Translation 2458,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Code Mixed Dialog This repository contains the dataset and baseline implementations for the paper A Dataset for Building Code Mixed Goal Oriented Conversation Systems. There is an increasing demand for goal oriented conversation systems which can assist users in various day to day activities such as booking tickets, restaurant reservations, shopping, etc. Most of the existing datasets for building such conversation systems focus on monolingual conversations and there is hardly any work on multilingual and/or code mixed conversations. Such datasets and systems thus do not cater to the multilingual regions of the world, such as India, where it is very common for people to speak more than one language and seamlessly switch between them resulting in code mixed conversations. For example, a Hindi speaking user looking to book a restaurant would typically ask, Kya tum is restaurant mein ek table book karne mein meri help karoge? ( Can you help me in booking a table at this restaurant? ).To facilitate the development of such code mixed conversation models, we build a goal oriented dialog dataset containing code mixed conversations. Specifically, we take the text from the DSTC2 restaurant reservation dataset and create code mixed versions of it in Hindi English, Bengali English, Gujarati English and Tamil English. We also establish initial baselines on this dataset using existing state of the art models like sequence to sequence and Hierarchical Recurrent Encoder Decoder models. The dataset and baseline implementations are provided here. The Dataset The dialogue data for English and code mixed Hindi, Bengali, Gujarati and Tamil are provided in the data directory. The respective native language directories also contain the splits of the vocabulary into: English Words Native Language Words Other Words (Named Entities) Such a split serves as annotation of the words as being code mixed or belonging to the native language. The Baselines There are two baseline models for this dataset: Sequence to Sequence with Attention ( Bahdanau et al., 2015 ) Hierarchical Recurrent Encoder Decoder ( Serban et al., 2015 ) Dependencies tqdm Tensorflow version 1.2 Pandas Preprocessing The baseline models are provided in the code directory. Before running them you need to preprocess the data using the preprocess.py file in the respective baseline directory. The preprocessing is different for both the baselines. You need to provide the source directory in which the train, dev and test data files are and the target directory where the preprocessed files will be dumped: python preprocess.py source_dir ../../data/hindi target_dir ../../data/hindi Training The models can be trained using train_seq2seq.py and train_hred.py files in the code directory. The arguments required are: config_id: The experiment number. data_dir: The directory in which the preprocessed files are dumped (The target_dir in preprocessing step) infer_data: The dataset split(train, dev or test) on which inference should be performed. logs_dir: The directory in which log files should be dumped. 
checkpoint_dir: The directory in which model checkpoints should be stored. rnn_unit: The cell type (GRU or LSTM) to be used for the RNNs. learning_rate: The initial learning rate for Adam. batch_size: The mini batch size to be used for optimization. epochs: The maximum number of epochs to train. max_gradient_norm: The maximum norm of the gradients to be used for gradient clipping. dropout: The keep probability of RNN units. num_layers: The number of layers of RNN to be used for encoding. word_emb_dim: The size of the word embeddings to be used for input to the RNN. hidden_units: The size of RNN cell hidden units. eval_interval: The number of epochs after which validation is to be performed on the dev set. patience: The patience parameter for early stopping. train: To run the model in train mode or test mode. True means train mode is on. debug: To run the code in debug mode or not. In debug mode the code runs on a smaller dataset (67 examples) for only 2 epochs. True means debug mode is on. To run the training: python train_seq2seq.py config_id 1 data_dir ../data/hindi infer_data test logs_dir logs checkpoint_dir checkpoints rnn_unit gru learning_rate 0.0004 batch_size 32 epochs 50 max_gradient_norm 5 dropout 0.75 num_layers 1 word_emb_dim 300 hidden_units 350 eval_interval 1 patience 5 train True debug False Testing To just run inference on the test set, set the train flag to False : python train_seq2seq.py config_id 1 data_dir ../data/hindi infer_data test logs_dir logs checkpoint_dir checkpoints rnn_unit gru learning_rate 0.0004 batch_size 32 epochs 50 max_gradient_norm 5 dropout 0.75 num_layers 1 word_emb_dim 300 hidden_units 350 eval_interval 1 patience 5 train False debug False Evaluation The file get_scores.py in the scores directory produces the BLEU (moses and pycoco), ROUGE, per response accuracy and the per dialogue accuracy. We used the BLEU scripts from Google's seq2seq repo for the moses BLEU and the scripts from Microsoft COCO Caption Evaluation for pycoco BLEU. It requires the following 3 arguments: preds_path: The directory where the inference on the test set has dumped its predictions file and labels file. config_id: The experiment number which is appended to the predictions' filename and labels' filename. lang: Can be one of 'english', 'hindi', 'bengali', 'gujarati' and 'tamil'.",Machine Translation,Machine Translation 2462,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Text Corrector Software Text Corrector Software uses TensorFlow to train sequence to sequence models that are capable of automatically correcting small grammatical errors in conversational written English (e.g. SMS messages). It does this by taking English text samples that are known to be mostly grammatically correct and randomly introducing a handful of small grammatical errors (e.g. removing articles) to each sentence to produce input output pairs (where the output is the original sample), which are then used to train a sequence to sequence model. See this blog post for a more thorough write up of this work. Motivation While context sensitive spell check systems are able to automatically correct a large number of input errors in instant messaging, email, and SMS messages, they are unable to correct even simple grammatical errors. For example, the message I'm going to store would be unaffected by typical autocorrection systems, when the user most likely intended to write I'm going to _the_ store .
These kinds of simple grammatical mistakes are common in so called learner English , and constructing systems capable of detecting and correcting these mistakes has been the subect of multiple CoNLL shared tasks . The goal of this project is to train sequence to sequence models that are capable of automatically correcting such errors. Specifically, the models are trained to provide a function mapping a potentially errant input sequence to a sequence with all (small) grammatical errors corrected. Given these models, it would be possible to construct tools to help correct these simple errors in written communications, such as emails, instant messaging, etc. Correcting Grammatical Errors with Deep Learning The basic idea behind this project is that we can generate large training datasets for the task of grammar correction by starting with grammatically correct samples and introducing small errors to produce input output pairs, which can then be used to train a sequence to sequence models. The details of how we construct these datasets, train models using them, and produce predictions for this task are described below. Datasets To create a dataset for Deep Text Corrector models, we start with a large collection of mostly grammatically correct samples of conversational written English. The primary dataset considered in this project is the Cornell Movie Dialogs Corpus , which contains over 300k lines from movie scripts. This was the largest collection of conversational written English I could find that was mostly grammatically correct. Given a sample of text like this, the next step is to generate input output pairs to be used during training. This is done by: 1. Drawing a sample sentence from the dataset. 2. Setting the input sequence to this sentence after randomly applying certain perturbations. 3. Setting the output sequence to the unperturbed sentence. where the perturbations applied in step (2) are intended to introduce small grammatical errors which we would like the model to learn to correct. Thus far, these perturbations are limited to the: subtraction of articles (a, an, the) subtraction of the second part of a verb contraction (e.g. 've , 'll , 's , 'm ) replacement of a few common homophones with one of their counterparts (e.g. replacing their with there , then with than ) The rates with which these perturbations are introduced are loosely based on figures taken from the CoNLL 2014 Shared Task on Grammatical Error Correction . In this project, each perturbation is applied in 25% of cases where it could potentially be applied. Training To artificially increase the dataset when training a sequence model, we perform the sampling strategy described above multiple times to arrive at 2 3x the number of input output pairs. Given this augmented dataset, training proceeds in a very similar manner to TensorFlow's sequence to sequence tutorial . That is, we train a sequence to sequence model using LSTM encoders and decoders with an attention mechanism as described in Bahdanau et al., 2014 using stochastic gradient descent. Decoding Instead of using the most probable decoding according to the seq2seq model, this project takes advantage of the unique structure of the problem to impose the prior that all tokens in a decoded sequence should either exist in the input sequence or belong to a set of corrective tokens. The corrective token set is constructed during training and contains all tokens seen in the target, but not the source, for at least one sample in the training set. 
The intuition here is that the errors seen during training involve the misuse of a relatively small vocabulary of common words (e.g. the , an , their ) and that the model should only be allowed to perform corrections in this domain. This prior is carried out through a modification to the seq2seq model's decoding loop in addition to a post processing step that resolves out of vocabulary (OOV) tokens: Biased Decoding To restrict the decoding such that it only ever chooses tokens from the input sequence or corrective token set, this project applies a binary mask to the model's logits prior to extracting the prediction to be fed into the next time step. This mask is constructed such that mask i 1.0 if (i in input or corrective_tokens) else 0.0 . Since this mask is applied to the result of a softmax transformation (which guarantees all outputs are non negative), we can be sure that only input or corrective tokens are ever selected. Note that this logic is not used during training, as this would only serve to eliminate potentially useful signal from the model. Handling OOV Tokens Since the decoding bias described above is applied within the truncated vocabulary used by the model, we will still see the unknown token in its output for any OOV tokens. The more generic problem of resolving these OOV tokens is non trivial (e.g. see Addressing the Rare Word Problem in NMT ), but in this project we can again take advantage of its unique structure to create a fairly straightforward OOV token resolution scheme. That is, if we assume the sequence of OOV tokens in the input is equal to the sequence of OOV tokens in the output sequence, then we can trivially assign the appropriate token to each unknown token encountered in the decoding. Empirically, and intuitively, this appears to be an appropriate assumption, as the relatively simple class of errors these models are being trained to address should never include mistakes that warrant the insertion or removal of a rare token. Experiments and Results Below are some anecdotal and aggregate results from experiments using the Deep Text Corrector model with the Cornell Movie Dialogs Corpus . The dataset consists of 304,713 lines from movie scripts, of which 243,768 lines were used to train the model and 30,474 lines each were used for the validation and testing sets. The sets were selected such that no lines from the same movie were present in both the training and testing sets. The model being evaluated below is a sequence to sequence model, with attention, where the encoder and decoder were both 2 layer, 512 hidden unit LSTMs. The model was trained with a vocabulary of the 2k most common words seen in the training set. Aggregate Performance Below are reported the BLEU scores and accuracy numbers over the test dataset for both a trained model and a baseline, where the baseline is the identity function (which assumes no errors exist in the input). You'll notice that the model outperforms this baseline for all bucket sizes in terms of accuracy, and outperforms all but one in terms of BLEU score. This tells us that applying the Deep Text Corrector model to a potentially errant writing sample would, on average, result in a more grammatically correct writing sample. Anyone who tends to make errors similar to those the model has been trained on could therefore benefit from passing their messages through this model.
Bucket 0: (10, 10) Baseline BLEU 0.8341 Model BLEU 0.8516 Baseline Accuracy: 0.9083 Model Accuracy: 0.9384 Bucket 1: (15, 15) Baseline BLEU 0.8850 Model BLEU 0.8860 Baseline Accuracy: 0.8156 Model Accuracy: 0.8491 Bucket 2: (20, 20) Baseline BLEU 0.8876 Model BLEU 0.8880 Baseline Accuracy: 0.7291 Model Accuracy: 0.7817 Bucket 3: (40, 40) Baseline BLEU 0.9099 Model BLEU 0.9045 Baseline Accuracy: 0.6073 Model Accuracy: 0.6425 Examples Decoding a sentence with a missing article: In 31 : decode( Kvothe went to market ) Out 31 : 'Kvothe went to the market' Decoding a sentence with then/than confusion: In 30 : decode( the Cardinals did better then the Cubs in the offseason ) Out 30 : 'the Cardinals did better than the Cubs in the offseason' Implementation Details This project reuses and slightly extends TensorFlow's Seq2SeqModel , which itself implements a sequence to sequence model with an attention mechanism as described in The primary contributions of this project are: data_reader.py : an abstract class that defines the interface for classes which are capable of reading a source dataset and producing input output pairs, where the input is a grammatically incorrect variant of a source sentence and the output is the original sentence. text_corrector_data_readers.py : contains a few implementations of DataReader , one over the Penn Treebank dataset and one over the Cornell Movie Dialogs Corpus . text_corrector_models.py : contains a version of Seq2SeqModel modified such that it implements the logic described in Biased Decoding ( biased decoding) correct_text.py : a collection of helper functions that together allow for the training of a model and the usage of it to decode errant input sequences (at test time). The decode method defined here implements the OOV token resolution logic ( handling oov tokens). This also defines a main method, and can be invoked from the command line. It was largely derived from TensorFlow's translate.py . TextCorrector.ipynb : an IPython notebook which ties together all of the above pieces to allow for the training and evaluation of the model in an interactive fashion. Example Usage Note: this project requires TensorFlow version > 0.11. See this page for setup instructions. Preprocess Movie Dialog Data python preprocessors/preprocess_movie_dialogs.py raw_data movie_lines.txt \ out_file preprocessed_movie_lines.txt This preprocessed file can then be split up however you like to create training, validation, and testing sets. Training: python correct_text.py train_path /movie_dialog_train.txt \ val_path /movie_dialog_val.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model Testing: python correct_text.py test_path /movie_dialog_test.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model \ decode",Machine Translation,Machine Translation 2463,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Text Corrector Software Text Corrector Software uses TensorFlow to train sequence to sequence models that are capable of automatically correcting small grammatical errors in conversational written English (e.g. SMS messages). It does this by taking English text samples that are known to be mostly grammatically correct and randomly introducing a handful of small grammatical errors (e.g. 
removing articles) to each sentence to produce input output pairs (where the output is the original sample), which are then used to train a sequence to sequence model. See this blog post for a more thorough write up of this work. Motivation While context sensitive spell check systems are able to automatically correct a large number of input errors in instant messaging, email, and SMS messages, they are unable to correct even simple grammatical errors. For example, the message I'm going to store would be unaffected by typical autocorrection systems, when the user most likely intended to write I'm going to _the_ store . These kinds of simple grammatical mistakes are common in so called learner English , and constructing systems capable of detecting and correcting these mistakes has been the subject of multiple CoNLL shared tasks . The goal of this project is to train sequence to sequence models that are capable of automatically correcting such errors. Specifically, the models are trained to provide a function mapping a potentially errant input sequence to a sequence with all (small) grammatical errors corrected. Given these models, it would be possible to construct tools to help correct these simple errors in written communications, such as emails, instant messaging, etc. Correcting Grammatical Errors with Deep Learning The basic idea behind this project is that we can generate large training datasets for the task of grammar correction by starting with grammatically correct samples and introducing small errors to produce input output pairs, which can then be used to train sequence to sequence models. The details of how we construct these datasets, train models using them, and produce predictions for this task are described below. Datasets To create a dataset for Deep Text Corrector models, we start with a large collection of mostly grammatically correct samples of conversational written English. The primary dataset considered in this project is the Cornell Movie Dialogs Corpus , which contains over 300k lines from movie scripts. This was the largest collection of conversational written English I could find that was mostly grammatically correct. Given a sample of text like this, the next step is to generate input output pairs to be used during training. This is done by: 1. Drawing a sample sentence from the dataset. 2. Setting the input sequence to this sentence after randomly applying certain perturbations. 3. Setting the output sequence to the unperturbed sentence. where the perturbations applied in step (2) are intended to introduce small grammatical errors which we would like the model to learn to correct. Thus far, these perturbations are limited to the: subtraction of articles (a, an, the) subtraction of the second part of a verb contraction (e.g. 've , 'll , 's , 'm ) replacement of a few common homophones with one of their counterparts (e.g. replacing their with there , then with than ) The rates with which these perturbations are introduced are loosely based on figures taken from the CoNLL 2014 Shared Task on Grammatical Error Correction . In this project, each perturbation is applied in 25% of cases where it could potentially be applied (a short illustrative sketch of this step appears below). Training To artificially increase the dataset when training a sequence model, we perform the sampling strategy described above multiple times to arrive at 2 3x the number of input output pairs. Given this augmented dataset, training proceeds in a very similar manner to TensorFlow's sequence to sequence tutorial .
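To make the perturbation step from the Datasets section above concrete, here is a small, self contained Python sketch that drops articles and swaps a few homophones with 25% probability; the contraction case is omitted for brevity, and the function name and word lists are assumptions for this illustration rather than the project's actual implementation.

import random

# Illustrative sketch of the dataset perturbation idea (not the project's code).
ARTICLES = {'a', 'an', 'the'}
HOMOPHONES = {'their': 'there', 'there': 'their', 'then': 'than', 'than': 'then'}
RATE = 0.25  # each applicable perturbation fires in 25% of cases

def perturb(sentence):
    out = []
    for tok in sentence.split():
        low = tok.lower()
        if low in ARTICLES and random.random() < RATE:
            continue  # drop the article to create an error
        if low in HOMOPHONES and random.random() < RATE:
            out.append(HOMOPHONES[low])  # introduce a homophone confusion
            continue
        out.append(tok)
    return ' '.join(out)

random.seed(7)
target = 'then they went to the store'
source = perturb(target)  # (source, target) becomes one training pair
print(source, '->', target)

The unperturbed sentence is kept as the target, so each call yields one (source, target) training pair.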
That is, we train a sequence to sequence model using LSTM encoders and decoders with an attention mechanism as described in Bahdanau et al., 2014 using stochastic gradient descent. Decoding Instead of using the most probable decoding according to the seq2seq model, this project takes advantage of the unique structure of the problem to impose the prior that all tokens in a decoded sequence should either exist in the input sequence or belong to a set of corrective tokens. The corrective token set is constructed during training and contains all tokens seen in the target, but not the source, for at least one sample in the training set. The intuition here is that the errors seen during training involve the misuse of a relatively small vocabulary of common words (e.g. the , an , their ) and that the model should only be allowed to perform corrections in this domain. This prior is carried out through a modification to the seq2seq model's decoding loop in addition to a post processing step that resolves out of vocabulary (OOV) tokens: Biased Decoding To restrict the decoding such that it only ever chooses tokens from the input sequence or corrective token set, this project applies a binary mask to the model's logits prior to extracting the prediction to be fed into the next time step. This mask is constructed such that mask i 1.0 if (i in input or corrective_tokens) else 0.0 . Since this mask is applied to the result of a softmax transformation (which guarantees all outputs are non negative), we can be sure that only input or corrective tokens are ever selected. Note that this logic is not used during training, as this would only serve to eliminate potentially useful signal from the model. Handling OOV Tokens Since the decoding bias described above is applied within the truncated vocabulary used by the model, we will still see the unknown token in its output for any OOV tokens. The more generic problem of resolving these OOV tokens is non trivial (e.g. see Addressing the Rare Word Problem in NMT ), but in this project we can again take advantage of its unique structure to create a fairly straightforward OOV token resolution scheme. That is, if we assume the sequence of OOV tokens in the input is equal to the sequence of OOV tokens in the output sequence, then we can trivially assign the appropriate token to each unknown token encountered in the decoding. Empirically, and intuitively, this appears to be an appropriate assumption, as the relatively simple class of errors these models are being trained to address should never include mistakes that warrant the insertion or removal of a rare token. Experiments and Results Below are some anecdotal and aggregate results from experiments using the Deep Text Corrector model with the Cornell Movie Dialogs Corpus . The dataset consists of 304,713 lines from movie scripts, of which 243,768 lines were used to train the model and 30,474 lines each were used for the validation and testing sets. The sets were selected such that no lines from the same movie were present in both the training and testing sets. The model being evaluated below is a sequence to sequence model, with attention, where the encoder and decoder were both 2 layer, 512 hidden unit LSTMs. The model was trained with a vocabulary of the 2k most common words seen in the training set.
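Before the aggregate results, here is a brief NumPy sketch of the binary mask described under Biased Decoding above. The vocabulary size and token id sets are invented for the illustration; this is not the repository's code.

import numpy as np

# Illustrative only: bias decoding toward tokens seen in the input or in the
# corrective token set by masking the softmax output (ids below are invented).
rng = np.random.default_rng(3)
vocab_size = 10
input_token_ids = {2, 5, 7}          # token ids appearing in the source sentence
corrective_token_ids = {1, 3}        # corrective tokens collected during training

allowed = input_token_ids | corrective_token_ids
mask = np.array([1.0 if i in allowed else 0.0 for i in range(vocab_size)])

probs = rng.random(vocab_size)
probs /= probs.sum()                  # stand in for the model's softmax output
biased = probs * mask
biased /= biased.sum()                # renormalise over the allowed tokens
next_token = int(biased.argmax())     # fed back in at the next time step
print(next_token, biased.round(3))

During decoding, the masked and renormalised distribution replaces the raw softmax output when choosing the token to feed into the next time step.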
Aggregate Performance Below are reported the BLEU scores and accuracy numbers over the test dataset for both a trained model and a baseline, where the baseline is the identity function (which assumes no errors exist in the input). You'll notice that the model outperforms this baseline for all bucket sizes in terms of accuracy, and outperforms all but one in terms of BLEU score. This tells us that applying the Deep Text Corrector model to a potentially errant writing sample would, on average, result in a more grammatically correct writing sample. Anyone who tends to make errors similar to those the model has been trained on could therefore benefit from passing their messages through this model. Bucket 0: (10, 10) Baseline BLEU 0.8341 Model BLEU 0.8516 Baseline Accuracy: 0.9083 Model Accuracy: 0.9384 Bucket 1: (15, 15) Baseline BLEU 0.8850 Model BLEU 0.8860 Baseline Accuracy: 0.8156 Model Accuracy: 0.8491 Bucket 2: (20, 20) Baseline BLEU 0.8876 Model BLEU 0.8880 Baseline Accuracy: 0.7291 Model Accuracy: 0.7817 Bucket 3: (40, 40) Baseline BLEU 0.9099 Model BLEU 0.9045 Baseline Accuracy: 0.6073 Model Accuracy: 0.6425 Examples Decoding a sentence with a missing article: In 31 : decode( Kvothe went to market ) Out 31 : 'Kvothe went to the market' Decoding a sentence with then/than confusion: In 30 : decode( the Cardinals did better then the Cubs in the offseason ) Out 30 : 'the Cardinals did better than the Cubs in the offseason' Implementation Details This project reuses and slightly extends TensorFlow's Seq2SeqModel , which itself implements a sequence to sequence model with an attention mechanism as described in The primary contributions of this project are: data_reader.py : an abstract class that defines the interface for classes which are capable of reading a source dataset and producing input output pairs, where the input is a grammatically incorrect variant of a source sentence and the output is the original sentence. text_corrector_data_readers.py : contains a few implementations of DataReader , one over the Penn Treebank dataset and one over the Cornell Movie Dialogs Corpus . text_corrector_models.py : contains a version of Seq2SeqModel modified such that it implements the logic described in Biased Decoding ( biased decoding) correct_text.py : a collection of helper functions that together allow for the training of a model and the usage of it to decode errant input sequences (at test time). The decode method defined here implements the OOV token resolution logic ( handling oov tokens). This also defines a main method, and can be invoked from the command line. It was largely derived from TensorFlow's translate.py . TextCorrector.ipynb : an IPython notebook which ties together all of the above pieces to allow for the training and evaluation of the model in an interactive fashion. Example Usage Note: this project requires TensorFlow version > 0.11. See this page for setup instructions. Preprocess Movie Dialog Data python preprocessors/preprocess_movie_dialogs.py raw_data movie_lines.txt \ out_file preprocessed_movie_lines.txt This preprocessed file can then be split up however you like to create training, validation, and testing sets. 
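Since the readme leaves the split up to you, one simple way to produce the three files used by the commands below is sketched here. The 80/10/10 ratios are an assumption, and note that the authors kept lines from the same movie out of both the training and testing sets, which a plain random line split like this one does not enforce.

import random

# One possible 80/10/10 line level split of the preprocessed file (illustrative).
random.seed(0)
with open('preprocessed_movie_lines.txt') as f:
    lines = f.readlines()
random.shuffle(lines)

n = len(lines)
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    'movie_dialog_train.txt': lines[:n_train],
    'movie_dialog_val.txt': lines[n_train:n_train + n_val],
    'movie_dialog_test.txt': lines[n_train + n_val:],
}
for name, chunk in splits.items():
    with open(name, 'w') as out:
        out.writelines(chunk)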
Training: python correct_text.py train_path /movie_dialog_train.txt \ val_path /movie_dialog_val.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model Testing: python correct_text.py test_path /movie_dialog_test.txt \ config DefaultMovieDialogConfig \ data_reader_type MovieDialogReader \ model_path /movie_dialog_model \ decode",Machine Translation,Machine Translation 2468,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SimpleRecurrentUnits SRU SRU based acoustic model for merlin SRU introduction >Tao Lei《Simple Recurrent Units for Highly Parallelizable Recurrence》 SRU is applied to an acoustic model by DabiaoMa ZhibaSu WenxuanWang ChengZou YuhaoLu @ Turing Robot data_preprocess >file folder for cmp (acoustic feature) and label (input text label, see HTS label) files data_train >file folder for model and training files. SRU based acoustic model for merlin, by DabiaoMa, ZhibaSu, WenxuanWang, ChengZou, YuhaoLu @ Turing Robot. data_preprocess >holds the training features (cmp files) and labels (lab files), together with the corresponding data index files (list). data_train >training code, including the data loading functions, the training and model definition files, the trainer file saved after each epoch, and the code for extracting the model from the trainer.",Machine Translation,Machine Translation 2469,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New February 7th, 2019: TfHub Module \ \ \ \ \ BERT has been uploaded to TensorFlow Hub . See run_classifier_with_tfhub.py for an example of how to use the TF Hub module, or run an example in the browser on Colab . \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API.
For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. 
Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' 
on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. 
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
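If you take the second route, the minimal Python sketch below applies the derived threshold to ./squad/null_odds.json and ./squad/nbest_predictions.json. The nbest JSON structure (a ranked list of candidate answers with a text field per question id) is an assumption about the script's usual output, so adjust the keys if your files differ; the command for re running the model with the threshold follows.

import json

# Illustrative: choose the best non null answer unless the null odds exceed the
# tuned threshold. The nbest structure is assumed, not guaranteed.
THRESH = 0.0  # replace with the best_f1_thresh value printed by evaluate v2.0.py

with open('./squad/null_odds.json') as f:
    null_odds = json.load(f)          # question id -> null minus best non null score
with open('./squad/nbest_predictions.json') as f:
    nbest = json.load(f)              # question id -> ranked candidate answers

predictions = {}
for qid, candidates in nbest.items():
    best_text = next((c['text'] for c in candidates if c['text']), '')
    predictions[qid] = '' if null_odds.get(qid, 0.0) > THRESH else best_text

with open('./squad/predictions_with_threshold.json', 'w') as f:  # assumed output name
    json.dump(predictions, f)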
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
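As a brief aside on the gradient accumulation idea mentioned in the out of memory section above, the toy NumPy sketch below shows why summing micro batch gradients before a single update is equivalent to computing the gradient on one larger batch. It is purely illustrative and not part of the BERT code.

import numpy as np

# Toy least squares example: accumulating gradients over micro batches and then
# taking one step matches computing the gradient on the full batch.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(64, 4)), rng.normal(size=64)
w = np.zeros(4)
lr, micro = 0.01, 16                  # 64 examples split into 4 micro batches

def grad(w, xb, yb):
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)   # gradient of mean squared error

full = grad(w, X, y)                  # full batch gradient, for comparison
accum = np.zeros_like(w)
for start in range(0, len(y), micro):
    xb, yb = X[start:start + micro], y[start:start + micro]
    accum += grad(w, xb, yb) * len(yb)            # weight each micro batch by its size
accum /= len(y)
w -= lr * accum                       # single parameter update after accumulation
print(np.allclose(accum, full))       # True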
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
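Whichever corpus you end up using, remember that create_pretraining_data.py expects one sentence per line with a blank line between documents, as described above. Below is a hypothetical sketch of that conversion using spaCy's rule based sentencizer; the file names are placeholders and any sentence segmenter would work just as well. python
# Hypothetical sketch (spaCy 3.x): turn raw documents (one document per line in
# raw_docs.txt, a placeholder name) into the one-sentence-per-line,
# blank-line-between-documents format expected by create_pretraining_data.py.
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence boundaries; no model download needed

with open("raw_docs.txt", encoding="utf-8") as fin, \
     open("pretraining_corpus.txt", "w", encoding="utf-8") as fout:
    for line in fin:
        doc_text = line.strip()
        if not doc_text:
            continue
        for sent in nlp(doc_text).sents:
            sent_text = sent.text.strip()
            if sent_text:
                fout.write(sent_text + "\n")
        fout.write("\n")  # blank line delimits documents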
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the arXiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2503,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks.
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). 
A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . 
This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
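If you want to consume that JSON programmatically, a minimal sketch is below. The field names used here ( features , token , layers , index , values ) follow the layout the script writes, one JSON object per input line, but it is worth inspecting a line of your own output and adjusting if it differs. python
# Sketch: read the JSONL written by extract_features.py and collect the
# final-layer (-1) vector for each token of the first input line.
# Verify the field names against one line of your own output.
import json
import numpy as np

with open("/tmp/output.jsonl", encoding="utf-8") as f:
    first_example = json.loads(f.readline())

tokens, vectors = [], []
for feature in first_example["features"]:
    tokens.append(feature["token"])
    final_layer = next(l for l in feature["layers"] if l["index"] == -1)
    vectors.append(final_layer["values"])

embeddings = np.array(vectors)  # shape: (num_tokens, hidden_size)
print(tokens)
print(embeddings.shape)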
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
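Because both scripts must be given the same max_predictions_per_seq, one option is to compute it once from the rule of thumb above (max_seq_length times masked_lm_prob, rounded up) with a small helper that is not part of the repository, and pass the result to both commands: python
# Helper (not part of the repository) for picking a max_predictions_per_seq that
# is consistent between create_pretraining_data.py and run_pretraining.py.
import math

def max_predictions_per_seq(max_seq_length, masked_lm_prob):
    return math.ceil(max_seq_length * masked_lm_prob)

print(max_predictions_per_seq(128, 0.15))  # 20, matching the flags used here
print(max_predictions_per_seq(512, 0.15))  # 77, if you also generate 512-length data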
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2508,Natural Language Processing,Natural Language Processing,Natural Language Processing,"About SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy tested on many tasks. Average processing time of LSTM, conv2d and SRU, tested on GTX 1070 For example, the figure above presents the processing time of a single mini batch of 32 samples. SRU achieves 10 to 16 times speed up compared to LSTM, and operates as fast as (or faster than) word level convolution using conv2d. The paper has multiple versions, please check the latest one. Reference: Simple Recurrent Units for Highly Parallelizable Recurrence @inproceedings{lei2018sru, title {Simple Recurrent Units for Highly Parallelizable Recurrence}, author {Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi}, booktitle {Empirical Methods in Natural Language Processing (EMNLP)}, year {2018} } Requirements PyTorch > 0.4.1 recommended, pytorch installation details (docs/pytorch_installation.md) CuPy pynvrtc ninja (optional) for fast inference on CPU. Install requirements via pip install r requirements.txt . CuPy and pynvrtc needed to support training / testing on GPU. Installation From source: SRU can be installed as a regular package via python setup.py install or pip install . . From PyPi: pip install sru pip install sru cuda additionally installs Cupy and pynvrtc. pip install sru cpu additionally installs ninja Directly use the source without installation: Make sure this repo and CUDA library can be found by the system, e.g. export PYTHONPATH path_to_repo/sru export LD_LIBRARY_PATH /usr/local/cuda/lib64 Examples The usage of SRU is similar to nn.LSTM . SRU likely requires more stacking layers than LSTM. We recommend starting by 2 layers and use more if necessary (see our report for more experimental details). python import torch from torch.autograd import Variable from sru import SRU, SRUCell input has length 20, batch size 32 and dimension 128 x Variable(torch.FloatTensor(20, 32, 128).cuda()) input_size, hidden_size 128, 128 rnn SRU(input_size, hidden_size, num_layers 2, number of stacking RNN layers dropout 0.0, dropout applied between RNN layers bidirectional False, bidirectional RNN layer_norm False, apply layer normalization on the output of each layer highway_bias 0, initial bias of highway gate ( Contributors Other Implementations @musyoku had a very nice SRU implementaion in chainer. @adrianbg implemented the first CPU version .",Machine Translation,Machine Translation 2519,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. 
This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. 
Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). 
Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . 
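Independent of where you train, it can help to sanity check a downloaded checkpoint before fine tuning. The following is a small illustrative snippet, not part of the repository; it only assumes the bert_config.json and vocab.txt files described above and a hypothetical local path.

python
import json

# Hypothetical local path; point this at the directory you unzipped.
BERT_BASE_DIR = "/path/to/bert/uncased_L-12_H-768_A-12"

# bert_config.json holds the model hyperparameters.
with open(BERT_BASE_DIR + "/bert_config.json") as f:
    config = json.load(f)
print("hidden_size:", config["hidden_size"])
print("num_hidden_layers:", config["num_hidden_layers"])
print("num_attention_heads:", config["num_attention_heads"])

# vocab.txt maps WordPiece tokens to ids, one token per line.
with open(BERT_BASE_DIR + "/vocab.txt", encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]
print("vocab size:", len(vocab))  # 30522 for the English uncased models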
On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. 
The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. 
The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. 
Gradient checkpointing trades memory for compute time by re-computing the activations in an intelligent way. However, this is not implemented in the current release.

Using BERT to extract fixed feature vectors (like ELMo)

In certain cases, rather than fine-tuning the entire pre-trained model end-to-end, it can be beneficial to obtain pre-trained contextual embeddings, which are fixed contextual representations of each input token generated from the hidden layers of the pre-trained model. This should also mitigate most of the out-of-memory issues. As an example, we include the script extract_features.py which can be used like this:

shell
# Sentence A and Sentence B are separated by the ||| delimiter for sentence-pair
# tasks like question answering and entailment. For single-sentence inputs, put
# one sentence per line and DON'T use the delimiter.
echo 'Who was Jim Henson ? ||| Jim Henson was a puppeteer' > /tmp/input.txt

python extract_features.py \
  --input_file=/tmp/input.txt \
  --output_file=/tmp/output.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers (-1 is the final hidden layer of the Transformer, etc.). Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below.

Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected; it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain.

Tokenization

For sentence-level (or sentence-pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence-level tasks is: 1. Instantiate an instance of tokenizer = tokenization.FullTokenizer 2. Tokenize the raw text with tokens = tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the [CLS] and [SEP] tokens in the right place.

Word-level and span-level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character-based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this.

Before we describe the general recipe for handling word-level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) anything with a P* Unicode class, (b) any non-letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan ##son ' s ,

The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part-of-speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan ##son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ).

If you have a pre-tokenized representation with word-level annotations, you can simply tokenize each input word independently, and deterministically maintain an original-to-tokenized alignment:

python
### Input
orig_tokens = ["John", "Johanson", "'s", "house"]
labels      = ["NNP",  "NNP",      "POS", "NN"]

### Output
bert_tokens = []

# Token map will be an int -> int mapping between the orig_tokens index and
# the bert_tokens index.
orig_to_tok_map = []

tokenizer = tokenization.FullTokenizer(
    vocab_file=vocab_file, do_lower_case=True)

bert_tokens.append("[CLS]")
for orig_token in orig_tokens:
  orig_to_tok_map.append(len(bert_tokens))
  bert_tokens.extend(tokenizer.tokenize(orig_token))
bert_tokens.append("[SEP]")

# bert_tokens == ["[CLS]", "john", "johan", "##son", "'", "s", "house", "[SEP]"]
# orig_to_tok_map == [1, 2, 4, 6]

Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch with how BERT was pre-trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre-process your data to convert these back to raw-looking text, but if it's not possible, this mismatch is likely not a big deal.

Pre training with BERT

We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre-training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task.) Documents are delimited by empty lines. The output is a set of tf.train.Example records serialized into TFRecord file format. You can perform sentence segmentation with an off-the-shelf NLP toolkit such as spaCy (see the sketch below). The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non-sentential input during fine-tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record* .)
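As a concrete illustration of the required input format (one sentence per line, blank lines between documents), here is a hedged sketch of preparing such a file with spaCy. The file names, the example documents, and the en_core_web_sm model are assumptions for illustration, not part of this repository.

python
import spacy

# Assumes the small English model has been installed with:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

documents = [
    "The man went to the store. He bought a gallon of milk.",
    "Penguins are flightless birds. They live in the Southern Hemisphere.",
]

with open("/tmp/pretraining_input.txt", "w", encoding="utf-8") as out:
    for doc_text in documents:
        doc = nlp(doc_text)
        for sent in doc.sents:            # one sentence per line
            out.write(sent.text.strip() + "\n")
        out.write("\n")                   # blank line delimits documents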
The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. 
If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. 
We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia.

Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements.

What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information.

How do I cite BERT? For now, cite the arXiv paper : @article{devlin2018bert, title = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding}, author = {Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, journal = {arXiv preprint arXiv:1810.04805}, year = {2018} } If we submit the paper to a conference or journal, we will update the BibTeX.

Disclaimer This is not an official Google product.

Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming-Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2520,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Chinese-to-English (中译英) Neural Machine Translation for the AI_Challenger dataset.

Requirements python 3.6 TensorFlow 1.12.0 tensor2tensor 1.10.0 jieba 0.39 tensorflow-hub 0.4.0 tensorflow_serving_api

Prepare Data 1. Download the dataset and put it in the raw_data folder 2. Run the data preparation script cd train ./self_prepare.sh

Train Model Run the training script ./self_run.sh

Inference Run the inference script ./self_infer.sh

Export the model ./export_model.sh

Start the server (TensorFlow Serving must be installed; installation instructions: ) ./server.sh

Start the client ./client.sh

References Attention Is All You Need Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin Full text available at: Code available at:",Machine Translation,Machine Translation 2521,Natural Language Processing,Natural Language Processing,Natural Language Processing,"README General translational invariance: deep learning book: classification: best practice: RNN seq 2 seq learning Instead of RNN, implement temporal convolution: The fall of rnn: Attention mechanism: Papers (to implement): 2017, VDCNN Very Deep Convolutional Networks for Text Classification: (implementation at 2018, Pervasive Attention: 2D Convolutional Neural Networks for Sequence to Sequence Prediction: 2016, HAN Hierarchical Attention Networks for Document Classification, (blog: implementation: https://github.com/richliao/textClassifier)",Machine Translation,Machine Translation 2546,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. 
They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Machine Translation,Machine Translation 2554,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Tensor2Tensor PyPI version GitHub Issues Contributions welcome (CONTRIBUTING.md) Gitter License Travis Run on FH Tensor2Tensor , or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research . T2T is actively used and maintained by researchers and engineers within the Google Brain team and a community of users. We're eager to collaborate with you too, so feel free to open an issue on GitHub or send along a pull request (see our contribution doc (CONTRIBUTING.md)). You can chat with us on Gitter and join the T2T Google Group . Quick Start This iPython notebook explains T2T and runs in your browser using a free VM from Google, no installation needed. Alternatively, here is a one command version that installs T2T, downloads MNIST, trains a model and evaluates it: pip install tensor2tensor && t2t trainer \ generate_data \ data_dir /t2t_data \ output_dir /t2t_train/mnist \ problem image_mnist \ model shake_shake \ hparams_set shake_shake_quick \ train_steps 1000 \ eval_steps 100 Contents Suggested Datasets and Models ( suggested datasets and models) Mathematical Language Understanding ( mathematical language understanding) Story, Question and Answer ( story question and answer) Image Classification ( image classification) Image Generation ( image generation) Language Modeling ( language modeling) Sentiment Analysis ( sentiment analysis) Speech Recognition ( speech recognition) Summarization ( summarization) Translation ( translation) Basics ( basics) Walkthrough ( walkthrough) Installation ( installation) Features ( features) T2T Overview ( t2t overview) Datasets ( datasets) Problems and Modalities ( problems and modalities) Models ( models) Hyperparameter Sets ( hyperparameter sets) Trainer ( trainer) Adding your own components ( adding your own components) Adding a dataset ( adding a dataset) Papers ( papers) Run on FloydHub ( run on floydhub) Suggested Datasets and Models Below we list a number of tasks that can be solved with T2T when you train the appropriate model on the appropriate problem. We give the problem and model below and we suggest a setting of hyperparameters that we know works well in our setup. We usually run either on Cloud TPUs or on 8 GPU machines; you might need to modify the hyperparameters if you run on a different setup. 
Mathematical Language Understanding For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use the MLU data set: problem algorithmic_math_two_variables You can try solving the problem with different transformer models and hyperparameters as described in the paper : Standard transformer: model transformer hparams_set transformer_tiny Universal transformer: model universal_transformer hparams_set universal_transformer_tiny Adaptive universal transformer: model universal_transformer hparams_set adaptive_universal_transformer_tiny Story, Question and Answer For answering questions based on a story, use the bAbi data set: problem babi_qa_concat_task1_1k You can choose the bAbi task from the range 1,20 and the subset from 1k or 10k. To combine test data from all tasks into a single test set, use problem babi_qa_concat_all_tasks_10k Image Classification For image classification, we have a number of standard data sets: ImageNet (a large data set): problem image_imagenet , or one of the re scaled versions ( image_imagenet224 , image_imagenet64 , image_imagenet32 ) CIFAR 10: problem image_cifar10 (or problem image_cifar10_plain to turn off data augmentation) CIFAR 100: problem image_cifar100 MNIST: problem image_mnist For ImageNet, we suggest to use the ResNet or Xception, i.e., use model resnet hparams_set resnet_50 or model xception hparams_set xception_base . Resnet should get to above 76% top 1 accuracy on ImageNet. For CIFAR and MNIST, we suggest to try the shake shake model: model shake_shake hparams_set shakeshake_big . This setting trained for train_steps 700000 should yield close to 97% accuracy on CIFAR 10. Image Generation For (un)conditional image generation, we have a number of standard data sets: CelebA: problem img2img_celeba for image to image translation, namely, superresolution from 8x8 to 32x32. CelebA HQ: problem image_celeba256_rev for a downsampled 256x256. CIFAR 10: problem image_cifar10_plain_gen_rev for class conditional 32x32 generation. LSUN Bedrooms: problem image_lsun_bedrooms_rev MS COCO: problem image_text_ms_coco_rev for text to image generation. Small ImageNet (a large data set): problem image_imagenet32_gen_rev for 32x32 or problem image_imagenet64_gen_rev for 64x64. We suggest to use the Image Transformer, i.e., model imagetransformer , or the Image Transformer Plus, i.e., model imagetransformerpp that uses discretized mixture of logistics, or variational auto encoder, i.e., model transformer_ae . For CIFAR 10, using hparams_set imagetransformer_cifar10_base or hparams_set imagetransformer_cifar10_base_dmol yields 2.90 bits per dimension. For Imagenet 32, using hparams_set imagetransformer_imagenet32_base yields 3.77 bits per dimension. Language Modeling For language modeling, we have these data sets in T2T: PTB (a small data set): problem languagemodel_ptb10k for word level modeling and problem languagemodel_ptb_characters for character level modeling. LM1B (a billion word corpus): problem languagemodel_lm1b32k for subword level modeling and problem languagemodel_lm1b_characters for character level modeling. We suggest to start with model transformer on this task and use hparams_set transformer_small for PTB and hparams_set transformer_base for LM1B. 
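A quick note on the metrics quoted above: language models are usually compared by perplexity and image models by bits per dimension, and both are simple transformations of the average cross-entropy. The helper below is a tiny illustration of that relationship and is not part of T2T.

python
import math

def cross_entropy_to_metrics(nats):
    """Convert an average cross-entropy in nats (per token or per pixel
    dimension) to perplexity and to bits."""
    perplexity = math.exp(nats)
    bits = nats / math.log(2.0)
    return perplexity, bits

# Example: 2.90 bits/dim (the CIFAR-10 figure quoted above) corresponds to
# about 2.01 nats of cross-entropy per dimension.
print(cross_entropy_to_metrics(2.90 * math.log(2.0)))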
Sentiment Analysis For the task of recognizing the sentiment of a sentence, use the IMDB data set: problem sentiment_imdb We suggest to use model transformer_encoder here and since it is a small data set, try hparams_set transformer_tiny and train for few steps (e.g., train_steps 2000 ). Speech Recognition For speech to text, we have these data sets in T2T: Librispeech (US English): problem librispeech for the whole set and problem librispeech_clean for a smaller but nicely filtered part. Mozilla Common Voice (US English): problem common_voice for the whole set problem common_voice_clean for a quality checked subset. Summarization For summarizing longer text into shorter one we have these data sets: CNN/DailyMail articles summarized into a few sentences: problem summarize_cnn_dailymail32k We suggest to use model transformer and hparams_set transformer_prepend for this task. This yields good ROUGE scores. Translation There are a number of translation data sets in T2T: English German: problem translate_ende_wmt32k English French: problem translate_enfr_wmt32k English Czech: problem translate_encs_wmt32k English Chinese: problem translate_enzh_wmt32k English Vietnamese: problem translate_envi_iwslt32k You can get translations in the other direction by appending _rev to the problem name, e.g., for German English use problem translate_ende_wmt32k_rev (note that you still need to download the original data with t2t datagen problem translate_ende_wmt32k ). For all translation problems, we suggest to try the Transformer model: model transformer . At first it is best to try the base setting, hparams_set transformer_base . When trained on 8 GPUs for 300K steps this should reach a BLEU score of about 28 on the English German data set, which is close to state of the art. If training on a single GPU, try the hparams_set transformer_base_single_gpu setting. For very good results or larger data sets (e.g., for English French), try the big model with hparams_set transformer_big . Basics Walkthrough Here's a walkthrough training a good English to German translation model using the Transformer model from Attention Is All You Need on WMT data. pip install tensor2tensor See what problems, models, and hyperparameter sets are available. You can easily swap between them (and add new ones). t2t trainer registry_help PROBLEM translate_ende_wmt32k MODEL transformer HPARAMS transformer_base_single_gpu DATA_DIR $HOME/t2t_data TMP_DIR /tmp/t2t_datagen TRAIN_DIR $HOME/t2t_train/$PROBLEM/$MODEL $HPARAMS mkdir p $DATA_DIR $TMP_DIR $TRAIN_DIR Generate data t2t datagen \ data_dir $DATA_DIR \ tmp_dir $TMP_DIR \ problem $PROBLEM Train If you run out of memory, add hparams 'batch_size 1024'. t2t trainer \ data_dir $DATA_DIR \ problem $PROBLEM \ model $MODEL \ hparams_set $HPARAMS \ output_dir $TRAIN_DIR Decode DECODE_FILE $DATA_DIR/decode_this.txt echo Hello world >> $DECODE_FILE echo Goodbye world >> $DECODE_FILE echo e 'Hallo Welt\nAuf Wiedersehen Welt' > ref translation.de BEAM_SIZE 4 ALPHA 0.6 t2t decoder \ data_dir $DATA_DIR \ problem $PROBLEM \ model $MODEL \ hparams_set $HPARAMS \ output_dir $TRAIN_DIR \ decode_hparams beam_size $BEAM_SIZE,alpha $ALPHA \ decode_from_file $DECODE_FILE \ decode_to_file translation.en See the translations cat translation.en Evaluate the BLEU score Note: Report this BLEU score in papers, not the internal approx_bleu metric. 
t2t bleu translation translation.en reference ref translation.de Installation Assumes tensorflow or tensorflow gpu installed pip install tensor2tensor Installs with tensorflow gpu requirement pip install tensor2tensor tensorflow_gpu Installs with tensorflow (cpu) requirement pip install tensor2tensor tensorflow Binaries: Data generator t2t datagen Trainer t2t trainer registry_help Library usage: python c from tensor2tensor.models.transformer import Transformer Features Many state of the art and baseline models are built in and new models can be added easily (open an issue or pull request!). Many datasets across modalities text, audio, image available for generation and use, and new ones can be added easily (open an issue or pull request for public datasets!). Models can be used with any dataset and input mode (or even multiple); all modality specific processing (e.g. embedding lookups for text tokens) is done with bottom and top transformations, which are specified per feature in the model. Support for multi GPU machines and synchronous (1 master, many workers) and asynchronous (independent workers synchronizing through a parameter server) distributed training . Easily swap amongst datasets and models by command line flag with the data generation script t2t datagen and the training script t2t trainer . Train on Google Cloud ML and Cloud TPUs . T2T overview Problems Problems consist of features such as inputs and targets, and metadata such as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem features are given by a dataset, which is stored as a TFRecord file with tensorflow.Example protocol buffers. All problems are imported in all_problems.py or are registered with @registry.register_problem . Run t2t datagen to see the list of available problems and download them. Models T2TModel s define the core tensor to tensor computation. They apply a default transformation to each input and output so that models may deal with modality independent tensors (e.g. embeddings at the input; and a linear transform at the output to produce logits for a softmax over classes). All models are imported in the models subpackage , inherit from T2TModel , and are registered with @registry.register_model . Hyperparameter Sets Hyperparameter sets are encoded in HParams objects, and are registered with @registry.register_hparams . Every model and problem has a HParams . A basic set of hyperparameters are defined in common_hparams.py and hyperparameter set functions can compose other hyperparameter set functions. Trainer The trainer binary is the entrypoint for training, evaluation, and inference. Users can easily switch between problems, models, and hyperparameter sets by using the model , problem , and hparams_set flags. Specific hyperparameters can be overridden with the hparams flag. schedule and related flags control local and distributed training/evaluation ( distributed training documentation ). Adding your own components T2T's components are registered using a central registration mechanism that enables easily adding new ones and easily swapping amongst them by command line flag. You can add your own components without editing the T2T codebase by specifying the t2t_usr_dir flag in t2t trainer . You can do so for models, hyperparameter sets, modalities, and problems. Please do submit a pull request if your component might be useful to others. See the example_usr_dir for an example user directory. 
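To make the registration mechanism concrete, here is a hedged sketch of a user directory that registers a custom hyperparameter set. The file layout, module names, and the tweaked values are illustrative assumptions; only the registry decorator and the transformer_base hparams function come from T2T itself.

python
# my_usr_dir/__init__.py
from . import my_hparams  # importing the module triggers registration

# my_usr_dir/my_hparams.py
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_base_my_tweaks():
    """transformer_base with a couple of illustrative overrides."""
    hparams = transformer.transformer_base()
    hparams.batch_size = 2048                  # assumed smaller batch for one GPU
    hparams.learning_rate_warmup_steps = 16000
    return hparams

Passing --t2t_usr_dir=my_usr_dir together with --hparams_set=transformer_base_my_tweaks to t2t-trainer should then pick the new set up, as described above.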
Adding a dataset To add a new dataset, subclass Problem and register it with @registry.register_problem . See TranslateEndeWmt8k for an example. Also see the data generators README . Run on FloydHub Run on FloydHub Click this button to open a Workspace on FloydHub . You can use the workspace to develop and test your code on a fully configured cloud GPU machine. Tensor2Tensor comes preinstalled in the environment, you can simply open a Terminal and run your code. bash Test the quick start on a Workspace's Terminal with this command t2t trainer \ generate_data \ data_dir ./t2t_data \ output_dir ./t2t_train/mnist \ problem image_mnist \ model shake_shake \ hparams_set shake_shake_quick \ train_steps 1000 \ eval_steps 100 Note: Ensure compliance with the FloydHub Terms of Service . Papers When referencing Tensor2Tensor, please cite this paper . @article{tensor2tensor, author {Ashish Vaswani and Samy Bengio and Eugene Brevdo and Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and Noam Shazeer and Jakob Uszkoreit}, title {Tensor2Tensor for Neural Machine Translation}, journal {CoRR}, volume {abs/1803.07416}, year {2018}, url { } Tensor2Tensor was used to develop a number of state of the art models and deep learning methods. Here we list some papers that were based on T2T from the start and benefited from its features and architecture in ways described in the Google Research Blog post introducing T2T . Attention Is All You Need Depthwise Separable Convolutions for Neural Machine Translation One Model To Learn Them All Discrete Autoencoders for Sequence Models Generating Wikipedia by Summarizing Long Sequences Image Transformer Training Tips for the Transformer Model Self Attention with Relative Position Representations Fast Decoding in Sequence Models using Discrete Latent Variables Adafactor: Adaptive Learning Rates with Sublinear Memory Cost Universal Transformers Attending to Mathematical Language with Transformers The Evolved Transformer Model Based Reinforcement Learning for Atari VideoFlow: A Flow Based Generative Model for Video NOTE: This is not an official Google product.",Machine Translation,Machine Translation 2568,Natural Language Processing,Natural Language Processing,Natural Language Processing,"python def attend(query, context, value None, score 'dot', normalize 'softmax', context_sizes None, context_mask None, return_weight False ): Attend to value (or context) by scoring each query and context. Args query: Variable of size (B, M, D1) Batch of M query vectors. context: Variable of size (B, N, D2) Batch of N context vectors. value: Variable of size (B, N, P), default None If given, the output vectors will be weighted combinations of the value vectors. Otherwise, the context vectors will be used. score: str or callable, default 'dot' If score 'dot', scores are computed as the dot product between context and query vectors. This Requires D1 D2. Otherwise, score should be a callable: query context score (B,M,D1) (B,N,D2) > (B,M,N) normalize: str, default 'softmax' One of 'softmax', 'sigmoid', or 'identity'. Name of function used to map scores to weights. context_mask: Tensor of (B, M, N), default None A Tensor used to mask context. Masked and unmasked entries should be filled appropriately for the normalization function. context_sizes: list int , default None, List giving the size of context for each item in the batch and used to compute a context_mask. 
If context_mask or context_sizes are not given, context is assumed to have fixed size. return_weight: bool, default False If True, return the attention weight Tensor. Returns output: Variable of size (B, M, P) If return_weight is False. weight, output: Variable of size (B, M, N), Variable of size (B, M, P) If return_weight is True.

Install
bash python setup.py install

Test
bash python -m pytest
Tested with pytorch 1.0.0

About
Attention is used to focus processing on a particular region of input. The attend function provided by this package implements the most common attention mechanism 1, 2, 3, 4, which produces an output by taking a weighted combination of value vectors with weights from a scoring function operating over pairs of query and context vectors. Given query vector q, context vectors c_1,...,c_n, and value vectors v_1,...,v_n, the attention score of q with c_i is given by

s_i = f(q, c_i)

Frequently f takes the form of a dot product between query and context vectors.

s_i = q^T c_i

The scores are passed through a normalization function g (normally the softmax function).

w_i = g(s_1,...,s_n)_i

Finally, the output is computed as a weighted sum of the value vectors.

z = \sum_{i=1}^n w_i v_i

In many applications 1, 4, 5 attention is applied to the context vectors themselves, v_i = c_i.

Sizes
The attend function provided by this package accepts batches of size B containing M query vectors of dimension D1, N context vectors of dimension D2, and optionally N value vectors of dimension P.

Variable Length
If the number of context vectors varies within a batch, a context can be ignored by forcing the corresponding weight to be zero. In the case of the softmax, this can be achieved by adding negative infinity to the corresponding score before normalization. Similarly, for elementwise normalization functions the weights can be multiplied by an appropriate {0,1} mask after normalization. To facilitate the above behavior, a context mask, with entries in {-inf, 0} or {0, 1} depending on the normalization function, can be passed to this function. The masks should have size (B, M, N). Alternatively, a list can be passed giving the size of the context for each item in the batch. Appropriate masks will be created from these lists. Note that the size of output does not depend on the number of context vectors. Because of this, context positions are truly unaccounted for in the output. 
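Putting the pieces above together, a minimal usage sketch for variable-length contexts might look like the following. The import path is an assumption, since the package's module name is not spelled out here; the shapes and sizes follow the documented signature.

python
import torch
from attention import attend  # assumed import path

B, M, N, D = 4, 2, 6, 32                 # batch, queries, contexts, dimension
query = torch.randn(B, M, D)
context = torch.randn(B, N, D)

# Items 3 and 4 in the batch only have 5 and 3 valid context vectors;
# attend() builds the appropriate mask from these sizes.
context_sizes = [6, 6, 5, 3]

weight, output = attend(query, context,
                        context_sizes=context_sizes,
                        return_weight=True)

print(output.shape)  # torch.Size([4, 2, 32])  -> (B, M, P) with value=context
print(weight.shape)  # torch.Size([4, 2, 6])   -> (B, M, N); padded slots get weight 0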
References 1 @article{bahdanau2014neural, title {Neural machine translation by jointly learning to align and translate}, author {Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua}, journal {arXiv preprint arXiv:1409.0473}, year {2014} } 2 @article{graves2014neural, title {Neural turing machines}, author {Graves, Alex and Wayne, Greg and Danihelka, Ivo}, journal {arXiv preprint arXiv:1410.5401}, year {2014} } 3 @inproceedings{sukhbaatar2015end, title {End to end memory networks}, author {Sukhbaatar, Sainbayar and Weston, Jason and Fergus, Rob and others}, booktitle {Advances in neural information processing systems}, pages {2440 2448}, year {2015} } 4 @article{olah2016attention, title {Attention and augmented recurrent neural networks}, author {Olah, Chris and Carter, Shan}, journal {Distill}, volume {1}, number {9}, pages {e1}, year {2016} } 5 @inproceedings{vinyals2015pointer, title {Pointer networks}, author {Vinyals, Oriol and Fortunato, Meire and Jaitly, Navdeep}, booktitle {Advances in Neural Information Processing Systems}, pages {2692 2700}, year {2015} }",Machine Translation,Machine Translation 2572,Natural Language Processing,Natural Language Processing,Natural Language Processing,"The Transformer in PyTorch A minimal PyTorch implementation of the Transformer for sequence to sequence learning. Supported features: Mini batch training with CUDA Usage Training data should be formatted as below: source_sequence \t target_sequence source_sequence \t target_sequence ... To prepare data: python prepare.py training_data To train: python train.py model vocab.src vocab.tgt training_data.csv num_epoch To predict: python predict.py model.epochN vocab.src vocab.tgt test_data References Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton. 2016. Layer Normalization. arXiv:1607.06450. Hideya Mino, Masao Utiyama, Eiichiro Sumita, Takenobu Tokunaga. 2017. Key value Attention Mechanism for Neural Machine Translation. In Proceedings of the 8th International Joint Conference on Natural Language Processing, pp. 290 295. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. 2017. Attention Is All You Need. arXiv:1706.03762.",Machine Translation,Machine Translation 2577,Natural Language Processing,Natural Language Processing,Natural Language Processing,"THUMT: An Open Source Toolkit for Neural Machine Translation Contents Introduction ( introduction) Online Demo ( online demo) Implementations ( implementations) License ( license) Citation ( citation) Development Team ( development team) Contributors ( contributors) Contact ( contact) Derivative Repositories ( derivative repositories) Introduction Machine translation is a natural language processing task that aims to translate natural languages using computers automatically. Recent several years have witnessed the rapid development of end to end neural machine translation, which has become the new mainstream method in practical MT systems. THUMT is an open source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University . The website of THUMT is: Online Demo The online demo of THUMT is available at The languages involved include Ancient Chinese, Arabic, Chinese, English, French, German, Indonesian, Japanese, Portugese, Russian, and Spanish. Implementations THUMT has currently two main implementations: THUMT TensorFlow : a new implementation developed with TensorFlow . 
It implements the sequence-to-sequence model ( Seq2Seq ) ( Sutskever et al., 2014 ), the standard attention-based model ( RNNsearch ) ( Bahdanau et al., 2014 ), and the Transformer model ( Transformer ) ( Vaswani et al., 2017 ). THUMT Theano : the original project developed with Theano , which is no longer updated because MILA put an end to Theano . It implements the standard attention-based model ( RNNsearch ) ( Bahdanau et al., 2014 ), minimum risk training ( MRT ) ( Shen et al., 2016 ) for optimizing model parameters with respect to evaluation metrics, semi-supervised training ( SST ) ( Cheng et al., 2016 ) for exploiting monolingual corpora to learn bi-directional translation models, and layer-wise relevance propagation ( LRP ) ( Ding et al., 2017 ) for visualizing and analyzing RNNsearch.

The following table summarizes the features of the two implementations:

Implementation | Model | Criterion | Optimizer | LRP
Theano | RNNsearch | MLE, MRT, SST | SGD, AdaDelta, Adam | RNNsearch
TensorFlow | Seq2Seq, RNNsearch, Transformer | MLE | Adam | RNNsearch, Transformer

We recommend using THUMT TensorFlow , which delivers better translation performance than THUMT Theano . We will keep adding new features to THUMT TensorFlow . It is also possible to exploit layer-wise relevance propagation to visualize the relevance between source and target words with THUMT (figure: Visualization with LRP).

License The source code is dual licensed. Open source licensing is under the BSD 3 Clause , which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com (mailto:thumt17@gmail.com).

Citation Please cite the following paper: > Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation . arXiv:1706.06415.

Development Team Project leaders: Maosong Sun , Yang Liu , Huanbo Luan Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng Contributors Zhixing Tan (mailto:playinf@stu.xmu.edu.cn) (Xiamen University) Contact If you have questions, suggestions and bug reports, please email thumt17@gmail.com (mailto:thumt17@gmail.com).

Derivative Repositories Document Transformer (Improving the Transformer Translation Model with Document Level Context) PR4NMT (Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization)",Machine Translation,Machine Translation 2579,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Tensorflow Seq2seq Text Summarization

This branch uses the new tf.contrib.seq2seq APIs in tensorflow r1.1. For r1.0 users, please check Branch tf1.0 . This is an implementation of a sequence-to-sequence model using a bidirectional GRU encoder and a GRU decoder. This project aims to help people start working on Abstractive Short Text Summarization immediately. And hopefully, it may also work on machine translation tasks.

Dataset Please check harvardnlp/sent summary .

Pre trained Models Download

Usage

Setup Environment With GPU If you want to train the model and have Nvidia GPUs (like GTX 1080, GTX Titan, etc), please set up the CUDA environment and install tensorflow-gpu. > pip3 install -U tensorflow-gpu==1.1 You can check whether the GPU works by > python3 >>> import tensorflow >>> and make sure there are no error outputs. Without GPU If you don't have a GPU, you can still use the pretrained models and generate summaries using your CPU. > pip3 install -U tensorflow==1.1

Model and Data Files should be organized like this (figure: misc/files.png). Please find these files in the harvardnlp/sent summary and rename them as duc2003/input.txt > test.duc2003.txt duc2004/input.txt > test.duc2004.txt Giga/input.txt > test.giga.txt

Train Model > python3 script/train.py can reproduce the experiments shown below. By doing so, it will train 200k batches first. Then do generation on giga, duc2003, duc2004 with beam_size in 1, 10 respectively every 20k batches. It will terminate at 300k batches. Also, the model will be saved every 20k batches. (figure: misc/train.png)

Test Model > python3 script/test.py will automatically use the most updated model to do generation. (figure: misc/test.png) To do a customized test, please put the input data at data/test.your_test_name.txt and change script/test.py lines 13-14 from datasets = ["giga", "duc2003", "duc2004"] and geneos = [True, False, False] to datasets = ["your_test_name"] and geneos = [True] . For advanced users, python3 src/summarization.py -h can print help. Please check the code for details.

Implementation Details

Bucketing In tensorflow r0.11 and earlier, using bucketing is recommended. r1.0 provides a dynamic rnn seq2seq framework which is much easier to understand than the tricky bucketing mechanism. We use dynamic rnn to generate the compute graph. There is only one computing graph in our implementation. However, we still split the dataset into several buckets and use data from the same bucket to create a batch. By doing so, we can add less padding, leading to better efficiency.

Attention Mechanism The attention mechanism follows Bahdanau et al. We follow the implementation in tf.contrib.seq2seq. We refine the softmax function in attention so that paddings always get 0.

Beam Search For simplicity and flexibility, we implement the beam search algorithm in python while leaving the network part in tensorflow. In testing, we consider batch_size as beam_size. The tensorflow graph will generate only 1 word, then some python code will create a new batch according to the result. By iteratively doing so, the beam search result is generated. Check step_beam(...) in bigru_model.py for details.

Results We train the model for 300k batches with batch size 80. We clip all summaries to 75 bytes. For the DUC datasets, we eliminate EOS and generate 12 words. For the GIGA dataset, we let the model generate EOS.

Negative Log Likelihood of Sentence (figure: misc/loss.png)

Rouge Evaluation

Dataset | Beam Size | R1-R | R1-P | R1-F | R2-R | R2-P | R2-F | RL-R | RL-P | RL-F
duc2003 | 1 | 0.25758 | 0.23003 | 0.24235 | 0.07511 | 0.06611 | 0.07009 | 0.22608 | 0.20174 | 0.21262
duc2003 | 10 | 0.27312 | 0.23864 | 0.25416 | 0.08977 | 0.07732 | 0.08286 | 0.24129 | 0.21074 | 0.22449
duc2004 | 1 | 0.27584 | 0.25971 | 0.26673 | 0.08328 | 0.07832 | 0.08046 | 0.24253 | 0.22853 | 0.23461
duc2004 | 10 | 0.28024 | 0.25987 | 0.26889 | 0.09377 | 0.08631 | 0.08959 | 0.24849 | 0.23048 | 0.23844
giga | 1 | 0.3185 | 0.38779 | 0.3391 | 0.14542 | 0.17537 | 0.15393 | 0.29925 | 0.363 | 0.3181
giga | 10 | 0.30179 | 0.41224 | 0.33635 | 0.14378 | 0.1951 | 0.15936 | 0.28447 | 0.38733 | 0.31664

Requirement Python3 Tensorflow r1.1

TODO Improve automatic scripts by parameterizing magic numbers. 
Some tricks caused by the new tensorflow seq2seq framework.",Machine Translation,Machine Translation 2581,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Papers Representation Learning Paper Core idea Notes CoVe Pretrains a translation model based on a 2 layer bidirectional LSTM and uses it as the embedding encoder 2017 NIPS ELMo Pretrains a language model based on a 2 layer bidirectional LSTM and uses it as the embedding encoder 2018 NAACL Best Paper GPT Pretrains a language model based on a 12 layer Transformer decoder and uses it as the embedding encoder 2018 OpenAI BERT Pretrains a masked language model based on a bidirectional Transformer and uses it as the embedding encoder 2019 NAACL Best Paper MT DNN Builds on BERT and uses multi task finetuning to improve the domain generalization of the embeddings 2019 arXiv GPT 2 Todo 2019 OpenAI XLM Todo 2019 Facebook AI Research Deep Learning System Course Course name Notes cs294 Paper Paper Notes TensorFlow: A System for Large Scale Machine Learning TensorFlow whitepaper TensorFlow: Large Scale Machine Learning on Heterogeneous Distributed Systems TensorFlow whitepaper MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems MXNet On the fly Operation Batching in Dynamic Computation Graphs TENSORFLOW EAGER: A MULTI STAGE, PYTHON EMBEDDED DSL FOR MACHINE LEARNING 2019 SysML AUTOGRAPH: IMPERATIVE STYLE CODING WITH GRAPH BASED PERFORMANCE 2019 SysML PYTORCH BIGGRAPH: A LARGE SCALE GRAPH EMBEDDING SYSTEM 2019 SysML",Machine Translation,Machine Translation 2593,Natural Language Processing,Natural Language Processing,Natural Language Processing,ChatBot 2018.07.17 2018.07.24 (Week 1) + Team formation + Topic description 2018.07.24 2018.07.31 (Week 2) + Data collection + Data preprocessing plan Planned preprocessing steps for the training data + Tokenizing the text + Removing low frequency words + Attaching start and end tokens + Mapping words to indices (vectorization) Problems to solve + Model implementation + Word vectorization Data sources + (GOM Player Korean subtitles) + National Institute of Korean Language Language Information Sharing Center (dialogue data) Reference papers/sites + (paper study) + (Sequence to Sequence Learning with NN) + (Attention Is All You Need) + (A Hierarchical Recurrent Encoder Decoder for Generative Context Aware Query Suggestion) + (A Hierarchical Latent Variable Encoder Decoder Model for Generating Dialogues),Machine Translation,Machine Translation 2601,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Training an LSTM network on the Penn Tree Bank (PTB) dataset Introduction Long Short Term Memory (LSTM) networks were first proposed by Sepp Hochreiter and Jürgen Schmidhuber 1 in 1997 for modeling sequence data. Christopher Olah 2 has nicely illustrated how they work. The fifth course in the deep learning specialization 3 on Coursera teaches recurrent neural networks (RNN), of which the LSTM is a variant, in detail, and explains many interesting applications. For a succinct summary of the mathematics of these models, see, for example, Stanford cs231n lecture 10 4 or Greff et al. (2016) 5 . This is a series of illustrative examples of training an LSTM network. In these examples, an LSTM network is trained on the Penn Tree Bank (PTB) dataset to replicate some previously published work. The PTB dataset is an English corpus available from Tomáš Mikolov's web page 6 , and used by many researchers in language modeling experiments. It contains 929K training words, 73K validation words, and 82K test words. It has 10K words in its vocabulary. Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals used this dataset in their ICLR 2015 paper 7 where they showed that the correct place to implement dropout regularization in an RNN is in the connections between layers and not between time steps.
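None of the repositories collected here ship the snippet below; it is a minimal PyTorch style sketch, assuming a toy two layer word level model, of the dropout placement Zaremba et al. argue for: dropout sits on the vertical connections (embedding to layer 1, layer 1 to layer 2, layer 2 to softmax) and is re applied at every time step, while the recurrent h and c states pass between time steps untouched.

```python
import torch
import torch.nn as nn

class TwoLayerLSTMLM(nn.Module):
    """Toy sketch: dropout only on vertical (between-layer) connections,
    never on the recurrent h/c connections between time steps."""
    def __init__(self, vocab_size=10000, emb=200, hidden=200, p_drop=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.cell1 = nn.LSTMCell(emb, hidden)
        self.cell2 = nn.LSTMCell(hidden, hidden)
        self.drop = nn.Dropout(p_drop)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                       # tokens: (seq_len, batch)
        batch = tokens.size(1)
        hidden = self.cell1.hidden_size
        h1 = c1 = h2 = c2 = tokens.new_zeros(batch, hidden, dtype=torch.float)
        logits = []
        for x_t in self.embed(tokens):               # one time step at a time
            h1, c1 = self.cell1(self.drop(x_t), (h1, c1))   # input -> layer 1
            h2, c2 = self.cell2(self.drop(h1), (h2, c2))    # layer 1 -> layer 2
            logits.append(self.proj(self.drop(h2)))         # layer 2 -> softmax
        return torch.stack(logits)                   # (seq_len, batch, vocab)

model = TwoLayerLSTMLM()
out = model(torch.randint(0, 10000, (35, 20)))       # 35 steps, batch of 20
print(out.shape)                                      # torch.Size([35, 20, 10000])
```

Setting p_drop to 0 recovers a plain, non regularized two layer LSTM of the kind replicated in these examples.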
To demonstrate the effectiveness of their regularization strategy, they reported word level perplexities on the PTB dataset with three different networks: a small non regularized LSTM, a medium regularized LSTM, and a large regularized LSTM. It is their small non regularized LSTM model that is replicated in these examples. Part I (lstm_np.ipynb) of this series presents an object oriented design of the non regularized LSTM network implemented in pure Python 8 / NumPy 9 . Equations are coded up from scratch to carry out the computations without dependencies on extraneous frameworks or libraries. This is a minimalist implementation, partly inspired by Andrej Karpathy's minimalist character level language model 14 . The program executes on a CPU. Part II (lstm_tfe.ipynb) shows how the same model can be easily implemented using TensorFlow 10 , the open source framework originally developed by researchers and engineers from the Google Brain team within Google’s AI organization. The model is programmed in TensorFlow's eager execution 11 imperative programming environment that evaluates operations immediately without building dataflow graphs. This is akin to regular Python programming following Python control flow. The program is executed in Colaboratory 12 with GPU acceleration. Part III (lstm_tf.ipynb) demonstrates how the model can be implemented using TensorFlow's low level programming model in which you first define the dataflow graph 13 and then create a TensorFlow session 13 to run parts of the graph. In a dataflow graph, the nodes (ops) represent units of computation, and the edges (tensors) represent the data consumed or produced by a computation. Calling most functions in the TensorFlow low level API merely adds operations and tensors to the default graph, but does not perform the actual computation. Instead, you compose these functions until you have a tensor or operation that represents the overall computation, such as performing one step of gradient descent, and then pass that object to a TensorFlow session to run the computation. This model is different from the familiar imperative model, but is a common model for parallel computing. The program is executed in Colaboratory 12 with GPU acceleration. It is shown that all these implementations yield results which agree with each other and with those in Zaremba et al. (2015) 7 . References 1. S. Hochreiter, and J. Schmidhuber 1 . Long Short Term Memory. Neural Computation, 9(8):1735 1780, 1997 2. Christopher Olah, Understanding LSTM networks 2 , colah's blog, 27 August 2015 3. Deep learning specialization 3 , Taught by Andrew Ng, Kian Katanforoosh, and Younes Bensouda Mourri, Coursera 4. Fei Fei Li, Justin Johnson, and Serena Yeung, Stanford cs231n lecture 10 4 , 4 May 2017 5. Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, and Jürgen Schmidhuber, LSTM: A Search Space Odyssey 5 , Transactions on Neural Networks and Learning Systems, 2016 (Errata: In version 2 of the paper on arXiv, on page 2, the first equation under B. Backpropagation Through Time gives the derivative of the loss with respect to yt. In that equation, there should be an over bar over z, i, f and o, denoting gradients inside the non linear activation functions.) 6. Andrej Karpathy, Minimal character level language model with a Vanilla Recurrent Neural Network, in Python/numpy 14 7. Tomáš Mikolov's web page, Penn Tree Bank (PTB) dataset 6 8. Wojciech Zaremba, IlyaSutskever, and Oriol Vinyals, Recurrent Neural Network Regularization 7 , ICLR 2015 9. 
TensorFlow tutorial example 16 with eager execution 10. TensorFlow tutorial example 15 with graph execution 11. Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar, On the convergence of Adam and beyond 17 , ICLR 2018 (Errata: On 'slide 3 Algorithms', 'slide 6 Primary cause for non convergence', and 'slide 10 AMSGrad' of Sashank's presentation at ICLR 2018 18 , in three places the exponent of beta inside the square root should be t j instead of t i. In one place on slide 10 in the AMSGrad update equation, the exponent of beta inside the square root should be k j instead of k i. Also, note that 1< k< t is implied.) 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 : 11 : 12 : 13 : 14 : 15 : 16 : 17 : 18 : https://www.facebook.com/iclr.cc/videos/2123421684353553/",Machine Translation,Machine Translation 2609,Natural Language Processing,Natural Language Processing,Natural Language Processing,"rnn_zoo This repository tests various recurrent neural network architectures on baseline datasets SeqMNIST and pMNIST. The network architectures chosen were those deemed to be the most effective currently available. Architectures tested include: RNN LSTM GRU IRNN Peephole LSTM UGRNN Intersection RNN Results The following results are were generated using the architectures listed above. \ Hyperparameters used: layers 3, num neurons 50, optimizer Adam, learning rate .0001 and batch 64. Running the code python train.py model type irnn task seqmnist layers 2 batch size 64 epochs 10 Installing Update BASE_DIR in config.ini with the absolute path to the current directory. Packages needed to run the code include: numpy python PyToch argparse configparser",Machine Translation,Machine Translation 2620,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. 
Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. 
For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' 
on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. 
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
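To make step 3 above more concrete, here is a rough sketch of greedy longest match first WordPiece tokenization. This is not the code in tokenization.py; the toy vocabulary is invented for the example, and continuation pieces carry the ## prefix used in the released BERT vocabularies.

```python
# Not the actual tokenization.py implementation: a rough sketch of the
# greedy longest-match-first WordPiece step, with a tiny invented vocabulary.
def wordpiece_tokenize(word, vocab, unk="[UNK]", max_chars=100):
    if len(word) > max_chars:
        return [unk]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:                       # try the longest span first
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece             # continuation pieces get ##
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:                        # nothing in the vocab fits
            return [unk]
        pieces.append(match)
        start = end
    return pieces

vocab = {"john", "johan", "##son", "'", "s", "house"}
print(wordpiece_tokenize("johanson", vocab))     # ['johan', '##son']
```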
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the arXiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2635,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Domain Specific Question Answering Assistant Abstract The goal of this project is to implement a text based question answering system which can be used for different sectors. The system is designed to find the closest match among the questions previously asked by users and to return the correct answer to the user. If the answer cannot be found, the system informs the user and a suitable answer is provided soon afterwards. If the chosen answer is satisfactory for the user, the new question answer pair is added to the database. After a certain period of time, the model is retrained on the enlarged data to increase the success of the system. To find the correct answer, the closest question among the previously asked questions needs to be found. This problem was solved with two different approaches. The first approach uses a neural network and the second uses string similarity methods (cosine similarity, Levenshtein distance, Qgram similarity). Neural network models depend significantly on the size of the data, so the string similarity methods can be used in circumstances where the dataset contains too few samples. On the other hand, as the size of the data increases, the accuracy of the neural network model will also increase. Report of Project: Apk file of mobile application: Jar file of desktop application: Literature Review In this project, the aim is to respond with appropriate answers to the questions asked by the user. The system is expected to answer only questions from a specific sector, because this question answering system is prepared for a specific sector purpose. Objective of the Thesis The project is a question answering system that improves over time through the learning logic set up in the back end. It aims to produce more accurate and meaningful answers over time. Methods to be used: 1. Determining the needs and deficiencies in the first stage and meeting the data and other needs of the project, 2. Collection of question and answer text data from specific sectors, 3. Cleaning of the collected data, 4. Training different approaches on the gathered data with a deep learning framework, applying optimization and regularization methods, 5. Testing various methods to find questions similar to the question asked (see the sketch after this list), 6. If there are no similar questions, asking the user for new questions and getting the correct answer, 7. If the system fails, re arranging, training and testing the new model.
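As a concrete illustration of the string similarity matching mentioned in the abstract and in step 5 of the list above, here is a small self contained Python sketch. It is not the project's actual code: the FAQ strings and the q gram size of 2 are made up for the example. It scores a user question against stored questions with bag of words cosine similarity, a normalized Levenshtein distance, or Qgram overlap, and returns the closest stored question:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def levenshtein(s, t):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def qgram_sim(s, t, q=2):
    """Dice-style overlap of character q-grams."""
    grams = lambda x: Counter(x[i:i + q] for i in range(max(len(x) - q + 1, 0)))
    gs, gt = grams(s), grams(t)
    total = sum(gs.values()) + sum(gt.values())
    return 2.0 * sum((gs & gt).values()) / total if total else 0.0

def closest_question(query, stored, method="cosine"):
    """Return the stored question most similar to the query, plus its score."""
    def score(q):
        if method == "cosine":
            return cosine(Counter(query.lower().split()), Counter(q.lower().split()))
        if method == "levenshtein":
            return 1.0 - levenshtein(query.lower(), q.lower()) / max(len(query), len(q), 1)
        return qgram_sim(query.lower(), q.lower())
    best = max(stored, key=score)
    return best, score(best)

# Hypothetical FAQ entries, not taken from the real Ziraat Bank dataset.
faq = ["How can I reset my internet banking password?",
       "What should I do if my card is lost?"]
print(closest_question("I lost my card, what can I do?", faq, method="levenshtein"))
```

The desktop application described later follows the same pattern: it computes one of these scores against every stored question, shows the best match and its similarity rate, and falls back to the top 5 candidates when the user is not satisfied.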
Hypothesis In this project my purpose is to give correct answers to the questions asked by users; although that level of achievement cannot be reached in the first stage, the aim is to reach an acceptable level step by step. System Requirements The system requires the following dependencies: python 3.5 tensorflow (tested with v1.0) numpy CUDA (for using GPU) nltk (natural language toolkit for tokenizing the sentences) tqdm (for the nice progression bars) The web interface additionally requires these packages: –django –channels –Redis –asgi redis Workflow The schema for the question answering system is shown in the figure. Dataset The dataset required for the implementation of my project has been obtained from the frequently asked questions section of the Ziraat Bank website. This section of the website contained approximately 400 question answer pairs about different subjects in the banking sector. This dataset is not large enough for the implementation of my deep learning model, but it can be used as a starting point. Using the desktop application, this dataset can be enlarged for the deep learning model. Some sample questions and answers are shown in the figure. Implementation of Model on Web Interface The models in this work are implemented with Tensorflow, and all experiments are processed in a GPU cluster. I use the accuracy on the validation set to locate the best epoch and best hyper parameter settings for testing. The word embedding is trained by Word2vec, and the word vector size is 300. Word embeddings are also parameters and are optimized as well during the training. Stochastic Gradient Descent (SGD) is the optimization strategy. I tried different margin values, such as 0.05, 0.1 and 0.2, and finally fixed the margin as 0.2. As the network architecture, a multi layer perceptron model has been used on the web interface. This model has 5 layers in total (1 input layer, 3 hidden layers, 1 output layer). Each word vector obtained from the questions in the data set is used as input. The output layer holds the indexes of the sentences placed in order. During training, after every epoch the corresponding sentence index in the output layer is set to 1 as a binary value. The total number of questions in the dataset is currently 488, all of which are used for training the model. After 100 epochs the training phase of the model is ended. The learning rate is set to 0.001 and the dropout rate is set to 0.5. This approach is very similar to text classification methods that use artificial neural networks. The multi layer perceptron model for question answering is shown in the figure. Implementation of Desktop Application The size of the dataset is not enough for the implementation of my deep learning model, therefore in some cases the web application cannot reply with the correct answer. The solution to this problem is enlarging the dataset for the deep learning model using the desktop application. The logic of the desktop application is not complicated. The program finds the closest question to the given question using similarity methods such as cosine similarity, Levenshtein distance and Qgram similarity. If the user is satisfied with the answer, he has to click the Ok button so that the new question and answer will be added to the database. If the user is not satisfied with the answer, the user can choose one of the 5 most similar questions. In this case, a new answer will be shown in the text area.
If the user is still not satisfied with answer, then he has to click on Not Ok button, the question and answer will be added in another database. The admin will enter new answers after a while for the question. Using this desktop application, enlarging dataset for the deep learning model can be possible. UML diagram of desktop application is shown in Figure: ! alt text Screenshots of Web Application The main page of Web application is shown in Figure. For now it is only for a demo homepage. Users can write their own question in text label as an input. After a couple of milliseconds the system will print the answer to the screen: ! alt text User can ask a question using web application. It is shown in Figure: ! alt text The answer and question will be shown on text area. It is shown in Figure: ! alt text If the user change previously asked question and asks a new question, the new answer will be shown on text area. It is shown in Figure: ! alt text Screenshots of Desktop Application The main page of desktop application is shown in Figure . Users can write their own question in text label as an input. After a couple of milliseconds the system will print the answer to the screen: ! alt text User can ask a question using desktop application. For this sample question : Ziraat internet bankaciligina hangi adreslerden ulasabilirim? the answer, most similar question and similarity rate are shown on screen. Due the question is already in database, similarity rate is 1.0 and correct answer is seen in text area. It is shown in Figure: ! alt text If the user change just one word on previously asked question and asks this question: Ziraat internet bankaciligina hangi adreslerden ulasabilirim? , similarity rate will be 0.9 and same answer will be seen on text area. It is shown in Figure: ! alt text At the bottom of text area an important question is asked to user. It means Are you satisfied with answer? . Also, Ok and Not ok button take part in right side of this question. It is shown in Figure: ! alt text If the user click on Ok button, the question and answer will be added in database. The message The question is added in database is shown in Figure: ! alt text if the user click on Not Ok button, the question and answer will be added in another database. The admin will enter new answers after a while for the question. The message For your question will be found better answer in soon is shown in Figure: ! alt text If the user change the question more deeply, the similarity rate between most similar question will drop. For this example question : Ziraat bankaciligina nasil ulasabilirim? similarity rate will be 0.7 and correct answer will be able to found by application. It is shown in Figure: ! alt text In this application, the user can choose the method which is used to find most similar question and its answer. For a now possible methods are cosine similarity, levenshtein and qram. It is shown in Figure: ! alt text If the user is not satisfied with question, the user can choose one of 5 most similar questions. In this case, a new answer will be shown on text area. Possible questions section is shown in Figure: ! alt text Screenshots of Mobile Application Before the main page of mobile application is shown, the splash screen is shown during a couple of seconds on android application. It is shown in Figure: ! alt text The main page of mobile application is shown in Figure. Users can write their own question in text label as an input. 
After a couple of milliseconds the system prints the answer to the screen: ! alt text A user can ask a question using the mobile application. For the sample question Kartım kayboldu, ne yapabilirim? ( My card is lost, what can I do? ), the answer, the most similar question and the similarity rate are shown on screen. It is shown in the figure: ! alt text Above the text area an important question is asked to the user, meaning Are you satisfied with the answer? . The Ok and Not Ok buttons are placed to the right of this question. It is shown in the figure: ! alt text If the user clicks the Ok button, the question and answer are added to the database. The message The question is added in database is shown in the figure: ! alt text If the user clicks the Not Ok button, the question and answer are added to a separate database and the admin will enter a new answer for the question after a while. The message For your question will be found better answer in soon is shown in the figure: ! alt text If the user is not satisfied with the matched question, the user can choose one of the 5 most similar questions; in this case, a new answer is shown in the text area. The possible questions section is shown in the figure: ! alt text Experimental Results An evaluation procedure was used to measure user satisfaction with the question answering assistant. 10 participants were given a hard copy form with 50 sample questions from the dataset and asked to use the question answering assistant. Participants were requested to ask questions similar or related to these samples using the web application and the desktop application. After each response of the application, the users rated the system with a score of 1 to 5. The evaluation was based on the following 4 criteria: 1. Were the answers given by the Question Answering Assistant correct? (Quality) 2. Are you satisfied with the answers? (Quantity) 3. Were the answers given by the Question Answering Assistant relevant to the subject? (Relation) 4. Were the answers given by the Question Answering Assistant clear? (Manner) The ratings given by the participants are shown in the table: ! alt text From these frequency values, the accuracy can be calculated using the sample mean formula. ! alt text Accordingly, the overall success of the system was estimated to be approximately 92%. Performance Results The Question Answering Assistant has been tested on two different platforms (Windows and Android). The test considered criteria such as time, speed and success. First, for the application on the mobile platform, the opening time of the program is 1 2 ms, and no freezing or waiting occurs during usage. Similarly, for the desktop and web applications, the opening time of the program is on the order of milliseconds, and there is no hanging or waiting during usage. Conclusion In this project, a text based question answering assistant was implemented which can be used for different sectors. The system is designed to find the closest question among the questions previously asked by users and to return the correct answer to the user. If the answer cannot be found, the system informs the user and later provides a suitable answer. If the chosen answer is satisfactory for the user, the new question answer pair is added to the database. After a certain period of time, the model is retrained on the data to increase the success of the system. Two different methods have been used to find the best answer.
The first method uses a neural network model (LSTM) and the second uses string similarity methods (cosine similarity, Levenshtein distance, Q gram similarity) to find the appropriate answer. Although both methods are considered successful enough, the neural network model needs a larger dataset. When the experimental results of the project were reviewed, it was found that the participants rated the system as very successful. On all 4 criteria (Quality, Quantity, Manner, Relation) it scored over 4 out of 5. The overall score is 4.6 points, which is about 92%. My purpose in this project was to give correct answers to the questions asked by users; although that level of achievement could not be reached in the first stage, an acceptable level was approached step by step. There are not many studies on question answering systems in Turkish, and I hope this work will be useful for other studies. References 1 Deep Learning for Answer Sentence Selection { Lei Yu, Karl Moritz Hermann, Phil Blunsom, Stephen Pulman (Submitted on 4 Dec 2014). 2 End To End Memory Networks { Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus (Submitted on 31 Mar 2015 (v1), last revised 24 Nov 2015) 3 Text REtrieval Conference (TREC) Question Answering Collections { Voorhees, E. Tice, D. Building a Question Answering Test Collection , Proceedings of SIGIR 2000, July, 2000, pp. 200 207 4 Memory Networks { Jason Weston, Sumit Chopra, Antoine Bordes Submitted on 15 Oct 2014. 5 Neural Machine Translation by Jointly Learning to Align and Translate { Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio Submitted on 1 Sep 2014, last revised 19 May 2016 6 A Sample of the Penn Treebank Corpus { University of Pennsylvania 2017. 7 Embedding made from the text8 Wikipedia dump. { 8 SQuAD: 100,000+ Questions for Machine Comprehension of Text { Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang (Submitted on 16 Jun 2016, last revised 11 Oct 2016) 9 The Dialog State Tracking Challenge (DSTC) is an on going series of research community challenge tasks. { 10 The Dialog State Tracking Challenge { Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. Proceedings of the SIGDIAL 2013 Conference, Metz, France, August 2013. 11 The Second Dialog State Tracking Challenge { Matthew Henderson, Blaise Thomson, and Jason D. Williams. Proceedings of the SIGDIAL 2014 Conference, Philadelphia, USA, June 2014. 12 The Third Dialog State Tracking Challenge { Matthew Henderson, Blaise Thomson, and Jason D. Williams. Proceedings IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, USA, December 2014. 13 2015 International Workshop Series on Spoken Dialogue Systems Technology { 14 The Fourth Dialog State Tracking Challenge { Seokhwan Kim, Luis F. D'Haro, Rafael E Banchs, Matthew Henderson, and Jason D. Williams. Proceedings IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, USA, December 2014. 15 2016 IEEE Workshop on Spoken Language Technology { 13–16 December 2016 • San Diego, California 16 The Fifth Dialog State Tracking Challenge. Seokhwan Kim, Luis F. D'Haro, Rafael E Banchs, Matthew Henderson, Jason D. Williams, and Koichiro Yoshino. Proceedings IEEE Spoken Language Technology Workshop (SLT), San Diego, USA, December 2016.
17 The Dialogue Breakdown Detection Challenge { Ryuichiro Higashinaka, Kotaro Funakoshi, Yuka Kobayashi, Michimasa Inaba, NTT Media Intelligence Laboratories , Honda Research Institute Japan Co., Ltd., Toshiba Corporation, Hiroshima City University 18 2017 Conference on Neural Information Processing Systems { Long Beach Convention and Entertainment Center, Long Beach, California, USA 19 Restaurant information train and development set { Paul Crook Microsoft Research, Maxine Eskenazi Carnegie Mellon University, Milica Gasic University of Cambridge (3rd February 2014) 20 Ziraat Bank frequently asked questions { 21 Tensorflow An open source machine learning framework for everyone { 22 Vector Representations of Words { 23 Optimization: Stochastic Gradient Descent { 24 Understanding LSTM Networks { 25 Cosine Similarity { 26 Levensthein Distance { 27 Qgram Similarity {",Machine Translation,Machine Translation 2668,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? 
BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. 
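To make the masking procedure described above concrete, here is a toy sketch of how a masked LM example could be built. It is a simplification for illustration only, not the repository's create_pretraining_data.py logic (which operates on WordPiece tokens and also replaces some chosen positions with random or unchanged tokens).

```python
# Toy illustration of masked-LM example construction (not the actual pipeline).
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    rng = random.Random(seed)
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model must predict the original token here
            masked[i] = mask_token
    return masked, labels


tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(masked)   # tokens with roughly 15% replaced by [MASK]
print(labels)   # {position: original token} used as prediction targets
```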
The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters (Not available yet. Needs to be re generated). BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. 
However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. 
However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. 
Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. 
Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. 
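To round out the tokenization discussion, here is a small sketch of projecting word level labels through orig_to_tok_map from the snippet above. Labelling only the first sub token of each word and padding the remaining positions with a placeholder is one common convention, not something prescribed by this repository.

```python
# Project word-level labels onto WordPiece tokens via orig_to_tok_map
# (values taken from the example above; the "X" placeholder is a convention
# chosen here for illustration).
orig_tokens = ["John", "Johanson", "'s", "house"]
labels      = ["NNP", "NNP", "POS", "NN"]
bert_tokens = ["[CLS]", "john", "johan", "##son", "'", "s", "house", "[SEP]"]
orig_to_tok_map = [1, 2, 4, 6]   # index of the first sub-token of each original word

bert_labels = ["X"] * len(bert_tokens)           # positions with no label
for word_idx, tok_idx in enumerate(orig_to_tok_map):
    bert_labels[tok_idx] = labels[word_idx]      # label only the first sub-token

print(list(zip(bert_tokens, bert_labels)))
# [('[CLS]', 'X'), ('john', 'NNP'), ('johan', 'NNP'), ('##son', 'X'),
#  ("'", 'POS'), ('s', 'X'), ('house', 'NN'), ('[SEP]', 'X')]
```

At evaluation time the same map can be used in reverse to read a prediction for each original word from its first sub token.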
Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. 
However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? 
Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2703,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New February 7th, 2019: TfHub Module \ \ \ \ \ BERT has been uploaded to TensorFlow Hub . See run_classifier_with_tfhub.py for an example of how to use the TF Hub module, or run an example in the browser on Colab . \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. 
This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. 
Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). 
Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . 
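Before moving on to the Cloud TPU specifics, a quick sanity check of an unzipped checkpoint directory (the three files listed above) can be done with a few lines of Python; the directory path below is a hypothetical example, not a path shipped with this repository.

```python
# Inspect the three documented files of a downloaded BERT checkpoint.
import json
import os

bert_dir = "/path/to/bert/uncased_L-12_H-768_A-12"   # hypothetical location

with open(os.path.join(bert_dir, "bert_config.json")) as f:
    config = json.load(f)
print("hidden size:", config.get("hidden_size"),
      "layers:", config.get("num_hidden_layers"),
      "vocab size:", config.get("vocab_size"))

with open(os.path.join(bert_dir, "vocab.txt"), encoding="utf-8") as f:
    vocab = [line.rstrip("\n") for line in f]
print("vocab entries:", len(vocab), "first entries:", vocab[:5])

# Note: bert_model.ckpt is a checkpoint *prefix* (three files on disk);
# pass the prefix itself, not one of the individual files, as init_checkpoint.
```

The vocab.txt file is also what you hand to FullTokenizer, together with a do_lower_case setting that matches the casing of the model you downloaded.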
On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. 
The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. 
The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. 
Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtain pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). 
Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) 
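A minimal way to do that sharding is sketched below. It assumes only the input format described above (one sentence per line, documents separated by blank lines) and keeps whole documents inside a single shard; the file names are arbitrary:
python
def shard_corpus(input_path, output_prefix, num_shards=16):
    # Read blank-line-delimited documents so no document is split across shards.
    with open(input_path) as f:
        documents = [d for d in f.read().split('\n\n') if d.strip()]

    handles = [open('%s-%05d.txt' % (output_prefix, i), 'w') for i in range(num_shards)]
    for i, doc in enumerate(documents):
        handles[i % num_shards].write(doc.strip() + '\n\n')
    for h in handles:
        h.close()

# Each shard then gets its own create_pretraining_data.py run, producing files
# like tf_examples.tf_record-00000 that run_pretraining.py can pick up via the
# file glob mentioned above.
shard_corpus('corpus.txt', 'corpus_shard', num_shards=16)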
The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. 
If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. 
We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2713,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . 
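As a quick illustration of that character based handling, the sketch below runs the repository's FullTokenizer over a mixed Chinese/English string; the vocab path is a placeholder for wherever the Multilingual Cased model was unzipped:
python
# -*- coding: utf-8 -*-
import tokenization  # tokenization.py from this repository

# Placeholder path to the unzipped Multilingual Cased model.
tokenizer = tokenization.FullTokenizer(
    vocab_file='multi_cased_L-12_H-768_A-12/vocab.txt', do_lower_case=False)

# Chinese characters are split individually; the English words fall back to WordPiece.
print(tokenizer.tokenize(u'机器学习 is machine learning'))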
\ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. 
Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' 
on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. 
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
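The JSON lines written by the command above are easy to post process. This sketch assumes the field layout that extract_features.py emits (a features list per input line, each entry holding the token plus one values vector per requested layer); adjust the key names if your copy of the script differs:
python
import json

import numpy as np

# One JSON object per line of the original input, as written by the command above.
with open('/tmp/output.jsonl') as f:
    for line in f:
        example = json.loads(line)
        for feature in example['features']:
            # 'layers' has one entry per value passed to the layers flag;
            # index -1 is the final hidden layer of the Transformer.
            final_layer = [l for l in feature['layers'] if l['index'] == -1][0]
            vector = np.array(final_layer['values'])
            print(feature['token'], vector.shape)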
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
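To make the four step sentence level recipe above concrete, here is a short sketch. It only assumes a locally unzipped vocab file (the path is a placeholder) and uses the tokenization.FullTokenizer calls named earlier; the truncation length of 128 is arbitrary:
python
import tokenization  # tokenization.py from this repository

max_seq_length = 128
tokenizer = tokenization.FullTokenizer(
    vocab_file='uncased_L-12_H-768_A-12/vocab.txt',  # placeholder path
    do_lower_case=True)

raw_text = 'John Johanson lives in Ramsey County.'

# Steps 1-4: tokenize, truncate, then add the special tokens.
tokens = tokenizer.tokenize(raw_text)
tokens = tokens[:max_seq_length - 2]   # leave room for [CLS] and [SEP]
tokens = ['[CLS]'] + tokens + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)
print(input_ids)
The same tokenizer is what keeps BERT compatible with existing word level pipelines, which is what the part of speech example below relies on.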
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2729,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Long Short Term Memory Units This is self contained package to train a language model on word level Penn Tree Bank dataset. It achieves 115 perplexity for a small model in 1h, and 81 perplexity for a big model in a day. Model ensemble of 38 big models gives 69 perplexity. This code is derived from (the same author, but a different company). More information:",Machine Translation,Machine Translation 2731,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . 
\ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. 
Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' 
on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended; use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results in the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to reproduce most of the BERT Large results in the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for a much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assume that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC), which only contains 3,600 examples and can fine tune in a few minutes on most GPUs.
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
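If you would rather not re run inference, the thresholding itself can be sketched offline in a few lines of Python. This is only a sketch: it assumes predictions.json maps each question id to the best non null answer and null_odds.json maps each question id to the null score difference, as described above, and predictions_thresholded.json is just an illustrative output name:
python
import json

THRESH = -1.0  # replace with the best_f1_thresh reported by evaluate-v2.0.py

with open("./squad/predictions.json") as f:
    predictions = json.load(f)
with open("./squad/null_odds.json") as f:
    null_odds = json.load(f)

for qid in predictions:
    # If the "no answer" score beats the best span by more than the threshold,
    # output an empty string, SQuAD 2.0's convention for unanswerable questions.
    if null_odds.get(qid, 0.0) > THRESH:
        predictions[qid] = ""

with open("./squad/predictions_thresholded.json", "w") as f:
    json.dump(predictions, f)
Re running run_squad.py with null_score_diff_threshold, as in the command that follows, applies the same rule inside the script itself.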
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
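To see these three steps end to end on your own text, you can call the repository's tokenizer directly. A minimal sketch (the vocab path is a placeholder for wherever you unzipped a BERT Base checkpoint, and the exact WordPiece splits depend on that vocabulary):
python
import tokenization  # tokenization.py from this repository

# Placeholder: point this at vocab.txt from an unzipped BERT-Base checkpoint.
vocab_file = "/path/to/uncased_L-12_H-768_A-12/vocab.txt"

tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

# Normalization, punctuation splitting, and WordPiece in one call.
tokens = tokenizer.tokenize("John Johanson's,   house.")
print(tokens)  # e.g. ['john', 'johan', '##son', "'", 's', ',', 'house', '.'] (exact pieces depend on the vocab)

# The ids are what actually go into the model, bracketed by [CLS] and [SEP].
input_ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + tokens + ["[SEP]"])
print(input_ids)
The ## prefix on a WordPiece marks a piece that continues the previous word, which is exactly the alignment issue the word level recipe below deals with.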
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2747,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. 
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. 
A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. 
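To get a feel for that run to run variance, one option is to script a handful of repeats and collect the dev accuracies afterwards. A rough sketch, assuming the evaluation numbers shown above are also written to a file named eval_results.txt inside each output_dir (check what your run actually writes before relying on that name):
python
import os
import re
import subprocess

BERT_BASE_DIR = os.environ["BERT_BASE_DIR"]
GLUE_DIR = os.environ["GLUE_DIR"]

accuracies = []
for run in range(3):
    output_dir = "/tmp/mrpc_output_run%d" % run
    # Same flags as the example command above, just with a per-run output_dir.
    subprocess.check_call([
        "python", "run_classifier.py",
        "--task_name=MRPC", "--do_train=true", "--do_eval=true",
        "--data_dir=%s/MRPC" % GLUE_DIR,
        "--vocab_file=%s/vocab.txt" % BERT_BASE_DIR,
        "--bert_config_file=%s/bert_config.json" % BERT_BASE_DIR,
        "--init_checkpoint=%s/bert_model.ckpt" % BERT_BASE_DIR,
        "--max_seq_length=128", "--train_batch_size=32",
        "--learning_rate=2e-5", "--num_train_epochs=3.0",
        "--output_dir=%s" % output_dir,
    ])
    # eval_results.txt is an assumed file name; it holds lines like "eval_accuracy = 0.845588".
    with open(os.path.join(output_dir, "eval_results.txt")) as f:
        match = re.search(r"eval_accuracy\s*=?\s*([0-9.]+)", f.read())
        accuracies.append(float(match.group(1)))

print("dev accuracies across runs:", accuracies)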
Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
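A minimal sketch of reading that output back, pulling the final layer vector for every WordPiece token. The field names used here (features, token, layers, index, values) are assumptions about the JSONL layout, so print one line of your own output.jsonl to confirm them first:
python
import json

import numpy as np

with open("/tmp/output.jsonl") as f:
    for line in f:
        example = json.loads(line)
        tokens = []
        vectors = []
        for feature in example["features"]:
            tokens.append(feature["token"])
            # Index -1 is the final hidden layer, matching the layers flag above.
            final_layer = next(l for l in feature["layers"] if l["index"] == -1)
            vectors.append(final_layer["values"])
        embeddings = np.array(vectors)  # shape: (num_wordpiece_tokens, hidden_size)
        print(tokens[:5], embeddings.shape)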
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
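As a small convenience sketch for keeping the two scripts consistent, the snippet below derives max_predictions_per_seq from max_seq_length and masked_lm_prob once and prints the flags to pass to both commands. The constants simply mirror the 128 / 0.15 / 20 combination used in this example; this is not part of the released scripts.

```python
import math

# Shared settings: keep these identical for both scripts.
MAX_SEQ_LENGTH = 128
MASKED_LM_PROB = 0.15

# Roughly max_seq_length * masked_lm_prob, rounded up: 128 * 0.15 -> 20.
MAX_PREDICTIONS_PER_SEQ = math.ceil(MAX_SEQ_LENGTH * MASKED_LM_PROB)

shared_flags = (
    f"--max_seq_length={MAX_SEQ_LENGTH} "
    f"--max_predictions_per_seq={MAX_PREDICTIONS_PER_SEQ}"
)
# Pass the same values to create_pretraining_data.py and run_pretraining.py.
print(shared_flags)
```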
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2756,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. 
If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. 
All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. 
It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. 
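As a rough sketch of how those probabilities might be post-processed once the prediction command below has been run, the snippet reads test_results.tsv and takes the arg max per line. The path and the label list are placeholders for this MRPC-style setup; the label order has to match the processor the classifier was trained with.

```python
import csv

# Illustrative paths/labels: adjust to your own output_dir and task.
RESULTS_PATH = "/tmp/mrpc_output/test_results.tsv"
LABELS = ["0", "1"]  # must match the label list of the task's processor

with open(RESULTS_PATH, newline="") as f:
    for i, row in enumerate(csv.reader(f, delimiter="\t")):
        probs = [float(p) for p in row]  # one probability per class
        best = max(range(len(probs)), key=probs.__getitem__)
        print(f"example {i}: label={LABELS[best]} p={probs[best]:.4f}")
```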
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. 
The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. 
We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. 
SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. 
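To make that layout concrete, here is a tiny, purely illustrative sketch that writes such an input file: one sentence per line, with a blank line between documents. The sentences and the output path are made up.

```python
# Expected layout for create_pretraining_data.py input: one sentence per
# line, documents separated by a single blank line.
documents = [
    ["The cat sat on the mat.", "It then fell asleep in the sun."],
    ["BERT is pre-trained on large text corpora.",
     "Fine-tuning adapts it to a downstream task."],
]

with open("/tmp/my_pretraining_input.txt", "w", encoding="utf-8") as f:
    for doc in documents:
        for sentence in doc:
            f.write(sentence + "\n")
        f.write("\n")  # blank line marks the end of a document
```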
You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. 
In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. 
However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2768,Natural Language Processing,Natural Language Processing,Natural Language Processing,"A Pytorch Implementation of the Transformer Network This repository includes pytorch implementations of Attention is All You Need (Vaswani et al., NIPS 2017) and Weighted Transformer Network for Machine Translation (Ahmed et al., arXiv 2017) Reference Paper Vaswani et al., Attention is All You Need , NIPS 2017 Ahmed et al., Weighted Transformer Network for Machine Translation , Arxiv 2017 Code jadore801120/attention is all you need OpenNMT/OpenNMT py The Annotated Transformers",Machine Translation,Machine Translation 2776,Natural Language Processing,Natural Language Processing,Natural Language Processing,"XLM PyTorch original implementation of Cross lingual Language Model Pretraining . Provides a cross lingual implementation of BERT, with state of the art results on XNLI, and unsupervised MT. ! Model XLM contains code for: Language model pretraining: Causal Language Model (CLM) monolingual Masked Language Model (MLM) monolingual Translation Language Model (TLM) cross lingual Supervised / Unsupervised MT training: Denoising auto encoder Parallel data training Online back translation XNLI fine tuning GLUE fine tuning XLM supports multi GPU and multi node training. 
Pretrained models We provide pretrained cross lingual language models, all trained with the MLM objective (see training command below): Languages Model BPE codes Vocabulary : : : : : English French Model BPE codes Vocabulary English German Model BPE codes Vocabulary English Romanian Model BPE codes Vocabulary XNLI 15 Model BPE codes Vocabulary The English French, English German and English Romanian models are the ones we used in the paper for MT pretraining. If you use these models, you should use the same data preprocessing / BPE codes to preprocess your data. See the preprocessing commands in get data nmt.sh . XNLI 15 is the model used in the paper for XNLI fine tuning. It handles English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. For this model we used a different preprocessing than for the MT models (such as lowercasing and accents removal). Generating cross lingual sentence representations This notebook (generate embeddings.ipynb) provides an example to quickly obtain cross lingual sentence representations from a pretrained model. Dependencies Python 3 NumPy PyTorch (currently tested on version 0.4 and 1.0) fastBPE (generate and apply BPE codes) Moses (scripts to clean and tokenize text only no installation required) Apex (for fp16 training) Supervised / Unsupervised MT experiments Download / preprocess data To download the data required for the unsupervised MT experiments, simply run: git clone cd XLM And one of the three commands below: ./get data nmt.sh src en tgt fr ./get data nmt.sh src de tgt en ./get data nmt.sh src en tgt ro for English French, German English, or English Romanian experiments. The script will successively: download Moses scripts, download and compile fastBPE download, extract, tokenize, apply BPE to monolingual and parallel test data binarize all datasets If you want to use our pretrained models, you need to have an exactly identical vocabulary. Since small differences can happen during preprocessing, we recommend that you use our BPE codes and vocabulary (although you should get something almost identical if you learn the codes and compute the vocabulary yourself). This will ensure that the vocabulary of your preprocessed data perfectly matches the one of our pretrained models, and that there is not a word / index mismatch. To do so, simply run: wget wget ./get data nmt.sh src en tgt fr reload_codes codes_enfr reload_vocab vocab_enfr get data nmt.sh contains a few parameters defined at the beginning of the file: N_MONO number of monolingual sentences for each language (default 5000000) CODES number of BPE codes (default 60000) N_THREADS number of threads in data preprocessing (default 16) The default number of monolingual data is 5M sentences, but using more monolingual data will significantly improve the quality of pretrained models. In practice, the models we release for MT are trained on all NewsCrawl data available, i.e. about 260M, 200M and 65M sentences for German, English and French respectively. 
The script should output a data summary that contains the location of all files required to start experiments: Data summary Monolingual training data: en: ./data/processed/en fr/train.en.pth fr: ./data/processed/en fr/train.fr.pth Monolingual validation data: en: ./data/processed/en fr/valid.en.pth fr: ./data/processed/en fr/valid.fr.pth Monolingual test data: en: ./data/processed/en fr/test.en.pth fr: ./data/processed/en fr/test.fr.pth Parallel validation data: en: ./data/processed/en fr/valid.en fr.en.pth fr: ./data/processed/en fr/valid.en fr.fr.pth Parallel test data: en: ./data/processed/en fr/test.en fr.en.pth fr: ./data/processed/en fr/test.en fr.fr.pth Pretrain a language model (with MLM) The following script will pretrain a model with the MLM objective for English and French: python train.py main parameters exp_name test_enfr_mlm experiment name dump_path ./dumped/ where to store the experiment data location / training objective data_path ./data/processed/en fr/ data location lgs 'en fr' considered languages clm_steps '' CLM objective mlm_steps 'en,fr' MLM objective transformer parameters emb_dim 1024 embeddings / model dimension n_layers 6 number of layers n_heads 8 number of heads dropout 0.1 dropout attention_dropout 0.1 attention dropout gelu_activation true GELU instead of ReLU optimization batch_size 32 sequences per batch bptt 256 sequences length optimizer adam,lr 0.0001 optimizer epoch_size 200000 number of sentences per epoch validation_metrics _valid_mlm_ppl validation metric (when to save the best model) stopping_criterion _valid_mlm_ppl,10 end experiment if stopping criterion does not improve If parallel data is available, the TLM objective can be used with mlm_steps 'en fr' . To train with both the MLM and TLM objective, you can use mlm_steps 'en,fr,en fr' . We provide models trained with the command above for English French, English German and English Romanian, along with the BPE codes and vocabulary used to preprocess the data. Train on unsupervised MT from a pretrained model You can now use the pretrained model for Machine Translation. 
To download a model trained with the command above on the MLM objective, and the corresponding BPE codes, run: wget c If you preprocessed your dataset in ./data/processed/en fr/ with the provided BPE codes codes_enfr and vocabulary vocab_enfr , you can pretrain your NMT model with mlm_enfr_1024.pth and run: python train.py main parameters exp_name unsupMT_enfr experiment name dump_path ./dumped/ where to store the experiment reload_model 'mlm_enfr_1024.pth,mlm_enfr_1024.pth' model to reload for encoder,decoder data location / training objective data_path ./data/processed/en fr/ data location lgs 'en fr' considered languages ae_steps 'en,fr' denoising auto encoder training steps bt_steps 'en fr en,fr en fr' back translation steps word_shuffle 3 noise for auto encoding loss word_dropout 0.1 noise for auto encoding loss word_blank 0.1 noise for auto encoding loss lambda_ae '0:1,100000:0.1,300000:0' scheduling on the auto encoding coefficient transformer parameters encoder_only false use a decoder for MT emb_dim 1024 embeddings / model dimension n_layers 6 number of layers n_heads 8 number of heads dropout 0.1 dropout attention_dropout 0.1 attention dropout gelu_activation true GELU instead of ReLU optimization tokens_per_batch 2000 use batches with a fixed number of words batch_size 32 batch size (for back translation) bptt 256 sequence length optimizer adam_inverse_sqrt,beta1 0.9,beta2 0.98,lr 0.0001 optimizer epoch_size 200000 number of sentences per epoch eval_bleu true also evaluate the BLEU score stopping_criterion 'valid_en fr_mt_bleu,10' validation metric (when to save the best model) validation_metrics 'valid_en fr_mt_bleu' end experiment if stopping criterion does not improve The parameters of your Transformer model have to be identical to the ones used for pretraining (or you will have to slightly modify the code to only reload existing parameters). After 8 epochs on 8 GPUs, the above command should give you something like this: epoch > 7 valid_fr en_mt_bleu > 28.36 valid_en fr_mt_bleu > 30.50 test_fr en_mt_bleu > 34.02 test_en fr_mt_bleu > 36.62 Cross lingual text classification (XNLI) XLMs can be used to build cross lingual classifiers. After fine tuning an XLM model on an English training corpus for instance (e.g. of sentiment analysis, natural language inference), the model is still able to make accurate predictions at test time in other languages, for which there is very little or no training data. This approach is usually referred to as zero shot cross lingual classification . Get the right tokenizers Before running the scripts below, make sure you download the tokenizers from the tools/ directory. Download / preprocess monolingual data This script will download and preprocess the Wikipedia datasets in the 15 languages that are part of XNLI: for lg in ar bg de el en es fr hi ru sw th tr ur vi zh; do ./get data wiki.sh $lg done Downloading the Wikipedia dumps make take several hours. The get data wiki.sh script will automatically download Wikipedia dumps, extract raw sentences, clean and tokenize them, apply BPE codes and binarize the data. Note that in our experiments we also concatenated the Toronto Book Corpus to the English Wikipedia. For Chinese and Thai you will need a special tokenizer that you can install using the commands below. For all other languages, the data will be tokenized with Moses scripts. 
Thai pip install pythainlp Chinese cd tools/ wget unzip stanford segmenter 2018 10 16.zip Download / preprocess parallel data This script will download and preprocess parallel data that can be used for the TLM objective: lg_pairs ar en bg en de en el en en es en fr en hi en ru en sw en th en tr en ur en vi en zh for lg_pair in $lg_pairs; do ./get data para.sh $lg_pair done Download / preprocess XNLI data This script will download and preprocess the XNLI corpus: ./get data xnli.sh Pretrain a language model (with MLM and TLM) The following script will pretrain a model with the MLM and TLM objectives for the 15 XNLI languages: python train.py main parameters exp_name train_xnli_mlm_tlm experiment name dump_path ./dumped/ where to store the experiment data location / training objective data_path ./data/processed/XLM15/ data location lgs 'ar bg de el en es fr hi ru sw th tr ur vi zh' considered languages clm_steps '' CLM objective mlm_steps 'ar,bg,de,el,en,es,fr,hi,ru,sw,th,tr,ur,vi,zh,en ar,en bg,en de,en el,en es,en fr,en hi,en ru,en sw,en th,en tr,en ur,en vi,en zh,ar en,bg en,de en,el en,es en,fr en,hi en,ru en,sw en,th en,tr en,ur en,vi en,zh en' MLM objective transformer parameters emb_dim 1024 embeddings / model dimension n_layers 12 number of layers n_heads 8 number of heads dropout 0.1 dropout attention_dropout 0.1 attention dropout gelu_activation true GELU instead of ReLU optimization batch_size 32 sequences per batch bptt 256 sequences length optimizer adam_inverse_sqrt,beta1 0.9,beta2 0.98,lr 0.0001,weight_decay 0 optimizer epoch_size 200000 number of sentences per epoch validation_metrics _valid_mlm_ppl validation metric (when to save the best model) stopping_criterion _valid_mlm_ppl,10 end experiment if stopping criterion does not improve Train on XNLI from a pretrained model You can now use the pretrained model for cross lingual classification. To download a model trained with the command above on the MLM TLM objective, run: wget c You can now fine tune the pretrained model on XNLI, or on one of the English GLUE tasks: python glue xnli.py exp_name test_xnli_mlm_tlm experiment name dump_path ./dumped/ where to store the experiment model_path mlm_tlm_xnli15_1024.pth model location data_path ./data/processed/XLM15 data location transfer_tasks XNLI,SST 2 transfer tasks (XNLI or GLUE tasks) optimizer adam,lr 0.000005 optimizer batch_size 8 batch size n_epochs 250 number of epochs epoch_size 20000 number of sentences per epoch max_len 256 max number of words in sentences max_vocab 95000 max number of words in vocab Frequently Asked Questions How can I run experiments on multiple GPUs? XLM supports both multi GPU and multi node training, and was tested with up to 128 GPUs. To run an experiment with multiple GPUs on a single machine, simply replace python train.py in the commands above with: export NGPU 8; python m torch.distributed.launch nproc_per_node $NGPU train.py The multi node is automatically handled by SLURM. References Please cite 1 if you found the resources in this repository useful. Cross lingual Language Model Pretraining 1 G. Lample , A. Conneau Cross lingual Language Model Pretraining \ Equal contribution. Order has been determined with a coin flip. @article{lample2019cross, title {Cross lingual Language Model Pretraining}, author {Lample, Guillaume and Conneau, Alexis}, journal {arXiv preprint arXiv:1901.07291}, year {2019} } XNLI: Evaluating Cross lingual Sentence Representations 2 A. Conneau, G. Lample, R. Rinott, A. Williams, S. R. Bowman, H. Schwenk, V. 
Stoyanov XNLI: Evaluating Cross lingual Sentence Representations @inproceedings{conneau2018xnli, title {XNLI: Evaluating Cross lingual Sentence Representations}, author {Conneau, Alexis and Lample, Guillaume and Rinott, Ruty and Williams, Adina and Bowman, Samuel R and Schwenk, Holger and Stoyanov, Veselin}, booktitle {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year {2018} } Phrase Based \& Neural Unsupervised Machine Translation 3 G. Lample, M. Ott, A. Conneau, L. Denoyer, MA. Ranzato Phrase Based & Neural Unsupervised Machine Translation @inproceedings{lample2018phrase, title {Phrase Based \& Neural Unsupervised Machine Translation}, author {Lample, Guillaume and Ott, Myle and Conneau, Alexis and Denoyer, Ludovic and Ranzato, Marc'Aurelio}, booktitle {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, year {2018} } License See the LICENSE (LICENSE) file for more details.",Machine Translation,Machine Translation 2789,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in Attention is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). A novel sequence to sequence framework utilizes the self attention mechanism , instead of Convolution operation or Recurrent structure, and achieve the state of the art performance on WMT 2014 English to German translation task . (2017/06/12) > The official Tensorflow Implementation can be found in: tensorflow/tensor2tensor . > To learn more about self attention mechanism, you could read A Structured Self attentive Sentence Embedding . The project support training and translation with trained model now. Note that this project is still a work in progress. If there is any suggestion or error, feel free to fire an issue to let me know. :) Requirement python 3.4+ pytorch 0.2.0 tqdm numpy Usage Some useful tools: The example below uses the Moses tokenizer to prepare the data and the moses BLEU script for evaluation. bash wget wget wget sed i s/$RealBin\/..\/share\/nonbreaking_prefixes// tokenizer.perl wget WMT'16 Multimodal Translation: Multi30k (de en) An example of training for the WMT'16 Multimodal Translation task . 0) Download the data. bash mkdir p data/multi30k wget && tar xf training.tar.gz C data/multi30k && rm training.tar.gz wget && tar xf validation.tar.gz C data/multi30k && rm validation.tar.gz wget && tar xf mmt16_task1_test.tar.gz C data/multi30k && rm mmt16_task1_test.tar.gz 1) Preprocess the data. bash for l in en de; do for f in data/multi30k/ .$l; do if $f ! test ; then sed i $ d $f; fi; done; done for l in en de; do for f in data/multi30k/ .$l; do perl tokenizer.perl a no escape l $l q $f.atok; done; done python preprocess.py train_src data/multi30k/train.en.atok train_tgt data/multi30k/train.de.atok valid_src data/multi30k/val.en.atok valid_tgt data/multi30k/val.de.atok save_data data/multi30k.atok.low.pt 2) Train the model bash python train.py data data/multi30k.atok.low.pt save_model trained save_mode best proj_share_weight > If your source and target language share one common vocabulary, use the embs_share_weight flag to enable the model to share source/target word embedding. 
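The proj_share_weight and embs_share_weight flags above tie embedding matrices to the pre-softmax projection so that the parameters are shared. A minimal PyTorch sketch of the idea (illustrative only, not the repository's exact module):

```python
import torch.nn as nn

class TiedDecoderHead(nn.Module):
    """Weight tying: the output projection reuses the target embedding
    matrix, so both layers train the same (vocab_size x d_model) weights."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        self.proj.weight = self.tgt_emb.weight  # what proj_share_weight enables
        # embs_share_weight would additionally point the source embedding at
        # the same tensor, which requires a shared source/target vocabulary.

    def forward(self, hidden):           # hidden: (batch, seq_len, d_model)
        return self.proj(hidden)         # logits: (batch, seq_len, vocab_size)

head = TiedDecoderHead(vocab_size=9000, d_model=512)
```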
3) Test the model bash python translate.py model trained.chkpt vocab data/multi30k.atok.low.pt src data/multi30k/test.en.atok Performance Training Parameter settings: batch_size 64 d_inner_hid 1024 d_k 64 d_v 64 d_model 512 d_word_vec 512 dropout 0.1 embs_share_weight False n_head 8 n_layers 6 n_warmup_steps 4000 proj_share_weight True Elapse per epoch (on NVIDIA Titan X): Training set: 1.38 min Validation set: 0.016 min Testing coming soon. TODO Label smoothing Evaluation on the generated text. Attention weight plot. Acknowledgement The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT py . Thanks for the suggestions from @srush, @iamalbert and @ZiJianZhao.",Machine Translation,Machine Translation 2824,Natural Language Processing,Natural Language Processing,Natural Language Processing,"!!! it is just repository for private study refer site: refer others: BERT \ \ \ \ \ New February 7th, 2019: TfHub Module \ \ \ \ \ BERT has been uploaded to TensorFlow Hub . See run_classifier_with_tfhub.py for an example of how to use the TF Hub module, or run an example in the browser on Colab . \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. 
Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . 
Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). 
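A toy sketch of the next-sentence sampling described above (the actual create_pretraining_data.py logic additionally draws the random sentence from a different document and packs segments up to the maximum sequence length):

```python
import random

def make_nsp_example(doc_sentences, corpus_sentences):
    """With probability 0.5 pair a sentence with its true successor
    (IsNextSentence); otherwise pair it with a random corpus sentence
    (NotNextSentence)."""
    i = random.randrange(len(doc_sentences) - 1)
    sent_a = doc_sentences[i]
    if random.random() < 0.5:
        return sent_a, doc_sentences[i + 1], "IsNextSentence"
    return sent_a, random.choice(corpus_sentences), "NotNextSentence"

doc = ["the man went to the store .", "he bought a gallon of milk ."]
corpus = ["penguins are flightless .", "the weather was cold ."]
print(make_nsp_example(doc, corpus))
```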
A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . 
This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . 
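Before the BERT-Large SQuAD hyperparameters below, here is a small sketch for consuming the classifier prediction output described earlier: test_results.tsv holds one tab-separated row of class probabilities per test example, ordered by the processor's label list in run_classifier.py (for MRPC that list is 0, 1). Treat the exact column order as something to verify against your processor.

```python
import csv

def read_test_results(path):
    """Parse the do_predict output of run_classifier.py: each row contains
    the class probabilities for one example; the argmax is the label id."""
    preds = []
    with open(path) as f:
        for row in csv.reader(f, delimiter="\t"):
            probs = [float(p) for p in row]
            preds.append(max(range(len(probs)), key=probs.__getitem__))
    return preds

# For the MRPC example above, label id 1 corresponds to "paraphrase".
print(read_test_results("/tmp/mrpc_output/test_results.tsv"))
```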
Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. 
This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. 
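Gradient accumulation, mentioned above as one planned route to larger effective batch sizes, is straightforward to sketch; the example below uses PyTorch purely for brevity and is not part of this TensorFlow repository.

```python
import torch
import torch.nn as nn

def train_with_accumulation(model, batches, optimizer, accum_steps=4):
    """Accumulate gradients over accum_steps minibatches before each optimizer
    update; ignoring batch-dependent ops such as batch normalization, this is
    equivalent to one update on a batch accum_steps times larger."""
    loss_fn = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(batches):
        loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average correctly
        loss.backward()                            # gradients sum across backward calls
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

# Toy usage: a linear classifier on random data.
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(8)]
train_with_accumulation(model, batches, opt)
```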
Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. 
orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . 
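Because create_pretraining_data.py keeps every example from an input file in memory, large corpora should be sharded as noted above. The snippet below is one hypothetical way to split a corpus while cutting only at the blank lines that delimit documents; the shard file names are arbitrary, and the per-shard TFRecord outputs can later be passed to run_pretraining.py as a file glob.

```python
def shard_corpus(input_path, num_shards=10, prefix="shard"):
    """Split a pre-training corpus into num_shards files, cutting only at
    blank lines so that documents (delimited by empty lines) stay intact."""
    docs, current = [], []
    with open(input_path) as f:
        for line in f:
            if line.strip():
                current.append(line)
            elif current:
                docs.append(current)
                current = []
    if current:
        docs.append(current)
    shards = [[] for _ in range(num_shards)]
    for i, doc in enumerate(docs):          # round-robin documents over shards
        shards[i % num_shards].append(doc)
    for i, shard in enumerate(shards):
        with open(f"{prefix}-{i:03d}.txt", "w") as out:
            for doc in shard:
                out.writelines(doc)
                out.write("\n")             # keep the blank-line delimiter

# Example (placeholder path): shard_corpus("wiki_sentences.txt", num_shards=16)
```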
shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. 
Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. 
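As a concrete example of the open-source options listed under learning a new WordPiece vocabulary above, Google's SentencePiece library can train a subword vocabulary from a plain-text corpus; as noted there, the result is not directly compatible with this repository's tokenization.py. A minimal sketch with placeholder file names:

```python
import sentencepiece as spm

# Train a 32k BPE vocabulary on a plain-text corpus (one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus.txt",           # placeholder path
    model_prefix="my_subwords",   # writes my_subwords.model / my_subwords.vocab
    vocab_size=32000,
    model_type="bpe",
)

# Load the trained model and tokenize a sentence into subword pieces.
sp = spm.SentencePieceProcessor(model_file="my_subwords.model")
print(sp.encode("John Johanson's house", out_type=str))
```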
Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2837,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Attention is all you need: A Pytorch Implementation 2 yueyongjiao This is a PyTorch implementation of the Transformer model in Attention is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). A novel sequence to sequence framework utilizes the self attention mechanism , instead of Convolution operation or Recurrent structure, and achieve the state of the art performance on WMT 2014 English to German translation task . (2017/06/12) > The official Tensorflow Implementation can be found in: tensorflow/tensor2tensor . > To learn more about self attention mechanism, you could read A Structured Self attentive Sentence Embedding . The project support training and translation with trained model now. Note that this project is still a work in progress. If there is any suggestion or error, feel free to fire an issue to let me know. :) Requirement python 3.4+ pytorch 0.4.1+ tqdm numpy Usage Some useful tools: The example below uses the Moses tokenizer to prepare the data and the moses BLEU script for evaluation. bash wget wget wget sed i s/$RealBin\/..\/share\/nonbreaking_prefixes// tokenizer.perl wget WMT'16 Multimodal Translation: Multi30k (de en) An example of training for the WMT'16 Multimodal Translation task . 0) Download the data. bash mkdir p data/multi30k wget && tar xf training.tar.gz C data/multi30k && rm training.tar.gz wget && tar xf validation.tar.gz C data/multi30k && rm validation.tar.gz wget && tar xf mmt16_task1_test.tar.gz C data/multi30k && rm mmt16_task1_test.tar.gz 1) Preprocess the data. bash for l in en de; do for f in data/multi30k/ .$l; do if $f ! test ; then sed i $ d $f; fi; done; done for l in en de; do for f in data/multi30k/ .$l; do perl tokenizer.perl a no escape l $l q $f.atok; done; done python preprocess.py train_src data/multi30k/train.en.atok train_tgt data/multi30k/train.de.atok valid_src data/multi30k/val.en.atok valid_tgt data/multi30k/val.de.atok save_data data/multi30k.atok.low.pt 2) Train the model bash python train.py data data/multi30k.atok.low.pt save_model trained save_mode best proj_share_weight label_smoothing > If your source and target language share one common vocabulary, use the embs_share_weight flag to enable the model to share source/target word embedding. 3) Test the model bash python translate.py model trained.chkpt vocab data/multi30k.atok.low.pt src data/multi30k/test.en.atok no_cuda Performance Training Parameter settings: default parameter and optimizer settings label smoothing target embedding / pre softmax linear layer weight sharing. Elapse per epoch (on NVIDIA Titan X): Training set: 0.888 minutes Validation set: 0.011 minutes Testing coming soon. TODO Evaluation on the generated text. Attention weight plot. Acknowledgement The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT py . 
Thanks for the suggestions from @srush, @iamalbert and @ZiJianZhao.",Machine Translation,Machine Translation 2838,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Attention is all you need: A Pytorch Implementation 2 yueyongjiao This is a PyTorch implementation of the Transformer model in Attention is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). A novel sequence to sequence framework utilizes the self attention mechanism , instead of Convolution operation or Recurrent structure, and achieve the state of the art performance on WMT 2014 English to German translation task . (2017/06/12) > The official Tensorflow Implementation can be found in: tensorflow/tensor2tensor . > To learn more about self attention mechanism, you could read A Structured Self attentive Sentence Embedding . The project support training and translation with trained model now. Note that this project is still a work in progress. If there is any suggestion or error, feel free to fire an issue to let me know. :) Requirement python 3.4+ pytorch 0.4.1+ tqdm numpy Usage Some useful tools: The example below uses the Moses tokenizer to prepare the data and the moses BLEU script for evaluation. bash wget wget wget sed i s/$RealBin\/..\/share\/nonbreaking_prefixes// tokenizer.perl wget WMT'16 Multimodal Translation: Multi30k (de en) An example of training for the WMT'16 Multimodal Translation task . 0) Download the data. bash mkdir p data/multi30k wget && tar xf training.tar.gz C data/multi30k && rm training.tar.gz wget && tar xf validation.tar.gz C data/multi30k && rm validation.tar.gz wget && tar xf mmt16_task1_test.tar.gz C data/multi30k && rm mmt16_task1_test.tar.gz 1) Preprocess the data. bash for l in en de; do for f in data/multi30k/ .$l; do if $f ! test ; then sed i $ d $f; fi; done; done for l in en de; do for f in data/multi30k/ .$l; do perl tokenizer.perl a no escape l $l q $f.atok; done; done python preprocess.py train_src data/multi30k/train.en.atok train_tgt data/multi30k/train.de.atok valid_src data/multi30k/val.en.atok valid_tgt data/multi30k/val.de.atok save_data data/multi30k.atok.low.pt 2) Train the model bash python train.py data data/multi30k.atok.low.pt save_model trained save_mode best proj_share_weight label_smoothing > If your source and target language share one common vocabulary, use the embs_share_weight flag to enable the model to share source/target word embedding. 3) Test the model bash python translate.py model trained.chkpt vocab data/multi30k.atok.low.pt src data/multi30k/test.en.atok no_cuda Performance Training Parameter settings: default parameter and optimizer settings label smoothing target embedding / pre softmax linear layer weight sharing. Elapse per epoch (on NVIDIA Titan X): Training set: 0.888 minutes Validation set: 0.011 minutes Testing coming soon. TODO Evaluation on the generated text. Attention weight plot. Acknowledgement The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT py . 
Thanks for the suggestions from @srush, @iamalbert and @ZiJianZhao.",Machine Translation,Machine Translation 2839,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Attention is all you need: A Pytorch Implementation 2 yueyongjiao This is a PyTorch implementation of the Transformer model in Attention is All You Need (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). A novel sequence to sequence framework utilizes the self attention mechanism , instead of Convolution operation or Recurrent structure, and achieve the state of the art performance on WMT 2014 English to German translation task . (2017/06/12) > The official Tensorflow Implementation can be found in: tensorflow/tensor2tensor . > To learn more about self attention mechanism, you could read A Structured Self attentive Sentence Embedding . The project support training and translation with trained model now. Note that this project is still a work in progress. If there is any suggestion or error, feel free to fire an issue to let me know. :) Requirement python 3.4+ pytorch 0.4.1+ tqdm numpy Usage Some useful tools: The example below uses the Moses tokenizer to prepare the data and the moses BLEU script for evaluation. bash wget wget wget sed i s/$RealBin\/..\/share\/nonbreaking_prefixes// tokenizer.perl wget WMT'16 Multimodal Translation: Multi30k (de en) An example of training for the WMT'16 Multimodal Translation task . 0) Download the data. bash mkdir p data/multi30k wget && tar xf training.tar.gz C data/multi30k && rm training.tar.gz wget && tar xf validation.tar.gz C data/multi30k && rm validation.tar.gz wget && tar xf mmt16_task1_test.tar.gz C data/multi30k && rm mmt16_task1_test.tar.gz 1) Preprocess the data. bash for l in en de; do for f in data/multi30k/ .$l; do if $f ! test ; then sed i $ d $f; fi; done; done for l in en de; do for f in data/multi30k/ .$l; do perl tokenizer.perl a no escape l $l q $f.atok; done; done python preprocess.py train_src data/multi30k/train.en.atok train_tgt data/multi30k/train.de.atok valid_src data/multi30k/val.en.atok valid_tgt data/multi30k/val.de.atok save_data data/multi30k.atok.low.pt 2) Train the model bash python train.py data data/multi30k.atok.low.pt save_model trained save_mode best proj_share_weight label_smoothing > If your source and target language share one common vocabulary, use the embs_share_weight flag to enable the model to share source/target word embedding. 3) Test the model bash python translate.py model trained.chkpt vocab data/multi30k.atok.low.pt src data/multi30k/test.en.atok no_cuda Performance Training Parameter settings: default parameter and optimizer settings label smoothing target embedding / pre softmax linear layer weight sharing. Elapse per epoch (on NVIDIA Titan X): Training set: 0.888 minutes Validation set: 0.011 minutes Testing coming soon. TODO Evaluation on the generated text. Attention weight plot. Acknowledgement The project structure, some scripts and the dataset preprocessing steps are heavily borrowed from OpenNMT/OpenNMT py . 
Thanks for the suggestions from @srush, @iamalbert and @ZiJianZhao.",Machine Translation,Machine Translation 2849,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? 
BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. 
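To make the two pre training tasks described above concrete, here is a minimal sketch in plain Python. It uses whitespace tokenization and toy data rather than the repository's WordPiece pipeline, so treat it only as an illustration of how masked LM inputs, labels and next sentence pairs are formed, not as the code used to build the released models.
python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token='[MASK]'):
    # Replace roughly 15% of the tokens with [MASK] and remember the originals,
    # mirroring the masked LM objective described above (toy version only).
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels[i] = tok
        else:
            masked.append(tok)
    return masked, labels

def make_next_sentence_pair(sentences, i):
    # With probability 0.5 keep the true next sentence (IsNextSentence),
    # otherwise sample a random sentence from the corpus (NotNextSentence).
    if random.random() < 0.5 and i + 1 < len(sentences):
        return sentences[i], sentences[i + 1], 'IsNextSentence'
    return sentences[i], random.choice(sentences), 'NotNextSentence'

corpus = ['the man went to the store .',
          'he bought a gallon of milk .',
          'penguins are flightless .']
tokens, labels = mask_tokens(corpus[0].split())
print(tokens, labels)
print(make_next_sentence_pair(corpus, 0))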
The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. 
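As a quick way to see what the three files listed above actually contain, the following sketch prints the model hyperparameters, the vocabulary size and the checkpoint variables. It assumes Python 3, TensorFlow 1.x (the version this repository was tested with) and an unzipped BERT Base checkpoint at an illustrative local path.
python
import json
import os
import tensorflow as tf  # TF 1.x

bert_dir = '/path/to/uncased_L-12_H-768_A-12'  # adjust to wherever you unzipped the model

# Hyperparameters of the released model (hidden size, number of layers, vocab size, ...).
with open(os.path.join(bert_dir, 'bert_config.json')) as f:
    print(json.load(f))

# The vocab file maps one WordPiece per line to its id (the line number).
with open(os.path.join(bert_dir, 'vocab.txt'), encoding='utf-8') as f:
    print('vocab entries:', sum(1 for _ in f))

# bert_model.ckpt is a prefix that covers the three actual checkpoint files.
for name, shape in tf.train.list_variables(os.path.join(bert_dir, 'bert_model.ckpt')):
    print(name, shape)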
Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. 
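Returning to the classifier prediction step above: each row of test_results.tsv holds one probability per class, so turning it back into discrete labels takes only a few lines of Python. The label order below is an assumption and must match get_labels() of the task processor in run_classifier.py (for MRPC it is assumed here to be 0 and 1).
python
import csv

label_list = ['0', '1']  # assumed MRPC label order; check get_labels() for your task

with open('/tmp/mrpc_output/test_results.tsv') as f:
    for i, row in enumerate(csv.reader(f, delimiter='\t')):
        if not row:
            continue
        probs = [float(p) for p in row]
        best = max(range(len(probs)), key=probs.__getitem__)
        print(i, label_list[best], probs)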
BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . 
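A small sanity check on the files just downloaded also shows what SQuAD 2.0 adds over 1.1: questions marked is_impossible have no answer in the paragraph, which is what the version_2_with_negative flag used below accounts for. The path is illustrative; point it at your $SQUAD_DIR.
python
import json

with open('/path/to/squad/dev-v2.0.json') as f:
    squad = json.load(f)

total = unanswerable = 0
for article in squad['data']:
    for paragraph in article['paragraphs']:
        for qa in paragraph['qas']:
            total += 1
            # SQuAD 2.0 marks questions with no answer via the is_impossible flag.
            if qa.get('is_impossible', False):
                unanswerable += 1
print(total, 'questions,', unanswerable, 'unanswerable')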
On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. 
The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. 
See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . 
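For instance, a minimal spaCy based sketch that writes the expected input format (one sentence per line, documents separated by a blank line) could look like the following; the model name is an assumption and any sentence segmenter will do.
python
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

documents = [
    'John Johanson lived in Sweden. He moved to the United States in 1998.',
    'BERT is a pre-training method. It uses a deep bidirectional Transformer.',
]

with open('/tmp/pretraining_input.txt', 'w', encoding='utf-8') as out:
    for doc_text in documents:
        for sent in nlp(doc_text).sents:
            out.write(sent.text.strip() + '\n')  # one sentence per line
        out.write('\n')  # blank line delimits documents, as create_pretraining_data.py expects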
The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. 
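To make that comparison concrete, a rough count of attention entries per layer (ignoring heads and constant factors) shows that the two batches contain the same number of tokens but differ by a factor of four in attention cost:
python
# batch_size * seq_len**2 is proportional to the attention cost per layer.
long_batch = 64 * 512 ** 2       # 16,777,216
short_batch = 256 * 128 ** 2     #  4,194,304
print(long_batch / short_batch)  # 4.0, even though 64*512 == 256*128 tokens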
The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. 
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2856,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New February 7th, 2019: TfHub Module \ \ \ \ \ BERT has been uploaded to TensorFlow Hub . See run_classifier_with_tfhub.py for an example of how to use the TF Hub module, or run an example in the browser on Colab . \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally inclues Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) 
We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). 
For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). 
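The practical difference between the two variants is easy to see by tokenizing the same string with both settings. This is only a sketch: it assumes both checkpoints have been downloaded, the paths are illustrative, and tokenization.py from this repository is on the Python path.
python
import tokenization  # tokenization.py from this repository

text = 'John Johanson lives in Malmö.'

# Uncased model: lowercases and strips accent markers before WordPiece.
uncased = tokenization.FullTokenizer(
    vocab_file='/path/to/uncased_L-12_H-768_A-12/vocab.txt', do_lower_case=True)
print(uncased.tokenize(text))

# Cased model: keep do_lower_case=False so true case and accents are preserved.
cased = tokenization.FullTokenizer(
    vocab_file='/path/to/cased_L-12_H-768_A-12/vocab.txt', do_lower_case=False)
print(cased.tokenize(text))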
These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) (Not recommended, use Multilingual Cased instead) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . 
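Before launching the fine tuning command below, a quick check that the expected files are in place can save a failed run. The exact file names checked here are assumptions based on the released checkpoint layout and the GLUE download script, so adjust them to your setup.
python
import os

bert_base_dir = os.environ.get('BERT_BASE_DIR', '/path/to/uncased_L-12_H-768_A-12')
glue_dir = os.environ.get('GLUE_DIR', '/path/to/glue')

# Files run_classifier.py is expected to read; bert_model.ckpt is a prefix covering
# several files, so its .index file is checked here (an assumption).
expected = [
    os.path.join(bert_base_dir, 'vocab.txt'),
    os.path.join(bert_base_dir, 'bert_config.json'),
    os.path.join(bert_base_dir, 'bert_model.ckpt.index'),
    os.path.join(glue_dir, 'MRPC', 'train.tsv'),
    os.path.join(glue_dir, 'MRPC', 'dev.tsv'),
]
for path in expected:
    print('OK ' if os.path.exists(path) else 'MISSING ', path)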
This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. 
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
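Before launching it, it is worth double-checking that max_predictions_per_seq is consistent with the data generation flags above, i.e. roughly max_seq_length * masked_lm_prob rounded up:

```python
import math

max_seq_length = 128
masked_lm_prob = 0.15
# 128 * 0.15 = 19.2, so a cap of 20 masked positions per sequence is
# consistent with the flags used in the data generation command above.
print(math.ceil(max_seq_length * masked_lm_prob))  # 20
```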
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
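If you assemble your own corpus, it only needs to match the input format described above: one sentence per line, with a blank line between documents. Here is a rough sketch using spaCy; the en_core_web_sm model name is just an example, not a requirement of this repository:

```python
# Rough sketch: write raw documents in the "one sentence per line, blank line
# between documents" format expected by create_pretraining_data.py.
# Assumes spaCy and an English model such as en_core_web_sm are installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def write_pretraining_text(documents, path):
    with open(path, "w", encoding="utf-8") as f:
        for doc_text in documents:
            for sent in nlp(doc_text).sents:
                line = sent.text.strip()
                if line:
                    f.write(line + "\n")
            f.write("\n")  # blank line delimits documents

write_pretraining_text(
    ["BERT is a language representation model. It reads text bidirectionally."],
    "/tmp/my_corpus.txt")
```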
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. 
See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2880,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? 
BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. 
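To make the masked LM objective described above concrete, here is a toy sketch of the input corruption; the real data generation in create_pretraining_data.py is more involved than this:

```python
# Toy illustration of masking 15% of input tokens for the masked LM task.
# The real create_pretraining_data.py logic is more involved than this.
import random

def mask_tokens(tokens, mask_prob=0.15, seed=12345):
    rng = random.Random(seed)
    masked, labels = list(tokens), {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = token       # the model must predict these positions
            masked[i] = "[MASK]"
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```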
The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters (Not available yet. Needs to be re generated). BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. 
However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. 
However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. 
Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtained pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues. As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. 
Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. 
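That pre-processing can be as simple as re-attaching the split-off pieces. The following is a heuristic sketch, not part of the repository, and will not cover every tokenizer's conventions:

```python
# Heuristic sketch: undo Penn-Treebank-style contraction splits such as
# "do n't" -> "don't" before feeding text to the BERT tokenizer.
CONTRACTION_PIECES = {"n't", "'s", "'re", "'ve", "'ll", "'d", "'m"}

def detokenize(tokens):
    out = []
    for tok in tokens:
        if tok.lower() in CONTRACTION_PIECES and out:
            out[-1] += tok
        else:
            out.append(tok)
    return " ".join(out)

print(detokenize(["I", "do", "n't", "like", "John", "'s", "house"]))
# I don't like John's house
```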
Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. 
However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may not longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? 
Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license. See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2881,Natural Language Processing,Natural Language Processing,Natural Language Processing,w266 project Spring 2019 w266 Project Project References decaNLP\ QRNN\ stanford sentiment treebank\ humor in word embeddings\ categorical metadata representation for customized test classification\ sentiment analysis on epinions\ Applications/examples:\ Algorithmic trading: ascribe.com products:,Machine Translation,Machine Translation 2899,Natural Language Processing,Natural Language Processing,Natural Language Processing,"nncompress: Implementations of Embedding Quantization (Compress Word Embeddings) Thank you for your interest on our paper. I'm receieving mail basically everyday and happy to know many of you implemented the model correctly. I'm glad to debug your code or have discussion with you. Please do not hesitate to mail me for help. mail_address raph_ael@ua_ca.com .replace( _ , ) Requirements: numpy and tensorflow (I also have the pytorch implementation, which will be uploaded) Tutorial of the code 1. 
Download the project and prepare the data > git clone > cd neuralcompressor > bash scripts/download_glove_data.sh 2. Convert the Glove embeddings to numpy format > python scripts/convert_glove2numpy.py data/glove.6B.300d.txt 3. Train the embedding quantization model > python bin/quantize_embed.py M 32 K 16 train ... epoch198 train_loss 12.82 train_maxp 0.98 valid_loss 12.50 valid_maxp 0.98 bps 618 epoch199 train_loss 12.80 train_maxp 0.98 valid_loss 12.53 valid_maxp 0.98 bps 605 Training Done 4. Evaluate the averaged euclidean distance > python bin/quantize_embed.py M 32 K 16 evaluate Mean euclidean distance: 4.889592628145218 5. Export the word codes and the codebook matrix > python bin/quantize_embed.py M 32 K 16 export It will generate two files: data/mymodel.codes data/mymodel.codebook.npy 6. Check the codes > paste data/glove.6B.300d.word data/mymodel.codes head n 100 ... only 15 14 7 10 1 14 14 3 0 9 1 9 3 3 0 0 12 1 3 12 15 3 11 12 12 6 1 5 13 6 2 6 state 7 13 7 3 8 14 10 6 6 4 12 2 9 3 9 0 1 1 3 9 11 10 0 14 14 4 15 5 0 6 2 1 million 5 7 3 15 1 14 4 0 6 11 1 4 8 3 1 0 0 1 3 14 8 6 6 5 2 1 2 12 13 6 6 15 could 3 14 7 0 2 14 5 3 0 9 1 0 2 3 9 0 3 1 3 11 5 15 1 12 12 6 1 6 2 6 2 10 ... Use it in python python from nncompress import EmbeddingCompressor Load my embedding matrix matrix np.load( data/glove.6B.300d.npy ) Initialize the compressor compressor EmbeddingCompressor(32, 16, data/mymodel ) Train the quantization model compressor.train(matrix) Evaluate distance compressor.evaluate(matrix) print( Mean euclidean distance: , distance) Export the codes and codebook compressor.export(matrix, data/mymodel ) Citation @inproceedings{shu2018compressing, title {Compressing Word Embeddings via Deep Compositional Code Learning}, author {Raphael Shu and Hideki Nakayama}, booktitle {International Conference on Learning Representations (ICLR)}, year {2018}, url { } Arxiv version:",Machine Translation,Machine Translation 2906,Natural Language Processing,Natural Language Processing,Natural Language Processing,"zsw Modify make floder pretrained_model to store the model(for Chinese model: Links ) atfer download and unzip the chinese_L 12_H 768_A 12 to pretrained_model folder,can use command to run run_pretraining.py: python create_pretraining_data.py input_file ./sample_text.txt output_file ./sample.output vocab_file ./pretrained_model/chinese_L 12_H 768_A 12/vocab.txt random_seed 123 BERT \ \ \ \ \ New November 23rd, 2018: Un normalized multilingual model + Thai + Mongolian \ \ \ \ \ We uploaded a new multilingual model which does not perform any normalization on the input (no lower casing, accent stripping, or Unicode normalization), and additionally includes Thai and Mongolian. It is recommended to use this version for developing multilingual models, especially on languages with non Latin alphabets. This does not require any code changes, and can be downloaded here: BERT Base, Multilingual Cased : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters \ \ \ \ \ New November 15th, 2018: SOTA SQuAD 2.0 System \ \ \ \ \ We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the README for details. \ \ \ \ \ New November 5th, 2018: Third party PyTorch and Chainer versions of BERT available \ \ \ \ \ NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. 
Sosuke Kobayashi also made a Chainer version of BERT available (Thanks!) We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. \ \ \ \ \ New November 3rd, 2018: Multilingual and Chinese models available \ \ \ \ \ We have made two new BERT models available: BERT Base, Multilingual : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters We use character based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out of the box without any code changes. We did update the implementation of BasicTokenizer in tokenization.py to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the Multilingual README . \ \ \ \ \ End new information \ \ \ \ \ Introduction BERT , or B idirectional E ncoder R epresentations from T ransformers, is a new method of pre training language representations which obtains state of the art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: To give a few numbers, here are the results on the SQuAD v1.1 question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) Test EM Test F1 : : : : 1st Place Ensemble BERT 87.4 93.2 2nd Place Ensemble nlnet 86.0 91.7 1st Place Single Model BERT 85.1 91.8 2nd Place Single Model nlnet 83.5 90.1 And several natural language inference tasks: System MultiNLI Question NLI SWAG : : : : : : BERT 86.7 91.1 86.3 OpenAI GPT (Prev. SOTA) 82.2 88.1 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task specific neural network architecture design. If you already know what BERT is and you just want to get started, you can download the pre trained models ( pre trained models) and run a state of the art fine tuning ( fine tuning with bert) in only a few minutes. What is BERT? BERT is a method of pre training language representations, meaning that we train a general purpose language understanding model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised , deeply bidirectional system for pre training NLP. Unsupervised means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre trained representations can also either be context free or contextual , and contextual representations can further be unidirectional or bidirectional . Context free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, so bank would have the same representation in bank deposit and river bank . Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre training contextual representations — including Semi supervised Sequence Learning , Generative Pre Training , ELMo , and ULMFit — but crucially these models are all unidirectional or shallowly bidirectional . This means that each word is only contextualized using the words to its left (or right). 
For example, in the sentence I made a bank deposit the unidirectional representation of bank is only based on I made a but not deposit . Some previous work does combine the representations from separate left context and right context models, but only in a shallow manner. BERT represents bank using both its left and right context — I made a ... deposit — starting from the very bottom of a deep neural network, so it is deeply bidirectional . BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional Transformer encoder, and then predict only the masked words. For example: Input: the man went to the MASK1 . he bought a MASK2 of milk. Labels: MASK1 store; MASK2 gallon In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences A and B , is B the actual next sentence that comes after A , or just a random sentence from the corpus? Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence We then train a large model (12 layer to 24 layer Transformer) on a large corpus (Wikipedia + BookCorpus ) for a long time (1M update steps), and that's BERT. Using BERT has two stages: Pre training and fine tuning . Pre training is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one time procedure for each language (current models are English only, but multilingual models will be released in the near future). We are releasing a number of pre trained models from the paper which were pre trained at Google. Most NLP researchers will never need to pre train their own model from scratch. Fine tuning is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state of the art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state of the art results on sentence level (e.g., SST 2), sentence pair level (e.g., MultiNLI), word level (e.g., NER), and span level (e.g., SQuAD) tasks with almost no task specific modifications. What has been released in this repository? We are releasing the following: TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture). Pre trained checkpoints for both the lowercase and cased version of BERT Base and BERT Large from the paper. TensorFlow code for push button replication of the most important fine tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. Pre trained models We are releasing the BERT Base and BERT Large models from the paper. Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). 
These models are all released under the same license as the source code (Apache 2.0). For information about the Multilingual and Chinese model, see the Multilingual README . When using a cased model, make sure to pass do_lower False to the training scripts. (Or pass do_lower_case False directly to FullTokenizer if you're using your own script.) The links to the models are here (right click, 'Save link as...' on the name): BERT Base, Uncased : 12 layer, 768 hidden, 12 heads, 110M parameters BERT Large, Uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Cased : 12 layer, 768 hidden, 12 heads , 110M parameters BERT Large, Cased : 24 layer, 1024 hidden, 16 heads, 340M parameters BERT Base, Multilingual Cased (New, recommended) : 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Multilingual Uncased (Orig, not recommended) : 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters BERT Base, Chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters Each .zip file contains three items: A TensorFlow checkpoint ( bert_model.ckpt ) containing the pre trained weights (which is actually 3 files). A vocab file ( vocab.txt ) to map WordPiece to word id. A config file ( bert_config.json ) which specifies the hyperparameters of the model. Fine tuning with BERT Important : All results on the paper were fine tuned on a single Cloud TPU, which has 64GB of RAM. It is currently not possible to re produce most of the BERT Large results on the paper using a GPU with 12GB 16GB of RAM, because the maximum batch size that can fit in memory is too small. We are working on adding code to this repository which allows for much larger effective batch size on the GPU. See the section on out of memory issues ( out of memory issues) for more details. This code was tested with TensorFlow 1.11.0. It was tested with Python2 and Python3 (but more thoroughly with Python2, since this is what's used internally in Google). The fine tuning examples which use BERT Base should be able to run on a GPU that has at least 12GB of RAM using the hyperparameters given. Fine tuning with Cloud TPUs Most of the examples below assumes that you will be running training/evaluation on your local machine, using a GPU like a Titan X or GTX 1080. However, if you have access to a Cloud TPU that you want to train on, just add the following flags to run_classifier.py or run_squad.py : use_tpu True \ tpu_name $TPU_NAME Please see the Google Cloud TPU tutorial for how to use Cloud TPUs. Alternatively, you can use the Google Colab notebook BERT FineTuning with Cloud TPUs . On Cloud TPUs, the pretrained model and the output directory will need to be on Google Cloud Storage. For example, if you have a bucket named some_bucket , you might use the following flags instead: output_dir gs://some_bucket/my_output_dir/ The unzipped pre trained model files can also be found in the Google Cloud Storage folder gs://bert_models/2018_10_18 . For example: export BERT_BASE_DIR gs://bert_models/2018_10_18/uncased_L 12_H 768_A 12 Sentence (and sentence pair) classification tasks Before running this example you must download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . Next, download the BERT Base checkpoint and unzip it to some directory $BERT_BASE_DIR . This example code fine tunes BERT Base on the Microsoft Research Paraphrase Corpus (MRPC) corpus, which only contains 3,600 examples and can fine tune in a few minutes on most GPUs. 
shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train true \ do_eval true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ You should see output like this: Eval results eval_accuracy 0.845588 eval_loss 0.505248 global_step 343 loss 0.505248 This means that the Dev set accuracy was 84.55%. Small sets like MRPC have a high variance in the Dev set accuracy, even when starting from the same pre training checkpoint. If you re run multiple times (making sure to point to different output_dir ), you should see results between 84% and 88%. A few other pre trained models are implemented off the shelf in run_classifier.py , so it should be straightforward to follow those examples to use BERT for any single sentence or sentence pair classification task. Note: You might see a message Running train on CPU . This really just means that it's running on something other than a Cloud TPU, which includes a GPU. Prediction from classifier Once you have trained your classifier you can use it in inference mode by using the do_predict true command. You need to have a file named test.tsv in the input folder. Output will be created in file called test_results.tsv in the output folder. Each line will contain output for each sample, columns are the class probabilities. shell export BERT_BASE_DIR /path/to/bert/uncased_L 12_H 768_A 12 export GLUE_DIR /path/to/glue export TRAINED_CLASSIFIER /path/to/fine/tuned/classifier python run_classifier.py \ task_name MRPC \ do_predict true \ data_dir $GLUE_DIR/MRPC \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $TRAINED_CLASSIFIER \ max_seq_length 128 \ output_dir /tmp/mrpc_output/ SQuAD 1.1 The Stanford Question Answering Dataset (SQuAD) is a popular question answering benchmark dataset. BERT (at the time of the release) obtains state of the art results on SQuAD with almost no task specific network architecture modifications or data augmentation. However, it does require semi complex data pre processing and post processing to deal with (a) the variable length nature of SQuAD context paragraphs, and (b) the character level answer annotations which are used for SQuAD training. This processing is implemented and documented in run_squad.py . To run on SQuAD, you will first need to download the dataset. The SQuAD website does not seem to link to the v1.1 datasets any longer, but the necessary files can be found here: train v1.1.json dev v1.1.json evaluate v1.1.py Download these to some directory $SQUAD_DIR . The state of the art SQuAD results from the paper currently cannot be reproduced on a 12GB 16GB GPU due to memory constraints (in fact, even batch size 1 does not seem to fit on a 12GB GPU using BERT Large ). 
However, a reasonably strong BERT Base model can be trained on the GPU with these hyperparameters: shell python run_squad.py \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/squad_base/ The dev set predictions will be saved into a file called predictions.json in the output_dir : shell python $SQUAD_DIR/evaluate v1.1.py $SQUAD_DIR/dev v1.1.json ./squad/predictions.json Which should produce an output like this: shell { f1 : 88.41249612335034, exact_match : 81.2488174077578} You should see a result similar to the 88.5% reported in the paper for BERT Base . If you have access to a Cloud TPU, you can train with BERT Large . Here is a set of hyperparameters (slightly different than the paper) which consistently obtain around 90.5% 91.0% F1 single system trained only on SQuAD: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v1.1.json \ do_predict True \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME For example, one random run with these parameters produces the following Dev scores: shell { f1 : 90.87081895814865, exact_match : 84.38978240302744} If you fine tune for one epoch on TriviaQA before this the results will be even better, but you will need to convert TriviaQA into the SQuAD json format. SQuAD 2.0 This model is also implemented and documented in run_squad.py . To run on SQuAD 2.0, you will first need to download the dataset. The necessary files can be found here: train v2.0.json dev v2.0.json evaluate v2.0.py Download these to some directory $SQUAD_DIR . On Cloud TPU you can run with BERT Large as follows: shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train True \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True We assume you have copied everything from the output directory to a local directory called ./squad/. The initial dev set predictions will be at ./squad/predictions.json and the differences between the score of no answer ( ) and the best non null answer for each question will be in the file ./squad/null_odds.json Run this script to tune a threshold for predicting null versus non null answers: python $SQUAD_DIR/evaluate v2.0.py $SQUAD_DIR/dev v2.0.json ./squad/predictions.json na prob file ./squad/null_odds.json Assume the script outputs best_f1_thresh THRESH. (Typical values are between 1.0 and 5.0). You can now re run the model to generate predictions with the derived threshold or alternatively you can extract the appropriate answers from ./squad/nbest_predictions.json. 
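The "extract the answers from ./squad/nbest_predictions.json" alternative can be sketched as follows. The sketch assumes the usual layout of the two files (null_odds.json maps each question id to the no answer score difference, and nbest_predictions.json maps each question id to a ranked list of candidates with a text field); treat it as an illustration rather than part of the repository.

import json

THRESH = -1.0  # replace with the best_f1_thresh value printed by the evaluate script

with open("./squad/null_odds.json") as f:
    null_odds = json.load(f)   # qid -> (null score minus best non-null score)
with open("./squad/nbest_predictions.json") as f:
    nbest = json.load(f)       # qid -> ranked candidate answers

final = {}
for qid, score_diff in null_odds.items():
    answer = ""                # default: predict "no answer"
    if score_diff <= THRESH:
        for cand in nbest[qid]:
            if cand["text"]:   # first non-empty candidate is the best non-null answer
                answer = cand["text"]
                break
    final[qid] = answer

with open("./squad/predictions_thresholded.json", "w") as f:
    json.dump(final, f)

Re running run_squad.py with the null_score_diff_threshold flag set to the same value, as shown below, produces the thresholded predictions directly.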
shell python run_squad.py \ vocab_file $BERT_LARGE_DIR/vocab.txt \ bert_config_file $BERT_LARGE_DIR/bert_config.json \ init_checkpoint $BERT_LARGE_DIR/bert_model.ckpt \ do_train False \ train_file $SQUAD_DIR/train v2.0.json \ do_predict True \ predict_file $SQUAD_DIR/dev v2.0.json \ train_batch_size 24 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir gs://some_bucket/squad_large/ \ use_tpu True \ tpu_name $TPU_NAME \ version_2_with_negative True \ null_score_diff_threshold $THRESH Out of memory issues All experiments in the paper were fine tuned on a Cloud TPU, which has 64GB of device RAM. Therefore, when using a GPU with 12GB 16GB of RAM, you are likely to encounter out of memory issues if you use the same hyperparameters described in the paper. The factors that affect memory usage are: max_seq_length : The released models were trained with sequence lengths up to 512, but you can fine tune with a shorter max sequence length to save substantial memory. This is controlled by the max_seq_length flag in our example code. train_batch_size : The memory usage is also directly proportional to the batch size. Model type, BERT Base vs. BERT Large : The BERT Large model requires significantly more memory than BERT Base . Optimizer : The default optimizer for BERT is Adam, which requires a lot of extra memory to store the m and v vectors. Switching to a more memory efficient optimizer can reduce memory usage, but can also affect the results. We have not experimented with other optimizers for fine tuning. Using the default training scripts ( run_classifier.py and run_squad.py ), we benchmarked the maximum batch size on a single Titan X GPU (12GB RAM) with TensorFlow 1.11.0: System Seq Length Max Batch Size BERT Base 64 64 ... 128 32 ... 256 16 ... 320 14 ... 384 12 ... 512 6 BERT Large 64 12 ... 128 6 ... 256 2 ... 320 1 ... 384 0 ... 512 0 Unfortunately, these max batch sizes for BERT Large are so small that they will actually harm the model accuracy, regardless of the learning rate used. We are working on adding code to this repository which will allow much larger effective batch sizes to be used on the GPU. The code will be based on one (or both) of the following techniques: Gradient accumulation : The samples in a minibatch are typically independent with respect to gradient computation (excluding batch normalization, which is not used here). This means that the gradients of multiple smaller minibatches can be accumulated before performing the weight update, and this will be exactly equivalent to a single larger update. Gradient checkpointing : The major use of GPU/TPU memory during DNN training is caching the intermediate activations in the forward pass that are necessary for efficient computation in the backward pass. Gradient checkpointing trades memory for compute time by re computing the activations in an intelligent way. However, this is not implemented in the current release. Using BERT to extract fixed feature vectors (like ELMo) In certain cases, rather than fine tuning the entire pre trained model end to end, it can be beneficial to obtain pre trained contextual embeddings , which are fixed contextual representations of each input token generated from the hidden layers of the pre trained model. This should also mitigate most of the out of memory issues.
As an example, we include the script extract_features.py which can be used like this: shell Sentence A and Sentence B are separated by the delimiter for sentence pair tasks like question answering and entailment. For single sentence inputs, put one sentence per line and DON'T use the delimiter. echo 'Who was Jim Henson ? Jim Henson was a puppeteer' > /tmp/input.txt python extract_features.py \ input_file /tmp/input.txt \ output_file /tmp/output.jsonl \ vocab_file $BERT_BASE_DIR/vocab.txt \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ layers 1, 2, 3, 4 \ max_seq_length 128 \ batch_size 8 This will create a JSON file (one line per line of input) containing the BERT activations from each Transformer layer specified by layers ( 1 is the final hidden layer of the Transformer, etc.) Note that this script will produce very large output files (by default, around 15kb for every input token). If you need to maintain alignment between the original and tokenized words (for projecting training labels), see the Tokenization ( tokenization) section below. Note: You may see a message like Could not find trained model in model_dir: /tmp/tmpuB5g5c, running initialization to predict. This message is expected, it just means that we are using the init_from_checkpoint() API rather than the saved model API. If you don't specify a checkpoint or specify an invalid checkpoint, this script will complain. Tokenization For sentence level tasks (or sentence pair) tasks, tokenization is very simple. Just follow the example code in run_classifier.py and extract_features.py . The basic procedure for sentence level tasks is: 1. Instantiate an instance of tokenizer tokenization.FullTokenizer 2. Tokenize the raw text with tokens tokenizer.tokenize(raw_text) . 3. Truncate to the maximum sequence length. (You can use up to 512, but you probably want to use shorter if possible for memory and speed reasons.) 4. Add the CLS and SEP tokens in the right place. Word level and span level tasks (e.g., SQuAD and NER) are more complex, since you need to maintain alignment between your input text and output text so that you can project your training labels. SQuAD is a particularly complex example because the input labels are character based, and SQuAD paragraphs are often longer than our maximum sequence length. See the code in run_squad.py to show how we handle this. Before we describe the general recipe for handling word level tasks, it's important to understand what exactly our tokenizer is doing. It has three main steps: 1. Text normalization : Convert all whitespace characters to spaces, and (for the Uncased model) lowercase the input and strip out accent markers. E.g., John Johanson's, → john johanson's, . 2. Punctuation splitting : Split all punctuation characters on both sides (i.e., add whitespace around all punctuation characters). Punctuation characters are defined as (a) Anything with a P Unicode class, (b) any non letter/number/space ASCII character (e.g., characters like $ which are technically not punctuation). E.g., john johanson's, → john johanson ' s , 3. WordPiece tokenization : Apply whitespace tokenization to the output of the above procedure, and apply WordPiece tokenization to each token separately. (Our implementation is directly based on the one from tensor2tensor , which is linked). E.g., john johanson ' s , → john johan son ' s , The advantage of this scheme is that it is compatible with most existing English tokenizers. 
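Before the word level alignment example below, here is a small sketch of the sentence level recipe (steps 1 to 4 above) for a single sentence pair. It uses the repository's tokenization.FullTokenizer; the vocab path and the naive truncation strategy are placeholders.

import tokenization  # tokenization.py from this repository

max_seq_length = 128
tokenizer = tokenization.FullTokenizer(
    vocab_file="/path/to/bert/uncased_L-12_H-768_A-12/vocab.txt",
    do_lower_case=True)

tokens_a = tokenizer.tokenize("The man went to the store.")
tokens_b = tokenizer.tokenize("He bought a gallon of milk.")

# Reserve three positions for [CLS] and the two [SEP] tokens,
# trimming the longer segment first (a simple truncation heuristic).
while len(tokens_a) + len(tokens_b) > max_seq_length - 3:
    (tokens_a if len(tokens_a) > len(tokens_b) else tokens_b).pop()

tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, segment_ids, input_ids)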
For example, imagine that you have a part of speech tagging task which looks like this: Input: John Johanson 's house Labels: NNP NNP POS NN The tokenized output will look like this: Tokens: john johan son ' s house Crucially, this would be the same output as if the raw text were John Johanson's house (with no space before the 's ). If you have a pre tokenized representation with word level annotations, you can simply tokenize each input word independently, and deterministically maintain an original to tokenized alignment: python Input orig_tokens John , Johanson , 's , house labels NNP , NNP , POS , NN Output bert_tokens Token map will be an int > int mapping between the orig_tokens index and the bert_tokens index. orig_to_tok_map tokenizer tokenization.FullTokenizer( vocab_file vocab_file, do_lower_case True) bert_tokens.append( CLS ) for orig_token in orig_tokens: orig_to_tok_map.append(len(bert_tokens)) bert_tokens.extend(tokenizer.tokenize(orig_token)) bert_tokens.append( SEP ) bert_tokens CLS , john , johan , son , ' , s , house , SEP orig_to_tok_map 1, 2, 4, 6 Now orig_to_tok_map can be used to project labels to the tokenized representation. There are common English tokenization schemes which will cause a slight mismatch between how BERT was pre trained. For example, if your input tokenization splits off contractions like do n't , this will cause a mismatch. If it is possible to do so, you should pre process your data to convert these back to raw looking text, but if it's not possible, this mismatch is likely not a big deal. Pre training with BERT We are releasing code to do masked LM and next sentence prediction on an arbitrary text corpus. Note that this is not the exact code that was used for the paper (the original code was written in C++, and had some additional complexity), but this code does generate pre training data as described in the paper. Here's how to run the data generation. The input is a plain text file, with one sentence per line. (It is important that these be actual sentences for the next sentence prediction task). Documents are delimited by empty lines. The output is a set of tf.train.Example s serialized into TFRecord file format. You can perform sentence segmentation with an off the shelf NLP toolkit such as spaCy . The create_pretraining_data.py script will concatenate segments until they reach the maximum sequence length to minimize computational waste from padding (see the script for more details). However, you may want to intentionally add a slight amount of noise to your input data (e.g., randomly truncate 2% of input segments) to make it more robust to non sentential input during fine tuning. This script stores all of the examples for the entire input file in memory, so for large data files you should shard the input file and call the script multiple times. (You can pass in a file glob to run_pretraining.py , e.g., tf_examples.tf_record .) The max_predictions_per_seq is the maximum number of masked LM predictions per sequence. You should set this to around max_seq_length masked_lm_prob (the script doesn't do that automatically because the exact value needs to be passed to both scripts). shell python create_pretraining_data.py \ input_file ./sample_text.txt \ output_file /tmp/tf_examples.tfrecord \ vocab_file $BERT_BASE_DIR/vocab.txt \ do_lower_case True \ max_seq_length 128 \ max_predictions_per_seq 20 \ masked_lm_prob 0.15 \ random_seed 12345 \ dupe_factor 5 Here's how to run the pre training. 
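Before that, a quick check of the max_seq_length times masked_lm_prob rule of thumb mentioned above, using the flag values from the command (a worked example, not part of either script):

import math
max_seq_length, masked_lm_prob = 128, 0.15
print(math.ceil(max_seq_length * masked_lm_prob))  # 19.2 rounds up to 20, matching max_predictions_per_seq above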
Do not include init_checkpoint if you are pre training from scratch. The model configuration (including vocab size) is specified in bert_config_file . This demo code only pre trains for a small number of steps (20), but in practice you will probably want to set num_train_steps to 10000 steps or more. The max_seq_length and max_predictions_per_seq parameters passed to run_pretraining.py must be the same as create_pretraining_data.py . shell python run_pretraining.py \ input_file /tmp/tf_examples.tfrecord \ output_dir /tmp/pretraining_output \ do_train True \ do_eval True \ bert_config_file $BERT_BASE_DIR/bert_config.json \ init_checkpoint $BERT_BASE_DIR/bert_model.ckpt \ train_batch_size 32 \ max_seq_length 128 \ max_predictions_per_seq 20 \ num_train_steps 20 \ num_warmup_steps 10 \ learning_rate 2e 5 This will produce an output like this: Eval results global_step 20 loss 0.0979674 masked_lm_accuracy 0.985479 masked_lm_loss 0.0979328 next_sentence_accuracy 1.0 next_sentence_loss 3.45724e 05 Note that since our sample_text.txt file is very small, this example training will overfit that data in only a few steps and produce unrealistically high accuracy numbers. Pre training tips and caveats If using your own vocabulary, make sure to change vocab_size in bert_config.json . If you use a larger vocabulary without changing this, you will likely get NaNs when training on GPU or TPU due to unchecked out of bounds access. If your task has a large domain specific corpus available (e.g., movie reviews or scientific papers ), it will likely be beneficial to run additional steps of pre training on your corpus, starting from the BERT checkpoint. The learning rate we used in the paper was 1e 4. However, if you are doing additional steps of pre training starting from an existing BERT checkpoint, you should use a smaller learning rate (e.g., 2e 5). Current BERT models are English only, but we do plan to release a multilingual model which has been pre trained on a lot of languages in the near future (hopefully by the end of November 2018). Longer sequences are disproportionately expensive because attention is quadratic to the sequence length. In other words, a batch of 64 sequences of length 512 is much more expensive than a batch of 256 sequences of length 128. The fully connected/convolutional cost is the same, but the attention cost is far greater for the 512 length sequences. Therefore, one good recipe is to pre train for, say, 90,000 steps with a sequence length of 128 and then for 10,000 additional steps with a sequence length of 512. The very long sequences are mostly needed to learn positional embeddings, which can be learned fairly quickly. Note that this does require generating the data twice with different values of max_seq_length . If you are pre training from scratch, be prepared that pre training is computationally expensive, especially on GPUs. If you are pre training from scratch, our recommended recipe is to pre train a BERT Base on a single preemptible Cloud TPU v2 , which takes about 2 weeks at a cost of about $500 USD (based on the pricing in October 2018). You will have to scale down the batch size when only training on a single Cloud TPU, compared to what was used in the paper. It is recommended to use the largest batch size that fits into TPU memory. Pre training data We will not be able to release the pre processed datasets used in the paper. 
For Wikipedia, the recommended pre processing is to download the latest dump , extract the text with WikiExtractor.py , and then apply any necessary cleanup to convert it into plain text. Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Gutenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain. Common Crawl is another very large collection of text, but you will likely have to do substantial pre processing and cleanup to extract a usable corpus for pre training BERT. Learning a new WordPiece vocabulary This repository does not include code for learning a new WordPiece vocabulary. The reason is that the code used in the paper was implemented in C++ with dependencies on Google's internal libraries. For English, it is almost always better to just start with our vocabulary and pre trained models. For learning vocabularies of other languages, there are a number of open source options available. However, keep in mind that these are not compatible with our tokenization.py library: Google's SentencePiece library tensor2tensor's WordPiece generation script Rico Sennrich's Byte Pair Encoding library Using BERT in Colab If you want to use BERT with Colab , you can get started with the notebook BERT FineTuning with Cloud TPUs . At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free. Note: One per user, availability limited, requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may no longer be available in the future. Click on the BERT Colab that was just linked for more information. FAQ Is this code compatible with Cloud TPUs? What about GPUs? Yes, all of the code in this repository works out of the box with CPU, GPU, and Cloud TPU. However, GPU training is single GPU only. I am getting out of memory errors, what is wrong? See the section on out of memory issues ( out of memory issues) for more information. Is there a PyTorch version available? There is no official PyTorch implementation. However, NLP researchers from HuggingFace made a PyTorch version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the PyTorch implementation so please direct any questions towards the authors of that repository. Is there a Chainer version available? There is no official Chainer implementation. However, Sosuke Kobayashi made a Chainer version of BERT available which is compatible with our pre trained checkpoints and is able to reproduce our results. We were not involved in the creation or maintenance of the Chainer implementation so please direct any questions towards the authors of that repository. Will models in other languages be released? Yes, we plan to release a multi lingual BERT model in the near future. We cannot make promises about exactly which languages will be included, but it will likely be a single model which includes most of the languages which have a significantly sized Wikipedia. Will models larger than BERT Large be released? So far we have not attempted to train anything larger than BERT Large . It is possible that we will release larger models if we are able to obtain significant improvements. What license is this library released under? All code and models are released under the Apache 2.0 license.
See the LICENSE file for more information. How do I cite BERT? For now, cite the Arxiv paper : @article{devlin2018bert, title {BERT: Pre training of Deep Bidirectional Transformers for Language Understanding}, author {Devlin, Jacob and Chang, Ming Wei and Lee, Kenton and Toutanova, Kristina}, journal {arXiv preprint arXiv:1810.04805}, year {2018} } If we submit the paper to a conference or journal, we will update the BibTeX. Disclaimer This is not an official Google product. Contact information For help or issues using BERT, please submit a GitHub issue. For personal communication related to BERT, please contact Jacob Devlin ( jacobdevlin@google.com ), Ming Wei Chang ( mingweichang@google.com ), or Kenton Lee ( kentonl@google.com ).",Machine Translation,Machine Translation 2912,Natural Language Processing,Natural Language Processing,Natural Language Processing,"post Several models for POS Tagging Already implemented models: BPNN+CRF BiLSTM+CRF CHAR+BiLSTM+CRF Requirements txt python 3.6.5 pytorch 0.4.1 Usage Commands sh $ git clone $ cd post eg: BiLSTM+CHAR+CRF $ python run.py model lstm_char crf Arguments sh $ python run.py h usage: run.py h model {bpnn_crf,lstm_crf,char_lstm_crf} drop DROP batch_size BATCH_SIZE epochs EPOCHS interval INTERVAL eta ETA threads THREADS seed SEED file FILE Create several models for POS Tagging. optional arguments: h, help show this help message and exit model {bpnn_crf,lstm_crf,char_lstm_crf}, m {bpnn_crf,lstm_crf,char_lstm_crf} choose the model for POS Tagging drop DROP set the prob of dropout batch_size BATCH_SIZE set the size of batch epochs EPOCHS set the max num of epochs interval INTERVAL set the max interval to stop eta ETA set the learning rate of training threads THREADS, t THREADS set the max num of threads seed SEED, s SEED set the seed for generating random numbers file FILE, f FILE set where to store the model Structures python BPNN+CRF BPNN_CRF( (embed): Embedding(54304, 100) (hid): Sequential( (0): Linear(in_features 500, out_features 300, bias True) (1): ReLU() ) (out): Linear(in_features 300, out_features 32, bias True) (crf): CRF() (drop): Dropout(p 0.5) ) BiLSTM+CRF LSTM_CRF( (embed): Embedding(54304, 100) (lstm): LSTM(100, 150, batch_first True, bidirectional True) (out): Linear(in_features 300, out_features 32, bias True) (crf): CRF() (drop): Dropout(p 0.5) ) CHAR+BiLSTM+CRF CHAR_LSTM_CRF( (embed): Embedding(54304, 100) (char_lstm): CharLSTM( (embed): Embedding(7478, 100) (lstm): LSTM(100, 100, batch_first True, bidirectional True) ) (word_lstm): LSTM(300, 150, batch_first True, bidirectional True) (out): Linear(in_features 300, out_features 32, bias True) (crf): CRF() (drop): Dropout(p 0.5) ) References tagger LM LSTM CRF pytorch crf Neural Architectures for Named Entity Recognition Empower Sequence Labeling with Task Aware Neural Language Model",Machine Translation,Machine Translation 2059,Computer Vision,Computer Vision,Computer Vision,"Status: Archive (code is provided as is, no updates expected) Glow Code for reproducing results in Glow: Generative Flow with Invertible 1x1 Convolutions To use pretrained CelebA HQ model, make your own manipulation vectors and run our interactive demo, check demo folder. Requirements Tensorflow (tested with v1.8.0) Horovod (tested with v0.13.8) and (Open)MPI Run pip install r requirements.txt To setup (Open)MPI, check instructions on Horovod github page . 
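A quick way to confirm that Horovod and MPI are wired up correctly before moving on to the datasets and multi GPU training below is a tiny rank check script. This is an illustrative sketch, not part of the repository; check_horovod.py is a hypothetical file name, and you launch it with mpiexec.

# check_horovod.py -- hypothetical helper, not shipped with this repository.
# Run with:  mpiexec -n 2 python check_horovod.py
# Each MPI process should print a distinct rank and the same world size.
import horovod.tensorflow as hvd

hvd.init()
print("rank %d of %d (local rank %d)" % (hvd.rank(), hvd.size(), hvd.local_rank()))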
Download datasets For small scale experiments, use MNIST/CIFAR 10 (directly downloaded by train.py using keras) For larger scale experiments, the datasets used are in the Google Cloud locations The dataset_names are below, we mention the exact preprocessing / downsampling method for a correct comparison of likelihood. Quantitative results imagenet oord 20GB. Unconditional ImageNet 32x32 and 64x64, as described in PixelRNN/RealNVP papers (we downloaded this processed version). lsun_realnvp 140GB. LSUN 96x96. Random 64x64 crops taken at processing time, as described in RealNVP. Qualitative results celeba 4GB. CelebA HQ 256x256 dataset, as described in Progressive growing of GAN's. For 1024x1024 version (120GB), use celeba full tfr.tar while downloading. imagenet 20GB. ImageNet 32x32 and 64x64 with class labels. Centre cropped, area downsampled. lsun 700GB. LSUN 256x256. Centre cropped, area downsampled. To download and extract celeb for example, run wget tar xvf celeb tfr.tar Change hps.data_dir in train.py file to point to the above folder (or use the data_dir flag when you run train.py) For lsun , since download can be quite big, you can instead follow the instructions in data_loaders/generate_tfr/lsun.py to generate the tfr file directly from LSUN images. church_outdoor will be the smallest category. Simple Train with 1 GPU Run wtih small depth to test CUDA_VISIBLE_DEVICES 0 python train.py depth 1 Train with multiple GPUs using MPI and Horovod Run default training script with 8 GPUs: mpiexec n 8 python train.py Ablation experiments mpiexec n 8 python train.py problem cifar10 image_size 32 n_level 3 depth 32 flow_permutation 0/1/2 flow_coupling 0/1 seed 0/1/2 learntop lr 0.001 Pretrained models, logs and samples wget CIFAR 10 Quantitative result mpiexec n 8 python train.py problem cifar10 image_size 32 n_level 3 depth 32 flow_permutation 2 flow_coupling 1 seed 0 learntop lr 0.001 n_bits_x 8 ImageNet 32x32 Quantitative result mpiexec n 8 python train.py problem imagenet oord image_size 32 n_level 3 depth 48 flow_permutation 2 flow_coupling 1 seed 0 learntop lr 0.001 n_bits_x 8 ImageNet 64x64 Quantitative result mpiexec n 8 python train.py problem imagenet oord image_size 64 n_level 4 depth 48 flow_permutation 2 flow_coupling 1 seed 0 learntop lr 0.001 n_bits_x 8 LSUN 64x64 Quantitative result mpiexec n 8 python train.py problem lsun_realnvp category bedroom/church_outdoor/tower image_size 64 n_level 3 depth 48 flow_permutation 2 flow_coupling 1 seed 0 learntop lr 0.001 n_bits_x 8 Pretrained models, logs and samples wget CelebA HQ 256x256 Qualitative result mpiexec n 40 python train.py problem celeba image_size 256 n_level 6 depth 32 flow_permutation 2 flow_coupling 0 seed 0 learntop lr 0.001 n_bits_x 5 LSUN 96x96 and 128x128 Qualitative result mpiexec n 40 python train.py problem lsun category bedroom/church_outdoor/tower image_size 96/128 n_level 5 depth 64 flow_permutation 2 flow_coupling 0 seed 0 learntop lr 0.001 n_bits_x 5 Logs and samples wget Conditional CIFAR 10 Qualitative result mpiexec n 8 python train.py problem cifar10 image_size 32 n_level 3 depth 32 flow_permutation 2 flow_coupling 0 seed 0 learntop lr 0.001 n_bits_x 5 ycond weight_y 0.01 Conditional ImageNet 32x32 Qualitative result mpiexec n 8 python train.py problem imagenet image_size 32 n_level 3 depth 48 flow_permutation 2 flow_coupling 0 seed 0 learntop lr 0.001 n_bits_x 5 ycond weight_y 0.01",Image Generation,Image Generation 2061,Computer Vision,Computer Vision,Computer Vision,"Status: Archive (code is provided as 
is, no updates expected) Improve Variational Inference with Inverse Autoregressive Flow Code for reproducing key results in the paper Improving Variational Inference with Inverse Autoregressive Flow by Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Prerequisites 1. Make sure that recent versions installed of: Python (version 2.7 or higher) Numpy (e.g. pip install numpy ) Theano (e.g. pip install Theano ) 2. Set floatX float32 in the global section of Theano config (usually /.theanorc ). Alternatively you could prepend THEANO_FLAGS floatX float32 to the python commands below. 3. Clone this repository, e.g.: sh git clone 4. Download the CIFAR 10 dataset (get the Python version) and create an environment variable CIFAR10_PATH that points to the subdirectory with CIFAR 10 data. For example: sh export CIFAR10_PATH $HOME/cifar 10 Syntax of train.py Example: sh python train.py with problem cifar10 n_z 32 n_h 64 depths 2,2,2 margs.depth_ar 1 margs.posterior down_iaf2_NL margs.kl_min 0.25 problem is the problem (dataset) to train on. I only tested cifar10 for this release. n_z is the number of stochastic featuremaps in each layer. n_h is the number of deterministic featuremaps used throughout the model. depths is an array of integers that denotes the depths of the levels in the model. Each level is a sequence of layers. Each subsequent level operates over spatially smaller featuremaps. In case of CIFAR 10, the first level operates over 16x16 featuremaps, the second over 8x8 featuremaps, etc. Some possible choices for margs.posterior are: up_diag : bottom up factorized Gaussian up_iaf1_nl : bottom up IAF, mean only perturbation up_iaf2_nl : bottom up IAF down_diag : top down factorized Gaussian down_iaf1_nl : top down IAF, mean only perturbation down_iaf2_nl : top down IAF margs.depth_ar is the number of hidden layers within IAF, and can be any non negative integer. margs.kl_min : the minimum information constraint. Should be a non negative float (where 0 is no constraint). Results of Table 3 (3.28 bits/dim) sh python train.py with problem cifar10 n_h 160 depths 10,10 margs.depth_ar 2 margs.posterior down_iaf2_nl margs.prior diag margs.kl_min 0.25 More instructions will follow. Multi GPU TensorFlow implementation Prerequisites Make sure that recent versions installed of: Python (version 2.7 or higher) TensorFlow tqdm CIFAR10_PATH environment variable should point to the dataset location. Syntax of tf_train.py Training script: sh python tf_train.py logdir hpconfig depth 1,num_blocks 20,kl_min 0.1,learning_rate 0.002,batch_size 32 num_gpus 8 mode train It will run the training procedure on a given number of GPUs. Model checkpoints will be stored in /train directory along with TensorBoard summaries that are useful for monitoring and debugging issues. Evaluation script: sh python tf_train.py logdir hpconfig depth 1,num_blocks 20,kl_min 0.1,learning_rate 0.002,batch_size 32 num_gpus 1 mode eval_test It will run the evaluation on the test set using a single GPU and will produce TensorBoard summary with the results and generated samples. To start TensorBoard: sh tensorboard logdir For the description of hyper parameters, take a look at get_default_hparams function in tf_train.py . Loading from the checkpoint The best IAF model trained on CIFAR 10 reached 3.15 bits/dim when evaluated with a single sample. With 10,000 samples, the estimation of log likelihood is 3.111 bits/dim. The checkpoint is available at link . 
Steps to use it: download the file create directory /train/ and copy the checkpoint there run the following command: sh python tf_train.py logdir hpconfig depth 1,num_blocks 20,kl_min 0.1,learning_rate 0.002,batch_size 32 num_gpus 1 mode eval_test The script will run the evaluation on the test set and generate samples stored in TensorFlow events file that can be accessed using TensorBoard.",Image Generation,Image Generation 2062,Computer Vision,Computer Vision,Computer Vision,"Status: Archive (code is provided as is, no updates expected) pixel cnn++ This is a Python3 / Tensorflow implementation of PixelCNN++ , as described in the following paper: PixelCNN++: A PixelCNN Implementation with Discretized Logistic Mixture Likelihood and Other Modifications , by Tim Salimans, Andrej Karpathy, Xi Chen, Diederik P. Kingma, and Yaroslav Bulatov. Our work builds on PixelCNNs that were originally proposed in van der Oord et al. in June 2016. PixelCNNs are a class of powerful generative models with tractable likelihood that are also easy to sample from. The core convolutional neural network computes a probability distribution over a value of one pixel conditioned on the values of pixels to the left and above it. Below are example samples from a model trained on CIFAR 10 that achieves 2.92 bits per dimension (compared to 3.03 of the PixelCNN in van der Oord et al.): Samples from the model ( left ) and samples from a model that is conditioned on the CIFAR 10 class labels ( right ): ! Improved PixelCNN papers (data/pixelcnn_samples.png) This code supports multi GPU training of our improved PixelCNN on CIFAR 10 and Small ImageNet , but is easy to adapt for additional datasets. Training on a machine with 8 Maxwell TITAN X GPUs achieves 3.0 bits per dimension in about 10 hours and it takes approximately 5 days to converge to 2.92. Setup To run this code you need the following: a machine with multiple GPUs Python3 Numpy, TensorFlow and imageio packages: pip install numpy tensorflow gpu imageio Training the model Use the train.py script to train the model. To train the default model on CIFAR 10 simply use: python3 train.py You might want to at least change the data_dir and save_dir which point to paths on your system to download the data to (if not available), and where to save the checkpoints. I want to train on fewer GPUs . To train on fewer GPUs we recommend using CUDA_VISIBLE_DEVICES to narrow the visibility of GPUs to only a few and then run the script. Don't forget to modulate the flag nr_gpu accordingly. I want to train on my own dataset . Have a look at the DataLoader classes in the data/ folder. You have to write an analogous data iterator object for your own dataset and the code should work well from there. Pretrained model checkpoint You can download our pretrained (TensorFlow) model that achieves 2.92 bpd on CIFAR 10 here (656MB). Citation If you find this code useful please cite us in your work: @inproceedings{Salimans2017PixeCNN, title {PixelCNN++: A PixelCNN Implementation with Discretized Logistic Mixture Likelihood and Other Modifications}, author {Tim Salimans and Andrej Karpathy and Xi Chen and Diederik P. Kingma}, booktitle {ICLR}, year {2017} }",Image Generation,Image Generation 2084,Computer Vision,Computer Vision,Computer Vision,"Nonlinear Independent Components Estimation An implementation of the NICE model from Dinh et al (2014) in PyTorch. 
I was only able to find the original theano based repo from the first author, and I figured it would be good practice to re implement the architecture in PyTorch. Please cite the paper by the original authors and credit them (not me or this repo) if any of the code in this repo ends up being useful to you in a publication: NICE: Non linear independent components estimation , Laurent Dinh, David Krueger, Yoshua Bengio. ArXiv 2014. Requirements PyTorch 0.4.1+ NumPy 1.14.5+ tqdm 4.15.0+ (though any version should work; we primarily just use the main tqdm and trange wrappers.) Benchmarks We plan to use the same four datasets as in the original paper (MNIST, TFD, SVHN, and CIFAR 10) and attempt to reproduce the results in the paper. At present, MNIST, SVHN, and CIFAR10 are supported; TFD is a bit harder to get access to (due to privacy issues regarding the faces, etc.) Running python make_datasets.py will download the relevant dataset and store it in the appropriate directory the first time you run it; subsequent runs will not re download the datasets if they already exist. Additionally, the ZCA matrices will be computed for the relevant datasets that require them (CIFAR10, SVHN). (TBD: comparisons to original repo & paper results here once I find the time to run on 1500 epochs.) License The license for this repository is the 3 clause BSD, as in the theano based implementation. Status Training on MNIST, CIFAR10, SVHN currently works; trained models can be sampled via python sample.py args . Training on GPU currently works. (Sampling is still CPU only, but this is by design.) Benchmarks are still forthcoming. Toronto Face Dataset support is still something I'm considering if I can find a place to download it. Future To Do List + Implement inpainting from trained model. + Toronto Face Dataset? (See remark about privacy issues above) + Implement affine coupling law + Allow arbitrary partitions of the input in coupling layers?",Image Generation,Image Generation 2085,Computer Vision,Computer Vision,Computer Vision,"Unsupervised Anomaly Detection using Generative Adversarial Network on medical X Ray image Article: Data MURA data set Public, detect abnormality in X Ray images. Model Bidirectional GAN / ALI : / Alpha GAN (VAE + GAN): Approach Leveraging the ability to unsupervisedly learn the structure of data to generate realistic images, this experiment aims to use that ability to perform binary classification when only trained on one class. Usage Run python main.py help for full details. Example: python main.py batch_size 128 imsize 64 dataset mura adv_loss inverse version sabigan_wrist image_path /datasets/ use_tensorboard true mura_class XR_WRIST mura_type negative How: Train a GAN model with the ability to perform inference on the latent variable (VAE+GAN / BiGAN) on only the 'negative' class Let the model learn until it can generate good looking images. Use the Encoder, Generator, Discriminator outputs and hidden features to calculate 'Reconstruction loss' and 'Feature matching' loss. Classify into 'negative' or 'positive' based on the score above. References: Thanks for the great examples.",Image Generation,Image Generation 2088,Computer Vision,Computer Vision,Computer Vision,"PixelCNN++ A Pytorch Implementation of PixelCNN++. Main work taken from the official implementation Pre trained models are available here I kept the code structure to facilitate comparison with the official code. The code achieves 2.95 BPD on the test set, compared to 2.92 BPD on the official tensorflow implementation.
Running the code python main.py Differences with official implementation 1. No data dependent weight initialization 2. No exponential moving average of past models for test set evaluation Contact For questions / comments / requests, feel free to send me an email.\ Happy generative modelling :)",Image Generation,Image Generation 2095,Computer Vision,Computer Vision,Computer Vision,Earth landscape generator based on StyleGAN research by NVIDIA > ! alt text (/../master/example_images.png?raw true),Image Generation,Image Generation 2101,Computer Vision,Computer Vision,Computer Vision,"Chainer implementation of Style based Generator A Style Based Generator Architecture for Generative Adversarial Networks Requirements opencv python python gflags Augmentor h5py Pillow scipy mpi4py chainer > 5.0.0 cupy > 5.0.0 NVIDIA driver 391.35 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.3.1 or newer. NCCL2 A graphics card with at least 11GB of memory to train the 1024x1024 model. Tested on 8 Tesla P100. Datasets 1. Please follow ffhq dataset to obtain the ffhq dataset. python download_ffhq.py h i 2. Convert raw ffhq images to an HDF5 file. (Around 198GB) cd src/hdf5_tools bash folder_to_multisize_hdf5_cmds.sh 1 YOUR_PATH_TO_RAW_FFHQ_IMAGES Run 8 GPUs setting cd src/stylegan bash run_ffhq.sh 2 1 GPU setting (up to 256x256) cd src/stylegan bash run_ffhq.sh 1 Pre trained Model on FFHQ GDrive Sampling images on CPU python sampling.py m_style SmoothedGenerator_405000.npz m_mapping SmoothedMapping_405000.npz gpu 1 Results ! samples",Image Generation,Image Generation 2102,Computer Vision,Computer Vision,Computer Vision,"// : sngans : pcgans : GANs with spectral normalization and projection discriminator NOTE: The setup and example code in this README are for training GANs on a single GPU . The models are smaller than the ones used in the papers . Please go to link if you are looking for how to reproduce the results in the papers. Official Chainer implementation for conditional image generation on ILSVRC2012 dataset (ImageNet) with spectral normalization sngans and projection discriminator pcgans . Demo movies Consecutive category morphing movies: (5x5 panels 128px images) (10x10 panels 128px images) Other materials Generated images from the model trained on all ImageNet images (1K categories): 128px from the model trained on dog and cat images (143 categories): 64px 128px 256px Pretrained models Movies 4 corners category morph. References Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks . ICLR2018. OpenReview sngans Takeru Miyato, Masanori Koyama. cGANs with Projection Discriminator . ICLR2018. OpenReview pcgans Setup Install required python libraries: pip install r requirements.txt Download ImageNet dataset: Please download ILSVRC2012 dataset from Preprocess dataset: cd datasets IMAGENET_TRAIN_DIR /path/to/imagenet/train/ path to the parent directory of category directories named n0 . PREPROCESSED_DATA_DIR /path/to/save_dir/ bash preprocess.sh $IMAGENET_TRAIN_DIR $PREPROCESSED_DATA_DIR Make the list of image label pairs for all images (1000 categories, 1281167 images). python imagenet.py $PREPROCESSED_DATA_DIR Make the list of image label pairs for dog and cat images (143 categories, 180373 images).
python imagenet_dog_and_cat.py $PREPROCESSED_DATA_DIR Download inception model: python source/inception/download.py outfile datasets/inception_model Training examples Spectral normalization + projection discriminator for 64x64 dog and cat images: LOGDIR /path/to/logdir CONFIG configs/sn_projection_dog_and_cat_64.yml python train.py config $CONFIG results_dir $LOGDIR data_dir $PREPROCESSED_DATA_DIR pretrained model generated images at 250K iterations Examples of 64x64 generated images: Spectral normalization + projection discriminator for 64x64 all ImageNet images: LOGDIR /path/to/logdir CONFIG configs/sn_projection_64.yml python train.py config $CONFIG results_dir $LOGDIR data_dir $PREPROCESSED_DATA_DIR Evaluation examples (If you want to use pretrained models for the image generation, please download the model from link and set the snapshot argument to the path to the downloaded pretrained model file (.npz).) Generate images python evaluations/gen_images.py config $CONFIG snapshot ${LOGDIR}/ResNetGenerator_ .npz results_dir ${LOGDIR}/gen_images Generate category morphing images Regarding the index category correspondence, please see 1K ImageNet or 143 dog and cat ImageNet . python evaluations/gen_interpolated_images.py n_zs 10 n_intp 10 classes $CATEGORY1 $CATEGORY2 config $CONFIG snapshot ${LOGDIR}/ResNetGenerator_ .npz results_dir ${LOGDIR}/gen_morphing_images Calculate inception score (with the original OpenAI implementation) python evaluations/calc_inception_score.py config $CONFIG snapshot ${LOGDIR}/ResNetGenerator_ .npz results_dir ${LOGDIR}/inception_score splits 10 tf",Image Generation,Image Generation 2128,Computer Vision,Computer Vision,Computer Vision,Style GAN Unofficial Pytorch implementation of Style GAN paper A Style Based Generator Architecture for Generative Adversarial Networks Original Tensorflow code: Generation example (4 x Titan X for 8 hours): To install requirements: python pip install r requirements.txt To download and prepare dataset: python python prepare_celeba.py python downscale_celeba.py To train: python python StyleGAN.py,Image Generation,Image Generation 2180,Computer Vision,Computer Vision,Computer Vision,"Gan implemented using pytorch 2d mixtures: An implementation of the original wgan algorithm. The parameters and utility functions are copied from It just worked as reported. 2d mixtures improved: An implementation of the improved wgan algorithm using gradient norm penalty as described in this paper: Initially, I used lambda 10 as suggested by the author. The gan network behaved like a standard gan with mode collapse. Tuning the optimizer parameters did not mitigate the problem. Then I changed lambda to 1 and it worked very nicely. The results are much better than the original wgan.",Image Generation,Image Generation 2192,Computer Vision,Computer Vision,Computer Vision,"These codes are useful for evaluating GANs for video generation. In order to make the codes can be used more generally, I designed the evaluation flow as follows. Main features 1. Convert video samples to convolutional features (embeddings) of the inception model. I chose ResNet 101 trained with UCF 101 dataset as inception model. I borrowed the model and codes from video classification 3d cnn pytorch . You can use another models available on the repository. 2. Perform evaluation. Various metrics for GANs are available. x Inception Score 1 x Frechet Inception Distace 2 x Precision and Recall for Distributions 3 Requirements Python3 Pytorch FFmpeg Getting Started 1. 
Install dependencies I strongly recommend using a conda environment. For example, my environment is like the following: pyenv install miniconda latest pyenv local miniconda latest conda install ffmpeg pip install r requirements.txt 2. Download pretrained weights of the inception model Next, download pretrained weights from here . Save resnet 101 kinetics ucf101_split1.pth under models/weights/ . 3. Prepare your dataset or generated samples in a directory The evaluation code in this repository is implemented to receive a path as input and read all .mp4 files under the directory. Therefore, first of all you must save the dataset samples or samples generated by your model to a directory in mp4 format. 4. Convert video samples to convolutional features Before evaluation, you need to convert the video samples using the Inception Model. In the first place, the Inception Score has to calculate the probabilities of each class output by the video classifier. In addition, it has been pointed out that other metrics can be evaluated more accurately by treating the sample as an intermediate layer feature (convolutional feature) input to the Inception Model than by treating it as a pixel space feature 4 . So this is a standard procedure. To complete the above procedure, do the following: python compute_conv_features.py batchsize compute_conv_features.py reads all of the mp4 files in , and converts them to convolutional features and probabilities of each class. The result is output to features.npy , probs.npy under 5. Calculate evaluation score ! Finally, you can perform evaluation using evaluate.py . The program will read the necessary npy files accordingly and perform the evaluation by passing the directory from the 3rd step as input. For example, the Inception Score can be calculated from a single set of video samples, and can be performed as follows: shell python evaluate.py is Other metrics, such as the Frechet Inception Distance and Precision and Recall for Distributions , are calculated using a pair of dataset samples and generated samples, and can be performed as follows: shell python evaluate.py fid o result.json python evaluate.py prd o result.json 6. Visualize results You can also use the visualization code if necessary. Especially for PRD, you need to plot the precision recall curve to get the result. You can plot multiple evaluations together and save them as an image by doing the following. python plot.py prd FAQ Not available yet. Reference 1 Improved Techniques for Training GANs , 2 GANs Trained by a Two Time Scale Update Rule Converge to a Local Nash Equilibrium , 3 Assessing Generative Models via Precision and Recall , 4 An empirical study on evaluation metrics of generative adversarial networks , Credit Icons made by Smashicons from www.flaticon.com is licensed by CC 3.0 BY .",Image Generation,Image Generation 2198,Computer Vision,Computer Vision,Computer Vision,"Turing Generative Adversarial Network Turing GANs are quick to train! This excited me to write my own versions in PyTorch, which are based on the original Keras implementation . Thanks Jianlin Su , creator of Turing GANs , for suggesting this code as the PyTorch implementation of Turing GANs ! Experiments So, the following are my experiments' resulting image data. Note For all the experiments the images shown below are sampled after 100K iterations of training the Turing GAN on various datasets. All the experiments used spectral normalization for 1 Lipschitz constraint enforcement. I trained all of the Turing GANs with both Jensen Shannon and Wasserstein divergences.
All experiments were performed with same hyper parameters as devised in paper. Using 32 sized Turing GAN I performed experiments on the following dataset(s): CIFAR 10 MNIST Fashion MNIST CIFAR 10 Turing Standard GAN (Left) Turing Wasserstein GAN (Right) Both Spectrally Normalized ! ! MNIST Turing Standard GAN (Left) Turing Wasserstein GAN (Right) Both Spectrally Normalized ! ! Fashion MNIST Turing Standard GAN (Left) Turing Wasserstein GAN (Right) Both Spectrally Normalized ! ! References Training Generative Adversarial Networks Via Turing Test arXiv Original T GANs implementation Spectral Normalization for Generative Adversarial Networks arXiv Spectral Normalization implementation in PyTorch Contact Reach me at rahulbhalley@icloud.com .",Image Generation,Image Generation 2204,Computer Vision,Computer Vision,Computer Vision,"Progressive Growing of GANs for Improved Quality, Stability, and Variation – Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University) For business inquiries, please contact researchinquiries@nvidia.com (mailto:researchinquiries@nvidia.com) For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com (mailto:hmarinez@nvidia.com) ! Representative image Picture: Two imaginary celebrities that were dreamed up by a random number generator. Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher quality version of the CelebA dataset. Resources Paper (NVIDIA research) Paper (arXiv) Result video (YouTube) Additional material (Google Drive) ICLR 2018 poster ( karras2018iclr poster.pdf ) ICLR 2018 slides ( karras2018iclr slides.pptx ) Representative images ( images/representative images ) High quality video clips ( videos/high quality video clips ) Huge collection of non curated images for each dataset ( images/100k generated images ) Extensive video of random interpolations for each dataset ( videos/one hour of random interpolations ) Pre trained networks ( networks/tensorflow version ) Minimal example script for importing the pre trained networks ( networks/tensorflow version/example_import_script ) Data files needed to reconstruct the CelebA HQ dataset ( datasets/celeba hq deltas ) Example training logs and progress snapshots ( networks/tensorflow version/example_training_runs ) All the material, including source code, is made freely available for non commercial use under the Creative Commons CC BY NC 4.0 license. Feel free to use any of the material in your own work, as long as you give us appropriate credit by mentioning the title and author list of our paper. Versions There are two different versions of the source code. 
The TensorFlow version is newer and more polished, and we generally recommend it as a starting point if you are looking to experiment with our technique, build upon it, or apply it to novel datasets. The original Theano version , on the other hand, is what we used to produce all the results shown in our paper. We recommend using it if – and only if – you are looking to reproduce our exact results for benchmark datasets like CIFAR 10, MNIST RGB, and CelebA. The main differences are summarized in the following table: Feature TensorFlow version Original Theano version : : : : : Branch master (this branch) original theano version Multi GPU support Yes No FP16 mixed precision support Yes No Performance High Low Training time for CelebA HQ 2 days (8 GPUs) 2 weeks (1 GPU) 1–2 months Repro CelebA HQ results Yes – very close Yes – identical Repro LSUN results Yes – very close Yes – identical Repro CIFAR 10 results No Yes – identical Repro MNIST mode recovery No Yes – identical Repro ablation study (Table 1) No Yes – identical Dataset format TFRecords HDF5 Backwards compatibility Can import networks trained with Theano N/A Code quality Reasonable Somewhat messy Code status In active use No longer maintained System requirements Both Linux and Windows are supported, but we strongly recommend Linux for performance and compatibility reasons. 64 bit Python 3.6 installation with numpy 1.13.3 or newer. We recommend Anaconda3. One or more high end NVIDIA Pascal or Volta GPUs with 16GB of DRAM. We recommend NVIDIA DGX 1 with 8 Tesla V100 GPUs. NVIDIA driver 391.25 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.1.2 or newer. Additional Python packages listed in requirements pip.txt Importing and using pre trained networks All pre trained networks found on Google Drive, as well as ones produced by the training script, are stored as Python PKL files. They can be imported using the standard pickle mechanism as long as two conditions are met: (1) The directory containing the Progressive GAN code repository must be included in the PYTHONPATH environment variable, and (2) a tf.Session() object must have been created beforehand and set as default. Each PKL file contains 3 instances of tfutil.Network : Import official CelebA HQ networks. with open('karras2018iclr celebahq 1024x1024.pkl', 'rb') as file: G, D, Gs pickle.load(file) G Instantaneous snapshot of the generator, mainly useful for resuming a previous training run. D Instantaneous snapshot of the discriminator, mainly useful for resuming a previous training run. Gs Long term average of the generator, yielding higher quality results than the instantaneous snapshot. It is also possible to import networks that were produced using the Theano implementation, as long as they do not employ any features that are not natively supported by the TensorFlow version (minibatch discrimination, batch normalization, etc.). To enable Theano network import, however, you must use misc.load_pkl() in place of pickle.load() : Import Theano versions of the official CelebA HQ networks. import misc G, D, Gs misc.load_pkl('200 celebahq 1024x1024/network final.pkl') Once you have imported the networks, you can call Gs.run() to produce a set of images for given latent vectors, or Gs.get_output_for() to include the generator network in a larger TensorFlow expression. For further details, please consult the example script found on Google Drive. Instructions: 1. Pull the Progressive GAN code repository and add it to your PYTHONPATH environment variable. 2. 
Install the required Python packages with pip install r requirements pip.txt 2. Download import_example.py from networks/tensorflow version/example_import_script 3. Download karras2018iclr celebahq 1024x1024.pkl from networks/tensorflow version and place it in the same directory as the script. 5. Run the script with python import_example.py 6. If everything goes well, the script should generate 10 PNG images ( img0.png – img9.png ) that match the ones found in networks/tensorflow version/example_import_script exactly. Preparing datasets for training The Progressive GAN code repository contains a command line tool for recreating bit exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets: usage: dataset_tool.py h ... display Display images in dataset. extract Extract images from dataset. compare Compare two datasets. create_mnist Create dataset for MNIST. create_mnistrgb Create dataset for MNIST RGB. create_cifar10 Create dataset for CIFAR 10. create_cifar100 Create dataset for CIFAR 100. create_svhn Create dataset for SVHN. create_lsun Create dataset for single LSUN category. create_celeba Create dataset for CelebA. create_celebahq Create dataset for CelebA HQ. create_from_images Create dataset from a directory full of images. create_from_hdf5 Create dataset from legacy HDF5 archive. Type dataset_tool.py h for more information. The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate .tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well: > python dataset_tool.py create_cifar10 datasets/cifar10 /downloads/cifar10 > ls la datasets/cifar10 drwxr xr x 2 user user 7 Feb 21 10:07 . drwxrwxr x 10 user user 62 Apr 3 15:10 .. rw r r 1 user user 4900000 Feb 19 13:17 cifar10 r02.tfrecords rw r r 1 user user 12350000 Feb 19 13:17 cifar10 r03.tfrecords rw r r 1 user user 41150000 Feb 19 13:17 cifar10 r04.tfrecords rw r r 1 user user 156350000 Feb 19 13:17 cifar10 r05.tfrecords rw r r 1 user user 2000080 Feb 19 13:17 cifar10 rxx.labels The create_ commands take the standard version of a given dataset as input and produce the corresponding .tfrecords files as output. Additionally, the create_celebahq command requires a set of data files representing deltas with respect to the original CelebA dataset. These deltas (27.6GB) can be downloaded from datasets/celeba hq deltas . Note about module versions : Some of the dataset commands require specific versions of Python modules and system libraries (e.g. pillow, libjpeg), and they will give an error if the versions do not match. Please heed the error messages – there is no way to get the commands to work other than installing these specific versions. Training networks Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows: 1. Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines. 2. Run the training script with python train.py . 3. The results are written into a newly created subdirectory under config.result_dir 4. Wait several days (or weeks) for the training to converge, and analyze the results. By default, config.py is configured to train a 1024x1024 network for CelebA HQ using a single GPU. This is expected to take about two weeks even on the highest end NVIDIA GPUs. 
The key to enabling faster training is to employ multiple GPUs and/or go for a lower resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of configuration presets for multi GPU training. All of the presets are expected to yield roughly the same image quality for CelebA HQ, but their total training time can vary considerably: preset v1 1gpu : Original config that was used to produce the CelebA HQ and LSUN results shown in the paper. Expected to take about 1 month on NVIDIA Tesla V100. preset v2 1gpu : Optimized config that converges considerably faster than the original one. Expected to take about 2 weeks on 1xV100. preset v2 2gpus : Optimized config for 2 GPUs. Takes about 1 week on 2xV100. preset v2 4gpus : Optimized config for 4 GPUs. Takes about 3 days on 4xV100. preset v2 8gpus : Optimized config for 8 GPUs. Takes about 2 days on 8xV100. For reference, the expected output of each configuration preset for CelebA HQ can be found in networks/tensorflow version/example_training_runs Other noteworthy config options: fp16 : Enable FP16 mixed precision training to reduce the training times even further. The actual speedup is heavily dependent on GPU architecture and cuDNN version, and it can be expected to increase considerably in the future. BENCHMARK : Quickly iterate through the resolutions to measure the raw training performance. BENCHMARK0 : Same as BENCHMARK , but only use the highest resolution. syn1024rgb : Synthetic 1024x1024 dataset consisting of just black images. Useful for benchmarking. VERBOSE : Save image and network snapshots very frequently to facilitate debugging. GRAPH and HIST : Include additional data in the TensorBoard report. Analyzing results Training results can be analyzed in several ways: Manual inspection : The training script saves a snapshot of randomly generated images at regular intervals in fakes .png and reports the overall progress in log.txt . TensorBoard : The training script also exports various running statistics in a .tfevents file that can be visualized in TensorBoard with tensorboard logdir . Generating images and videos : At the end of config.py , there are several pre defined configs to launch utility scripts ( generate_ ). For example: Suppose you have an ongoing training run titled 010 pgan celebahq preset v1 1gpu fp32 , and you want to generate a video of random interpolations for the latest snapshot. Uncomment the generate_interpolation_video line in config.py , replace run_id 10 , and run python train.py The script will automatically locate the latest network snapshot and create a new result directory containing a single MP4 file. Quality metrics : Similar to the previous example, config.py also contains pre defined configs to compute various quality metrics (Sliced Wasserstein distance, Fréchet inception distance, etc.) for an existing training run. The metrics are computed for each network snapshot in succession and stored in metric .txt in the original result directory.",Image Generation,Image Generation 2223,Computer Vision,Computer Vision,Computer Vision,"MMD GAN with Repulsive Loss Function GAN: generative adversarial nets; MMD: maximum mean discrepancy; TF: TensorFlow This repository contains codes for MMD GAN and the repulsive loss proposed in ICLR paper 1 : \ Wei Wang, Yuan Sun, Saman Halgamuge. Improving MMD GAN Training with Repulsive Loss Function. ICLR 2019. 
URL: About the code The code defines the neural network architectures as dictionaries and strings to ease testing of different models. It also contains many other models I have tried, so sorry if you find it a little bit confusing. The structure of the code: 1. _DeepLearning/my_sngan/SNGan_ defines how a general GAN model is trained and evaluated. 2. _GeneralTools_ contains various tools: 1. _graph_func_ contains functions to run a model graph and metrics for evaluating generative models (Line 1595). 2. _input_func_ contains functions to handle datasets and the input pipeline. 3. _layer_func_ contains functions to convert a network architecture dictionary to operations. 4. _math_func_ defines various mathematical operations. You may find spectral normalization at Line 397, loss functions for GAN at Line 2088, the repulsive loss at Line 2505, and the repulsive loss with bounded kernel (referred to as rmb) at Line 2530. 5. _misc_fun_ contains FLAGs for the code. 3. my_test_ contains the specific model architectures and hyperparameters. Running the tests 1. Modify _GeneralTools/misc_func_ accordingly; 2. Read _Data/ReadMe.md_; download and prepare the datasets; 3. Run my_test_ with proper hyperparameters. About the algorithms Here we introduce the algorithms and tricks. Proposed Methods The paper 1 proposed three methods (a code sketch of the losses in method 1 is given right after this list): 1. Repulsive loss. The generator minimizes the MMD estimate $L_G = \sum_{i\ne j}k_D(x_i,x_j) - 2\sum_{i\ne j}k_D(x_i,y_j) + \sum_{i\ne j}k_D(y_i,y_j)$ (up to normalization), while the discriminator minimizes the proposed repulsive loss $L_D^{\mathrm{rep}} = \sum_{i\ne j}k_D(x_i,x_j) - \sum_{i\ne j}k_D(y_i,y_j)$, where $x$ are real samples, $y$ are generated samples, and $k_D = k \circ D$ is the kernel formed by the discriminator $D$ and a kernel $k$. The discriminator loss of the previous MMD GAN 2 , or what we call the attractive loss, is $L_D^{\mathrm{att}} = -L_G$. Below is an illustration of the effects of the MMD losses on free R(eal) and G(enerated) particles (code in the _Figures_ folder). The particles stand for discriminator outputs of samples, but, for illustration purposes, we allow them to move freely. These GIFs extend Figure 1 of paper 1 . : : : : $L_D^{\mathrm{att}}$ paired with $L_G$ $L_D^{\mathrm{rep}}$ paired with $L_G$ In the first row, we randomly initialized the particles and applied $L_D^{\mathrm{att}}$ or $L_D^{\mathrm{rep}}$ for 600 steps; the velocity of each particle is the negative gradient of the applied loss with respect to that particle. In the second row, we took the particle positions at the 450th step of the first row and applied $L_G$ for another 600 steps with the same kind of velocity. The blue and orange arrows stand for the gradients of the attractive and repulsive components of the MMD losses respectively. In summary, these GIFs indicate how the MMD losses may move free particles. Of course, the actual case of MMD GAN is much more complex, as we update the model parameters instead of the output scores directly, and both networks are updated at each step. We argue that $L_D^{\mathrm{att}}$ may cause opposite gradients from the attractive and repulsive components of both $L_D^{\mathrm{att}}$ and $L_G$ during training, and thus slow down the training process. Note this is different from end stage training, when the gradients should be opposite and cancel out to reach 0. Another interpretation is that, by minimizing $L_D^{\mathrm{att}}$, the discriminator maximizes the similarity between the outputs of real samples, which results in D focusing on the similarities among real images and possibly ignoring the fine details that separate them. The repulsive loss actively learns such fine details to make the real sample outputs repel each other. 2. Bounded kernel (used only in $L_D^{\mathrm{rep}}$): $k_D^{b_u}(x_i, x_j) = \exp(-\frac{1}{2\sigma^2}\min(\| D(x_i)-D(x_j) \|^2, b_u))$ and $k_D^{b_l}(y_i, y_j) = \exp(-\frac{1}{2\sigma^2}\max(\| D(y_i)-D(y_j) \|^2, b_l))$. The gradient of the Gaussian kernel is near 0 when the input distance is too small or too large. The bounded kernel avoids kernel saturation by truncating the two tails of the distance distribution, an idea inspired by the hinge loss. This prevents the discriminator from becoming too confident. 3. Power iteration for convolution (used in spectral normalization). Finally, we proposed a method to calculate the spectral norm of a convolution kernel. At iteration $t$, for convolution kernel $W_c$, do $u^{(t)} = W_c \circledast v^{(t-1)}$, $v^{(t)} = W_c \circledast^{\top} u^{(t)}$ (the transposed convolution), and normalize $v^{(t)} \leftarrow v^{(t)} / \|v^{(t)}\|_2$. The spectral norm is estimated as $\sigma(W_c) \approx \|W_c \circledast v^{(t)}\|_2$.
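To make the loss definitions in method 1 concrete, here is a minimal PyTorch sketch of the generator (MMD) loss, the repulsive discriminator loss, and the attractive loss, written against the formulas above rather than against the repository's own implementation. The single-bandwidth Gaussian kernel and the value of sigma are simplifying assumptions for the sketch.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # a: (n, d), b: (m, d) discriminator outputs; returns the (n, m) kernel matrix
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def off_diagonal_mean(k):
    # mean of k[i, j] over i != j
    n = k.size(0)
    return (k.sum() - k.diagonal().sum()) / (n * (n - 1))

def mmd_losses(d_real, d_fake, sigma=1.0):
    # d_real = D(x), d_fake = D(y): discriminator embeddings of real / generated batches
    e_xx = off_diagonal_mean(gaussian_kernel(d_real, d_real, sigma))
    e_yy = off_diagonal_mean(gaussian_kernel(d_fake, d_fake, sigma))
    e_xy = gaussian_kernel(d_real, d_fake, sigma).mean()
    loss_g = e_xx - 2 * e_xy + e_yy   # MMD^2 estimate, minimised by the generator
    loss_d_rep = e_xx - e_yy          # repulsive discriminator loss of paper [1]
    loss_d_att = -loss_g              # attractive loss of the original MMD GAN [2]
    return loss_g, loss_d_rep, loss_d_att
```

The repository's own implementation lives in _math_func_ at the line numbers listed above; the bounded-kernel variant described in method 2 simply clamps the squared distances before the exponential.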
Practical Tricks and Issues We recommend using the following tricks. 1. Spectral normalization, initially proposed in 3 . The idea is, at each layer, to use $W / \sigma(W)$ for the convolution/dense multiplication. Here we multiply the signal by a constant $C > 1$ after each spectral normalization to compensate for the decrease of the signal norm at each layer. In the main text of paper 1 , we chose the value of $C$ empirically. In Appendix C.3 of paper 1 , we tested a variety of values. 2. Two time scale update rule (TTUR) 4 . The idea is to use different learning rates for the generator and the discriminator. Unlike the case of Wasserstein GAN, we do not encourage using the repulsive loss for the discriminator or the MMD loss for the generator to indicate the progress of training. You may find that, during training, both $L_D$ and $L_G$ may be close to 0 initially; this is because both G and D are weak. $L_G$ may gradually increase during training; this is because it becomes harder for G to generate high quality samples and fool D (and G may not have the capacity to do so). For a balanced and capable G and D, we would expect both $L_D$ and $L_G$ to stay close to 0 during the whole training process and any kernel value (i.e., $k_D(x,x)$, $k_D(x,y)$ and $k_D(y,y)$) to stay away from 0 or 1, somewhere in the middle (e.g., 0.6). In some cases, you may find that training with the repulsive loss diverges. Do not panic. It may be that the learning rate is not suitable. Please try another learning rate or the bounded kernel. Final Comments Thank you for reading! Please feel free to leave comments if things do not work or suddenly work, or if exploring my code ruins your day. :) Reference 1 Wei Wang, Yuan Sun, Saman Halgamuge. Improving MMD GAN Training with Repulsive Loss Function. ICLR 2019. URL: 2 Chun Liang Li, Wei Cheng Chang, Yu Cheng, Yiming Yang, and Barnabas Poczos. MMD GAN: Towards Deeper Understanding of Moment Matching Network. In NeurIPS, 2017. 3 Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral Normalization for Generative Adversarial Networks. In ICLR, 2018. 4 Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In NeurIPS, 2017.",Image Generation,Image Generation 2248,Computer Vision,Computer Vision,Computer Vision,"glow pytorch PyTorch implementation of Glow, Generative Flow with Invertible 1x1 Convolutions Usage: > python train.py PATH As the trainer uses torchvision's ImageFolder, the input directory should be structured like this even when there is only 1 class (currently this implementation does not incorporate a class classification loss): > PATH/class1 > PATH/class2 > ... Notes ! Sample (sample.png) I have trained the model on the vanilla CelebA dataset, and it seems to work well. I found that the learning rate (I used 1e 4 without scheduling), a learnt prior, the number of bits (5 in this case), and using a sigmoid function in the affine coupling layer instead of an exponential function are beneficial to training the model.
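As an illustration of the sigmoid-scaling note above, here is a minimal sketch of an affine coupling layer that uses sigmoid(log_s + 2) for the scale instead of exp(log_s). The +2 shift (so the scale starts near 1), the hidden width, and the layer layout are assumptions for the sketch, not necessarily what this repository uses.

```python
import torch
from torch import nn

class AffineCoupling(nn.Module):
    """Affine coupling: transforms one half of the channels conditioned on the other half."""
    def __init__(self, in_channels, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels // 2, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, in_channels, 3, padding=1),  # outputs (log_s, t)
        )

    def forward(self, x):
        x_a, x_b = x.chunk(2, dim=1)
        log_s, t = self.net(x_a).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2)                 # sigmoid scale instead of exp(log_s)
        y_b = (x_b + t) * s
        logdet = torch.log(s).flatten(1).sum(dim=1)  # per-sample log-determinant
        return torch.cat([x_a, y_b], dim=1), logdet

    def reverse(self, y):
        y_a, y_b = y.chunk(2, dim=1)
        log_s, t = self.net(y_a).chunk(2, dim=1)
        s = torch.sigmoid(log_s + 2)
        return torch.cat([y_a, y_b / s - t], dim=1)
```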
In my case, the LU decomposed invertible convolution was much faster than the plain version, so I made the LU decomposed version the default. ! Progression of samples (progression.gif) Progression of samples during training, sampled once per 100 iterations.",Image Generation,Image Generation 2249,Computer Vision,Computer Vision,Computer Vision,"Style Based GAN in PyTorch Implementation of A Style Based Generator Architecture for Generative Adversarial Networks in PyTorch Usage: for celebA > python train.py mixing d {folder} PATH for FFHQ > python train.py mixing loss r1 sched d {folder} Sample ! Sample of the model trained on CelebA (doc/sample.png) ! Style mixing sample of the model trained on CelebA (doc/sample_mixing.png) I have mixed styles at the 4^2 8^2 scales. I can't get samples as dramatic as the samples in the original paper. I think my model is too dependent on the 4^2 scale features; it seems that much of the detail is determined at that scale, so little variation can be obtained after it. ! Sample of the model trained on FFHQ (doc/sample_ffhq.png) ! Style mixing sample of the model trained on FFHQ (doc/sample_mixing_ffhq.png) Trained a high resolution model on FFHQ. I think the results are more interesting.",Image Generation,Image Generation 2297,Computer Vision,Computer Vision,Computer Vision,"Vanilla GANs, Minibatch Discrimination Implemented using PyTorch This repository contains my first code in PyTorch: a GAN implemented from scratch (well, not really) and trained to generate MNIST like digits. Minibatch discrimination was also implemented to avoid mode collapse, which is a common phenomenon observed in trained GANs. (Link to paper: ) Here is a comparison of the generated outputs of the GAN, with and without any minibatch discrimination. The networks employed were very simple (only one hidden layer), trained for 20 epochs each with a batch size of 20 and learning rate 1e 4. Note that the code for minibatch discrimination is not really optimal at this stage because of a for loop. I am yet to find a way to fix that issue using PyTorch.
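For what it is worth, the per-sample loop can usually be avoided with broadcasting. Below is a hedged sketch of a vectorized minibatch-discrimination layer following Salimans et al.; the feature sizes and the initialization scale are arbitrary choices for the sketch, not taken from this repository.

```python
import torch
from torch import nn

class MinibatchDiscrimination(nn.Module):
    def __init__(self, in_features, out_features, kernel_dims):
        super().__init__()
        # tensor T of shape (A, B*C) in the paper's notation
        self.T = nn.Parameter(torch.randn(in_features, out_features * kernel_dims) * 0.1)
        self.out_features = out_features
        self.kernel_dims = kernel_dims

    def forward(self, x):
        # x: (N, in_features) hidden activations of the discriminator
        M = (x @ self.T).view(-1, self.out_features, self.kernel_dims)  # (N, B, C)
        diff = M.unsqueeze(0) - M.unsqueeze(1)                          # (N, N, B, C)
        c = torch.exp(-diff.abs().sum(dim=3))                           # (N, N, B)
        o = c.sum(dim=1) - 1                                            # drop the self-similarity term
        return torch.cat([x, o], dim=1)                                 # (N, in_features + B)
```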
! alt text ! alt text (Left) Vanilla GAN outputs (Right) GAN outputs using the Minibatch Discrimination Layer Clearly minibatch discrimination has improved the output digits, but they still don't look realistic enough. Possible ways we can improve this: (a) Adding more layers. As of now there is only a single hidden layer, which is too simplistic in my opinion. (b) Other improved techniques for training GANs such as Wasserstein/Unrolled GANs.",Image Generation,Image Generation 2319,Computer Vision,Computer Vision,Computer Vision,"iWGAN Thank you to the following for the code on which this program is based. For the iWGAN implementation For the Layernorm implementation To Francois Chollet for his Keras framework and book Deep Learning with Python; pages 308 311 give some basic GAN code to get started with. The paper by Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville which introduced the iWGAN is here: https://arxiv.org/abs/1704.00028",Image Generation,Image Generation 2391,Computer Vision,Computer Vision,Computer Vision,"Overview A naive implementation of PixelCNN in PyTorch as described in van den Oord et al. This by no means serves to reproduce the original results in the paper and is only meant to help someone trying to understand the concept of PixelCNNs. Also, the implementation of PixelRNNs, which were also described in the paper, is NOT in this repository. Introduction PixelCNNs are a type of autoregressive generative model which models the generation of an image as a sequence of pixel generations. They use multiple convolutional layers to model the generation of the next pixel conditioned on the pixels of the image which have already been generated. The layers preserve the spatial resolution of the input image in order to output an image of the same size. During the training phase, we start from the input image as shown below and perform a convolution over it with the kernel of our first layer. ! Representation of Convolution on the input without masking (images/Unmasked_Conv.png) In the example above, we try to generate the pixel in the centre using the pixels which have already been generated. As described in the paper, we generate the pixels in the sequence shown below: ! Generating image as a sequence of Pixels (images/Sequence.png) Clearly, pixel a should therefore not take into account b, f and g since, as per the sequence, it won't have access to them at test time. In order to replicate this during the training stage as well, van den Oord et al. propose a modification to the convolutional kernel by applying a mask to it. The mask zeroes out the portion of the kernel that is not accessible to the model at test time while generating the central pixel, as can be seen below: ! Representation of Convolution on the input with masking (images/Masked_Conv.png) Thus, sequence by sequence, we keep generating the pixels one by one until the entire image is generated. This can be visualised very neatly with the help of the graphic below: ! Visualisation (images/Visualisation.gif) Masking As explained above, we need to use a mask in order to restrict the amount of information the model can see in the input during training. If we use the same mask even in the subsequent layers, the central pixel which we are trying to generate will be forced to zero when generating the first pixel in the image. In fact, if we use the same mask for all layers, the output mostly consists of blank pixels, mainly because almost all the layers then do not take into account the information gained at that position from the previous layers. This almost amounts to using a single layer while generating that pixel, which of course does not provide very good results. This can be visualised with the help of the following imagery: ! Masking (images/Masking.png) With Mask A , while generating the activations in layer N+1, we take into account only those values of the central pixel which have already been generated. In the case of RGB images, this means taking into account only those channels which have already been generated. For example, if we are generating the image in the sequence R >G >B , then the R channel of the next layer won't take the central pixel into account at all, the G channel will take into account the R channel of layer N, and the B channel will take into account R and G . This is clearly depicted above by highlighting the connections. On the other hand, with Mask B , we connect all channels of the central pixel of layer N+1 to all the channels of the central pixel of layer N, as depicted. In our model, the first layer has Mask A while the subsequent layers have Mask B .
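To make the Mask A / Mask B distinction concrete, here is a minimal sketch of a masked convolution for a single-channel setting (e.g. MNIST), where the two mask types differ only in whether the centre position itself is visible. The kernel sizes in the usage lines are assumptions; the per-channel centre split for RGB described above is not included in this sketch.

```python
import torch
from torch import nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ('A', 'B')
        mask = torch.ones_like(self.weight)                      # (out_c, in_c, kH, kW)
        _, _, kH, kW = self.weight.shape
        # zero out everything to the right of the centre in the middle row;
        # for Mask A the centre itself is also zeroed, for Mask B it is kept
        mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[:, :, kH // 2 + 1:, :] = 0                          # zero out all rows below the centre
        self.register_buffer('mask', mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

# first layer uses Mask A, subsequent layers use Mask B
first = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
later = MaskedConv2d('B', 64, 64, kernel_size=3, padding=1)
```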
Training The training of PixelCNN is very fast as we do not need to generate pixels sequentially due to the availability of pixels in the train data. Hence, we can utilize the advantage of parallelism which CNNs offer us thus making the training much faster than PixelRNN or the likes of it. The training can be started using : python3 train.py config/config_train.txt The defaults can be inferred from the code and can be changed by editing the config_train file accordingly. The models will be saved in the filter Models in the current directory. Generating After the model has trained, we can use the saved checkpoints by passing its path in the config_generate.txt . python3 generate.py config/config_generate.txt Below is the output of generate.py after training the model for 25 epochs and using its checkpoint. ! Sample (images/sample.png) To Do X Complete the README Implement Training and Testing Loss tables Implement for CIFAR Comments The work is based on A Oord et. al. . The model is currently restricted to MNIST dataset only. References 1. Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016. 2. Tutorial on Pixel RNN",Image Generation,Image Generation 2392,Computer Vision,Computer Vision,Computer Vision,"Generating Devanagari Using DRAW PyTorch implementation of DRAW: A Recurrent Neural Network For Image Generation on the task of generating Devanagari Characters. Deep Recurrent Attentive Writer (DRAW) is a neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye. Articles Blog Post: Articles: Difference With Attention Without Attention Training Download the data and place it in the data/ directory. Run train.py to start training. To change the hyperparameters of the network, update the values in the param dictionary in train.py . Loss Curve Generating New Images To generate new images run generate.py . sh python3 evaluate.py load_path /path/to/pth/checkpoint num_output n The checkpoint file for the model trained for 50 epochs is present in checkpoint/ directory. Results Devanagari Training Data Generated Devanagari After 50 Epochs Devanagari Numbers Only Training Data Generated Devanagari Numbers After 50 Epochs Some more generated images: References 1. Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra. DRAW: A Recurrent Neural Network For Image Generation. arxiv 2. ericjang/draw repo 3. What is DRAW (Deep Recurrent Attentive Writer)? blog Data The Devanagari Character dataset is available on kaggle. ( Source ) CREDITS >Kuldeep Singh Sidhu Github: github/singhsidhukuldeep Website: Kuldeep Singh Sidhu (Website) LinkedIn: Kuldeep Singh Sidhu (LinkedIn)",Image Generation,Image Generation 2427,Computer Vision,Computer Vision,Computer Vision,"Wgan GP_cats This is an implementation of Wasserstein GANs with gradient penalty. Link to the paper is : Wasserstein GANs use the Earth mover distance instead of TV or JS divergence or KL divergence. The weaker the distance, the better is the convergence of GANs. 
The other distances mentioned failed in the case of low dimensional manifolds where the distributions may have very little common projection space. The mathematical details of the advantages of this distance can be read here : The WGAN paper uses RMSprop for optimization and weight clipping to enforce a Lipschitz condition but in WGAN GP, gradient penalty enforces the Lipschitz and they succefully trained the model using Adam as discussed in detail in the paper. Usage Any image set of size 64x64 can be put in a folder and placed in the images folder. The noise dimension is set to 100 as suggested in the paper but one should feel free to play with the parameters like z_dim, n_critic. Further use of a an optimizer with beta1 0 like RMSprop helps improve results in some cases. Results Epoch 1 Epoch 100 Epoch 300 Epoch 500",Image Generation,Image Generation 2444,Computer Vision,Computer Vision,Computer Vision,"Progressive Growing of GANs for Improved Quality, Stability, and Variation – Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University) For business inquiries, please contact researchinquiries@nvidia.com (mailto:researchinquiries@nvidia.com) For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com (mailto:hmarinez@nvidia.com) ! Representative image Picture: Two imaginary celebrities that were dreamed up by a random number generator. Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher quality version of the CelebA dataset. Resources Paper (NVIDIA research) Paper (arXiv) Result video (YouTube) Additional material (Google Drive) ICLR 2018 poster ( karras2018iclr poster.pdf ) ICLR 2018 slides ( karras2018iclr slides.pptx ) Representative images ( images/representative images ) High quality video clips ( videos/high quality video clips ) Huge collection of non curated images for each dataset ( images/100k generated images ) Extensive video of random interpolations for each dataset ( videos/one hour of random interpolations ) Pre trained networks ( networks/tensorflow version ) Minimal example script for importing the pre trained networks ( networks/tensorflow version/example_import_script ) Data files needed to reconstruct the CelebA HQ dataset ( datasets/celeba hq deltas ) Example training logs and progress snapshots ( networks/tensorflow version/example_training_runs ) All the material, including source code, is made freely available for non commercial use under the Creative Commons CC BY NC 4.0 license. 
Feel free to use any of the material in your own work, as long as you give us appropriate credit by mentioning the title and author list of our paper. Versions There are two different versions of the source code. The TensorFlow version is newer and more polished, and we generally recommend it as a starting point if you are looking to experiment with our technique, build upon it, or apply it to novel datasets. The original Theano version , on the other hand, is what we used to produce all the results shown in our paper. We recommend using it if – and only if – you are looking to reproduce our exact results for benchmark datasets like CIFAR 10, MNIST RGB, and CelebA. The main differences are summarized in the following table: Feature TensorFlow version Original Theano version : : : : : Branch master (this branch) original theano version Multi GPU support Yes No FP16 mixed precision support Yes No Performance High Low Training time for CelebA HQ 2 days (8 GPUs) 2 weeks (1 GPU) 1–2 months Repro CelebA HQ results Yes – very close Yes – identical Repro LSUN results Yes – very close Yes – identical Repro CIFAR 10 results No Yes – identical Repro MNIST mode recovery No Yes – identical Repro ablation study (Table 1) No Yes – identical Dataset format TFRecords HDF5 Backwards compatibility Can import networks trained with Theano N/A Code quality Reasonable Somewhat messy Code status In active use No longer maintained System requirements Both Linux and Windows are supported, but we strongly recommend Linux for performance and compatibility reasons. 64 bit Python 3.6 installation with numpy 1.13.3 or newer. We recommend Anaconda3. One or more high end NVIDIA Pascal or Volta GPUs with 16GB of DRAM. We recommend NVIDIA DGX 1 with 8 Tesla V100 GPUs. NVIDIA driver 391.25 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.1.2 or newer. Additional Python packages listed in requirements pip.txt Importing and using pre trained networks All pre trained networks found on Google Drive, as well as ones produced by the training script, are stored as Python PKL files. They can be imported using the standard pickle mechanism as long as two conditions are met: (1) The directory containing the Progressive GAN code repository must be included in the PYTHONPATH environment variable, and (2) a tf.Session() object must have been created beforehand and set as default. Each PKL file contains 3 instances of tfutil.Network : Import official CelebA HQ networks. with open('karras2018iclr celebahq 1024x1024.pkl', 'rb') as file: G, D, Gs pickle.load(file) G Instantaneous snapshot of the generator, mainly useful for resuming a previous training run. D Instantaneous snapshot of the discriminator, mainly useful for resuming a previous training run. Gs Long term average of the generator, yielding higher quality results than the instantaneous snapshot. It is also possible to import networks that were produced using the Theano implementation, as long as they do not employ any features that are not natively supported by the TensorFlow version (minibatch discrimination, batch normalization, etc.). To enable Theano network import, however, you must use misc.load_pkl() in place of pickle.load() : Import Theano versions of the official CelebA HQ networks. import misc G, D, Gs misc.load_pkl('200 celebahq 1024x1024/network final.pkl') Once you have imported the networks, you can call Gs.run() to produce a set of images for given latent vectors, or Gs.get_output_for() to include the generator network in a larger TensorFlow expression. 
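As a concrete illustration of the Gs.run() call described above, a minimal sampling loop might look like the sketch below. The latent size of 512, the empty label array for the unconditional CelebA HQ network, and the output range are assumptions taken from the typical configuration; defer to the bundled example script for the exact shapes.

```python
import pickle
import numpy as np
import tensorflow as tf
from PIL import Image

# The Progressive GAN code repository must be on PYTHONPATH for unpickling to work.
with tf.Session():  # a default tf.Session must exist before loading the networks
    with open('karras2018iclr-celebahq-1024x1024.pkl', 'rb') as f:
        G, D, Gs = pickle.load(f)

    latents = np.random.RandomState(0).randn(4, 512)   # 512-dimensional latents (assumed size)
    labels = np.zeros([latents.shape[0], 0])            # unconditional network: empty labels (assumption)
    images = Gs.run(latents, labels)                    # NCHW output, roughly in [-1, 1]

    images = np.clip((images + 1.0) * 127.5, 0, 255).astype(np.uint8)
    for i, img in enumerate(images.transpose(0, 2, 3, 1)):  # convert to NHWC for saving
        Image.fromarray(img).save('img%d.png' % i)
```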
For further details, please consult the example script found on Google Drive. Instructions: 1. Pull the Progressive GAN code repository and add it to your PYTHONPATH environment variable. 2. Install the required Python packages with pip install r requirements pip.txt 2. Download import_example.py from networks/tensorflow version/example_import_script 3. Download karras2018iclr celebahq 1024x1024.pkl from networks/tensorflow version and place it in the same directory as the script. 5. Run the script with python import_example.py 6. If everything goes well, the script should generate 10 PNG images ( img0.png – img9.png ) that match the ones found in networks/tensorflow version/example_import_script exactly. Preparing datasets for training The Progressive GAN code repository contains a command line tool for recreating bit exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets: usage: dataset_tool.py h ... display Display images in dataset. extract Extract images from dataset. compare Compare two datasets. create_mnist Create dataset for MNIST. create_mnistrgb Create dataset for MNIST RGB. create_cifar10 Create dataset for CIFAR 10. create_cifar100 Create dataset for CIFAR 100. create_svhn Create dataset for SVHN. create_lsun Create dataset for single LSUN category. create_celeba Create dataset for CelebA. create_celebahq Create dataset for CelebA HQ. create_from_images Create dataset from a directory full of images. create_from_hdf5 Create dataset from legacy HDF5 archive. Type dataset_tool.py h for more information. The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate .tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well: > python dataset_tool.py create_cifar10 datasets/cifar10 /downloads/cifar10 > ls la datasets/cifar10 drwxr xr x 2 user user 7 Feb 21 10:07 . drwxrwxr x 10 user user 62 Apr 3 15:10 .. rw r r 1 user user 4900000 Feb 19 13:17 cifar10 r02.tfrecords rw r r 1 user user 12350000 Feb 19 13:17 cifar10 r03.tfrecords rw r r 1 user user 41150000 Feb 19 13:17 cifar10 r04.tfrecords rw r r 1 user user 156350000 Feb 19 13:17 cifar10 r05.tfrecords rw r r 1 user user 2000080 Feb 19 13:17 cifar10 rxx.labels The create_ commands take the standard version of a given dataset as input and produce the corresponding .tfrecords files as output. Additionally, the create_celebahq command requires a set of data files representing deltas with respect to the original CelebA dataset. These deltas (27.6GB) can be downloaded from datasets/celeba hq deltas . Note about module versions : Some of the dataset commands require specific versions of Python modules and system libraries (e.g. pillow, libjpeg), and they will give an error if the versions do not match. Please heed the error messages – there is no way to get the commands to work other than installing these specific versions. Training networks Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows: 1. Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines. 2. Run the training script with python train.py . 3. The results are written into a newly created subdirectory under config.result_dir 4. Wait several days (or weeks) for the training to converge, and analyze the results. 
By default, config.py is configured to train a 1024x1024 network for CelebA HQ using a single GPU. This is expected to take about two weeks even on the highest end NVIDIA GPUs. The key to enabling faster training is to employ multiple GPUs and/or go for a lower resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of configuration presets for multi GPU training. All of the presets are expected to yield roughly the same image quality for CelebA HQ, but their total training time can vary considerably: preset v1 1gpu : Original config that was used to produce the CelebA HQ and LSUN results shown in the paper. Expected to take about 1 month on NVIDIA Tesla V100. preset v2 1gpu : Optimized config that converges considerably faster than the original one. Expected to take about 2 weeks on 1xV100. preset v2 2gpus : Optimized config for 2 GPUs. Takes about 1 week on 2xV100. preset v2 4gpus : Optimized config for 4 GPUs. Takes about 3 days on 4xV100. preset v2 8gpus : Optimized config for 8 GPUs. Takes about 2 days on 8xV100. For reference, the expected output of each configuration preset for CelebA HQ can be found in networks/tensorflow version/example_training_runs Other noteworthy config options: fp16 : Enable FP16 mixed precision training to reduce the training times even further. The actual speedup is heavily dependent on GPU architecture and cuDNN version, and it can be expected to increase considerably in the future. BENCHMARK : Quickly iterate through the resolutions to measure the raw training performance. BENCHMARK0 : Same as BENCHMARK , but only use the highest resolution. syn1024rgb : Synthetic 1024x1024 dataset consisting of just black images. Useful for benchmarking. VERBOSE : Save image and network snapshots very frequently to facilitate debugging. GRAPH and HIST : Include additional data in the TensorBoard report. Analyzing results Training results can be analyzed in several ways: Manual inspection : The training script saves a snapshot of randomly generated images at regular intervals in fakes .png and reports the overall progress in log.txt . TensorBoard : The training script also exports various running statistics in a .tfevents file that can be visualized in TensorBoard with tensorboard logdir . Generating images and videos : At the end of config.py , there are several pre defined configs to launch utility scripts ( generate_ ). For example: Suppose you have an ongoing training run titled 010 pgan celebahq preset v1 1gpu fp32 , and you want to generate a video of random interpolations for the latest snapshot. Uncomment the generate_interpolation_video line in config.py , replace run_id 10 , and run python train.py The script will automatically locate the latest network snapshot and create a new result directory containing a single MP4 file. Quality metrics : Similar to the previous example, config.py also contains pre defined configs to compute various quality metrics (Sliced Wasserstein distance, Fréchet inception distance, etc.) for an existing training run. 
The metrics are computed for each network snapshot in succession and stored in metric .txt in the original result directory.",Image Generation,Image Generation 2456,Computer Vision,Computer Vision,Computer Vision,DRAW_pytorch DRAW: A Recurrent Neural Network For Image Generation Result attention Loss Ground truth : : : : Timestep Generated Timestep Generated : : : : : : : : 3000 6000 24000 27000 References,Image Generation,Image Generation 2491,Computer Vision,Computer Vision,Computer Vision,"DCGAN256 DCGAN on 256x256 pictures ( ) This repo is a modification of manicman1999/GAN256 ( ) The full credit of the basic model structure design goes to manicman1999/GAN256 This GAN was based on GAN256 by Matthew Mann as his implementation allowed input and output of pictures of 256x256 resolution. Normally, GANs trained at a personal scale only works well at much lower resolutions (32x32 or 64x64), and may not transfer to datasets other than Mnist. Modifications A couple of modifications were made to the model architecture for better results. With reference from GANs articles and research , the following changes were made: 1. Larger Kernel and more filters The kernel size of layers 2 and 3 in the generator was increased by 1. As larger kernels cover more area it should capture more information. 2. Flip labels (Generated True, Real False) Helps with gradient flow. 3. Instance noise added to stabilize training A Gaussian Noise layer was added at the top of the Discriminator, with the idea being to prevent the Discriminator from being too good too early, which will prevent the Generator from learning and possibly converging. Input For female blouses, the images were scraped using Scrapy from the Amazon website, and filters were used to select only 4 /5 star products. For heels, the images were taken from the UT Zappos50K dataset . They are all catalog images collected from Zappos.com. All the images were resized and background filled to 256x256 resolution before input into the model for training. Usage Read the code first in the Jupyter Notebook, and create an images folders to store your image dataset (with the names renamed to 5 digit numbers starting from 0, images resized to 256x256). Then just run it and let the community know what you came up with! Results Using 10,000 images of women's blouses with 4/5 star ratings scraped from Amazon: ! Amazon Clothes Epoch 1 300 213th Iteration: ! 213th Iteration 300th Iteration: ! 300th Iteration Using 5,700 images of Zippo's Heels from the UT Zappos50K dataset : ! Zippo Heels Epoch 1 165 149th Iteration: ! 149th Iteration 158th Iteration ! 158th Iteration Discussion Using a DCGANs, we were able to input 256x256 images and train the Generative Adversarial Network to generate random novel images. Depending on the input, it can be useful to generate such images in order to inspire designers to create new designs. As in the case of the Amazon clothes, we can restrict the input of the images to popular items so that the generated images will mainly incorporate popular colours and designs. Further research and tweaking is needed to tailor different GANs to suit different purposes and outputs since as we can see, the current DCGAN architecture seems to work better for shoes rather than clothes as it is easier to generate more abstract designs than frabic like images. 
References Generative Adversarial Nets arXiv Improved Techniques for Training GANs arXiv Improved Training of Wasserstein GANs arXiv",Image Generation,Image Generation 2494,Computer Vision,Computer Vision,Computer Vision,"Note I wrote this code before the official implementation got released. Now that the official implementation is part of TensorFlow, this codebase is not maintained anymore. Please refer to the official repo. real nvp Implementation of Real NVP in TensorFlow. Started with code from PixelCNN++ by OpenAI. Sample usage: 1. Install Python 3. 2. Create directories for downloading the dataset and saving checkpoints. 3. Run train.py. ' nr_gpu', which denotes the number of GPUs to use, should be specified. Sample usage: $ CUDA_VISIBLE_DEVICES 1,2 python3 train.py nr_gpu 2 data_dir download save_dir checkpoints load_params 0 save_interval 2 Sample image from the model trained on CIFAR10. The test NLL was 3.51.",Image Generation,Image Generation 2495,Computer Vision,Computer Vision,Computer Vision,"pix2pix + BEGAN Image to Image Translation with Conditional Adversarial Nets BEGAN: Boundary Equilibrium Generative Adversarial Networks Install install pytorch and pytorch.vision Dataset Download images from the author's implementation Suppose you downloaded the facades dataset to /path/to/facades Train pix2pixGAN CUDA_VISIBLE_DEVICES x python main_pix2pixgan.py dataroot /path/to/facades/train valDataroot /path/to/facades/val exp /path/to/a/directory/for/checkpoints pix2pixBEGAN CUDA_VISIBLE_DEVICES x python main_pix2pixBEGAN.py dataroot /path/to/facades/train valDataroot /path/to/facades/val exp /path/to/a/directory/for/checkpoints Most of the parameters are the same for a fair comparison. The original pix2pix is modelled as a conditional GAN; we did not do that here: input samples are not given to D (only target samples are given). We used an image buffer (analogous to the replay buffer in DQN) when training D. Try other datasets as you need; similar results should be found. Training Curve(pix2pixBEGAN) L_D and L_G \w BEGAN ! loss We found that both L_D and L_G are balanced consistently (equilibrium parameter, gamma 0.7) and converged, even though networks D and G are different in terms of model capacity and detailed layer specification. M_global ! Mglobal As the author said, M_global is a good indicator for monitoring convergence.
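For readers unfamiliar with BEGAN, the quantities tracked above come from its boundary-equilibrium formulation. Below is a schematic PyTorch-style sketch of one update of the control variable k and of M_global; the per-pixel L1 reconstruction loss and lambda_k = 0.001 are the choices from the BEGAN paper and are assumptions with respect to this repository.

```python
import torch

def recon_loss(D, v):
    # BEGAN's discriminator is an autoencoder; L(v) = mean |v - D(v)|
    return (v - D(v)).abs().mean()

def began_step_stats(D, real, fake, k, gamma=0.7, lambda_k=0.001):
    # in practice `fake` is detached from the generator for the D update
    loss_real = recon_loss(D, real)
    loss_fake = recon_loss(D, fake)
    loss_d = loss_real - k * loss_fake                    # discriminator objective
    loss_g = loss_fake                                    # generator objective
    balance = gamma * loss_real.item() - loss_fake.item()
    k = min(max(k + lambda_k * balance, 0.0), 1.0)        # equilibrium control update
    m_global = loss_real.item() + abs(balance)            # convergence measure plotted above
    return loss_d, loss_g, k, m_global
```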
Parsing log : the train log file will be saved in the directory you specified, named train.log L_D and L_G \w GAN ! BEGAN_loss Comparison pix2pixGAN vs. pix2pixBEGAN CUDA_VISIBLE_DEVICES x python compare.py netG_GAN /path/to/netG.pth netG_BEGAN /path/to/netG.pth exp /path/to/a/dir/for/saving tstDataroot /path/to/facades/test/ ! failure ! GANvsBEGAN Checkout more results (order: input, real target, fake(pix2pixBEGAN), fake(pix2pixGAN)) Interpolation on the input space. CUDA_VISIBLE_DEVICES x python interpolateInput.py tstDataroot /path/to/your/facades/test/ interval 14 exp /path/to/resulting/dir tstBatchSize 4 netG /path/to/your/netG_epoch_xxx.pth Upper rows: pix2pixGAN, Lower rows: pix2pixBEGAN ! interpolation Showing reconstruction from D and generation from G (order: input, real target, reconstructed real, fake, reconstructed fake) ! reconDandGenG Reference pix2pix.pytorch BEGAN in pytorch A simple conditional version of BEGAN fantastic pytorch misc. We apologize for your inconvenience when cloning this project. The resulting images are large, so please be patient. (Downloading the zip file seems to take less time.)",Image Generation,Image Generation 2510,Computer Vision,Computer Vision,Computer Vision,"AttnGAN PyTorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine Grained Text to Image Generation with Attentional Generative Adversarial Networks by Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He. (This work was performed when Tao was an intern with Microsoft Research.) Dependencies python 2.7 Pytorch In addition, please add the project folder to PYTHONPATH and pip install the following packages: python dateutil easydict pandas torchfile nltk scikit image Data 1. Download our preprocessed metadata for birds coco and save them to data/ 2. Download the birds image data. Extract them to data/birds/ 3. Download the coco dataset and extract the images to data/coco/ Training Pre train DAMSM models: For the bird dataset: python pretrain_DAMSM.py cfg cfg/DAMSM/bird.yml gpu 0 For the coco dataset: python pretrain_DAMSM.py cfg cfg/DAMSM/coco.yml gpu 1 Train AttnGAN models: For the bird dataset: python main.py cfg cfg/bird_attn2.yml gpu 2 For the coco dataset: python main.py cfg cfg/coco_attn2.yml gpu 3 .yml files are example configuration files for training/evaluating our models. Pretrained Model DAMSM for bird . Download and save it to DAMSMencoders/ DAMSM for coco . Download and save it to DAMSMencoders/ AttnGAN for bird . Download and save it to models/ AttnGAN for coco . Download and save it to models/ AttnDCGAN for bird . Download and save it to models/ This is a variant of AttnGAN which applies the proposed attention mechanisms to the DCGAN framework. Sampling Run python main.py cfg cfg/eval_bird.yml gpu 1 to generate examples from captions in the files listed in ./data/birds/example_filenames.txt . Results are saved to DAMSMencoders/ . Change the eval_ .yml files to generate images from other pre trained models. Input your own sentences in ./data/birds/example_captions.txt if you want to generate images from customized sentences. Validation To generate images for all captions in the validation dataset, change B_VALIDATION to True in the eval_ .yml, and then run python main.py cfg cfg/eval_bird.yml gpu 1 We compute the inception score for models trained on birds using the StackGAN inception model . We compute the inception score for models trained on coco using improved gan/inception_score . Examples generated by AttnGAN Blog bird example coco example : : : : ! ! Creating an API Evaluation code (eval) embedded into a callable containerized API is included in the eval\ folder. Citing AttnGAN If you find AttnGAN useful in your research, please consider citing: @article{Tao18attngan, author {Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He}, title {AttnGAN: Fine Grained Text to Image Generation with Attentional Generative Adversarial Networks}, Year {2018}, booktitle {{CVPR}} } Reference StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks code Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks code",Image Generation,Image Generation 2517,Computer Vision,Computer Vision,Computer Vision,"The Mozilla Research RNNoise project shows how to apply deep learning to noise suppression. It combines classic signal processing with deep learning, but it's small and fast. No expensive GPUs required; it runs easily on a Raspberry Pi.
The result is easier to tune and sounds better than traditional noise suppression systems (been there!). Acoustic Noise Cancellation by Machine Learning Image classification with Keras and deep learning The above are just a few examples of artificial intelligence applied to healthcare. According to surveys, 35% of medical institutions in the United States will introduce AI technology within two years, and the share planning to do so within five years reaches 50%; population health, patient diagnosis, clinical decision support and precision medicine are the most anticipated application areas. ( Original link: Translator's note: MedyMatch is an Israeli company that cooperates with IBM Watson Health, applying deep learning and computer vision on top of the CT images provided by the latter to identify the location of intracranial hemorrhage. See Translator's note: this method uses artificial intelligence to judge whether a chest X ray shows imaging features of tuberculosis. It uses two models, AlexNet and GoogLeNet; after training on 1007 chest X rays and combining the two, the diagnostic accuracy for tuberculosis reaches 99%. ( IBM Watson Health MedyMatch Technology AI From HoloLens to AI assisted tuberculosis treatment: several of the most important AI application breakthroughs in healthcare in 2017 It just won't stop! Give it an outline and TensorFlow gives you back a complete cat (paper download included) Google DeepMind AI again crushes humans at lip reading, beating experts in accuracy (paper download included) This process is usually called speech synthesis or text to speech (TTS) A survey of deep learning research on speech synthesis luanfujun/deep photo styletransfer DNN TTS (deep neural network text to speech) WaveNet: A Generative Model for Raw Audio PR 024: Pixel Recurrent Neural Network PixelCNN, Wavenet & Variational Autoencoders Santiago Pascual UPC 2017 WaveNet: A Generative Model for Raw Audio GitHub wavenet_vocoder The image generation approach of PixelRNN (van den Oord et al., 2016) After WaveNet, Baidu's first generation Deep Voice appeared. To solve the slow speed problem, let's look at how Baidu did it in Deep Voice v1 1 . Artifical_Intelegent ConvNetJS is a Javascript library for training Deep Learning models (Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat. MIT 6.S094: Deep Learning for Self Driving Cars Numerical_Analysis Numerical_Analysis numerical analysis python pdf What a Deep Neural Network thinks about your selfie MIT 6.S094: Introduction to Deep Learning and Self Driving Cars 4:20 Deep Learning in Javascript. Train Convolutional Neural Networks (or ordinary ones) in your browser. Submission for the MIT self driving car project on deep reinforcement learning",Image Generation,Image Generation 2525,Computer Vision,Computer Vision,Computer Vision,"IllustrationGAN A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations. Generated Images These images were generated by the model after being trained on a custom dataset of about 20,000 anime faces that were automatically cropped from illustrations using a face detector. ! Generated Images (images/montage.png?raw True) Checking for Overfitting It is theoretically possible for the generator network to memorize training set images rather than actually generalizing and learning to produce novel images of its own. To check for this, I randomly generate images and display the closest images in the training set according to mean squared error. The top row is randomly generated images, the columns are the closest 5 images in the training set. ! Overfitting Check (images/overfitting_check.png?raw True) It is clear that the generator does not merely learn to copy training set images, but rather generalizes and is able to produce its own unique images. How it Works Generative Adversarial Networks consist of two neural networks: a discriminator and a generator. The discriminator receives both real images from the training set and generated images produced by the generator. The discriminator outputs the probability that an image is real, so it is trained to output high values for the real images and low values for the generated ones. The generator is trained to produce images that the discriminator thinks are real. Both the discriminator and generator are trained simultaneously so that they compete against each other.
As a result of this, the generator learns to produce more and more realistic images as it trains. Model Architecture The model is based on DCGANs , but with a few important differences: 1. No strided convolutions. The generator uses bilinear upsampling to upscale a feature blob by a factor of 2, followed by a stride 1 convolution layer. The discriminator uses a stride 1 convolution followed by 2x2 max pooling. 2. Minibatch discrimination. See Improved Techniques for Training GANs for more details. 3. More fully connected layers in both the generator and discriminator. In DCGANs, both networks have only one fully connected layer. 4. A novel regularization term applied to the generator network. Normally, increasing the number of fully connected layers in the generator beyond one triggers one of the most common failure modes when training GANs: the generator collapses the z space and produces only a very small number of unique examples. In other words, very different z vectors will produce nearly the same generated image. To fix this, I add a small auxiliary z predictor network that takes as input the output of the last fully connected layer in the generator, and predicts the value of z. In other words, it attempts to learn the inverse of whatever function the generator fully connected layers learn. The z predictor network and generator are trained together to predict the value of z. This forces the generator fully connected layers to only learn those transformations that preserve information about z. The result is that the aformentioned collapse no longer occurs, and the generator is able to leverage the power of the additional fully connected layers. Training the Model Dependencies: TensorFlow, PrettyTensor, numpy, matplotlib The custom dataset I used is too large to add to a Github repository; I am currently finding a suitable way to distribute it. Instructions for training the model will be in this readme after I make the dataset available.",Image Generation,Image Generation 2540,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Generation,Image Generation 2549,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. 
They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Generation,Image Generation 2555,Computer Vision,Computer Vision,Computer Vision,"! Tensorpack (.github/tensorpack.png) Tensorpack is a neural network training interface based on TensorFlow. Build Status ReadTheDoc Gitter chat model zoo Features: It's Yet Another TF high level API, with __speed__, and __flexibility__ built together. 1. Focus on __training speed__. + Speed comes for free with Tensorpack it uses TensorFlow in the __efficient way__ with no extra overhead. On common CNNs, it runs training 1.25x faster than the equivalent Keras code. Your training can probably gets faster if written with Tensorpack. + Data parallel multi GPU/distributed training strategy is off the shelf to use. It scales as well as Google's official benchmark . + See tensorpack/benchmarks for some benchmark scripts. 2. Focus on __large datasets__. + You don't usually need tf.data . Symbolic programming often makes data processing harder. Tensorpack helps you efficiently process large datasets (e.g. ImageNet) in __pure Python__ with autoparallelization. 3. It's not a model wrapper. + There are too many symbolic function wrappers in the world. Tensorpack includes only a few common models. But you can use any symbolic function library inside Tensorpack, including tf.layers/Keras/slim/tflearn/tensorlayer/.... See tutorials and documentations to know more about these features. Examples: We refuse toy examples. Instead of showing tiny CNNs trained on MNIST/Cifar10, we provide training scripts that reproduce well known papers. We refuse low quality implementations. Unlike most open source repos which only __implement__ papers, Tensorpack examples (examples) faithfully __reproduce__ papers, demonstrating its __flexibility__ for actual research. Vision: + Train ResNet (examples/ResNet) and other models (examples/ImageNetModels) on ImageNet. + Train Mask/Faster R CNN on COCO object detection (examples/FasterRCNN) + Generative Adversarial Network(GAN) variants (examples/GAN), including DCGAN, InfoGAN, Conditional GAN, WGAN, BEGAN, DiscoGAN, Image to Image, CycleGAN. + DoReFa Net: train binary / low bitwidth CNN on ImageNet (examples/DoReFa Net) + Fully convolutional Network for Holistically Nested Edge Detection(HED) (examples/HED) + Spatial Transformer Networks on MNIST addition (examples/SpatialTransformer) + Visualize CNN saliency maps (examples/Saliency) + Similarity learning on MNIST (examples/SimilarityLearning) Reinforcement Learning: + Deep Q Network(DQN) variants on Atari games (examples/DeepQNetwork), including DQN, DoubleDQN, DuelingDQN. 
+ Asynchronous Advantage Actor Critic(A3C) with demos on OpenAI Gym (examples/A3C Gym) Speech / NLP: + LSTM CTC for speech recognition (examples/CTC TIMIT) + char rnn for fun (examples/Char RNN) + LSTM language model on PennTreebank (examples/PennTreebank) Install: Dependencies: + Python 2.7 or 3.3+. Python 2.7 is supported until it retires in 2020 . + Python bindings for OpenCV. (Optional, but required by a lot of features) + TensorFlow ≥ 1.3, < 2. (Optional, if you only want to use tensorpack.dataflow alone as a data processing library) pip install upgrade git+ or add user to install to user's local directories Please note that tensorpack is not yet stable. If you use tensorpack in your code, remember to mark the exact version of tensorpack you use as your dependencies. Citing Tensorpack: If you use Tensorpack in your research or wish to refer to the examples, please cite with: @misc{wu2016tensorpack, title {Tensorpack}, author {Wu, Yuxin and others}, howpublished {\url{ year {2016} }",Image Generation,Image Generation 2565,Computer Vision,Computer Vision,Computer Vision,"Semi Supervised Learning Using GANs SSL with GANs is found to be useful when doing classification with limited amount of labeled data. The unlabeled samples can be used in a semi supervised setting to boost performance. Even with limited number of labeled images, the SSL GAN is able to perform better than the supervised baseline. The loss function used is of the form specified in the paper Improved Techniques for Training GANs Results for MNIST No. of labeled samples per class Accuracy SSL GAN Accuracy Supervised : : : : : : 10 0.7220 ± 0.0247 0.6403 ± 0.0203 50 0.8985 ± 0.0609 0.8610 ± 0.0127 100 0.9325 ± 0.0269 0.9218 ± 0.0067 250 0.9693 ± 0.0149 0.9550 ± 0.0088 500 0.9760 ± 0.0065 0.9698 ± 0.0034 750 0.9818 ± 0.0038 0.9795 ± 0.0026 1000 0.9813 ± 0.0010 0.9830 ± 0.0012 ! graph Results for CIFAR10 No. of labeled samples per class Accuracy SSL GAN Accuracy Supervised : : : : : : 10 0.3430 ± 0.0552 0.1808 ± 0.0245 250 0.6070 ± 0.1061 0.4288 ± 0.0229 1000 0.7655 ± 0.0389 0.6500 ± 0.0358 ! graph Use Cases Radar Image analysis : A Deep Convolutional Generative Adversarial Networks (DCGANs) Based Semi Supervised Method for Object Recognition in Synthetic Aperture Radar (SAR) Images Medical Diagnostics : Semi Supervised Deep Learning for Abnormality Classification in Retinal Images In general, semi supervised learning is useful when you have unlabeled samples (which cannot be made use of by supervised models) usually when you cannot label large number of data due to cost associated / unavailability of experts for the task.",Image Generation,Image Generation 2566,Computer Vision,Computer Vision,Computer Vision,"Progressive Growing of GANs for Improved Quality, Stability, and Variation – Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University) For business inquiries, please contact researchinquiries@nvidia.com (mailto:researchinquiries@nvidia.com) For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com (mailto:hmarinez@nvidia.com) ! Representative image Picture: Two imaginary celebrities that were dreamed up by a random number generator. Abstract: We describe a new training methodology for generative adversarial networks. 
The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher quality version of the CelebA dataset. Resources Paper (NVIDIA research) Paper (arXiv) Result video (YouTube) Additional material (Google Drive) ICLR 2018 poster ( karras2018iclr poster.pdf ) ICLR 2018 slides ( karras2018iclr slides.pptx ) Representative images ( images/representative images ) High quality video clips ( videos/high quality video clips ) Huge collection of non curated images for each dataset ( images/100k generated images ) Extensive video of random interpolations for each dataset ( videos/one hour of random interpolations ) Pre trained networks ( networks/tensorflow version ) Minimal example script for importing the pre trained networks ( networks/tensorflow version/example_import_script ) Data files needed to reconstruct the CelebA HQ dataset ( datasets/celeba hq deltas ) Example training logs and progress snapshots ( networks/tensorflow version/example_training_runs ) All the material, including source code, is made freely available for non commercial use under the Creative Commons CC BY NC 4.0 license. Feel free to use any of the material in your own work, as long as you give us appropriate credit by mentioning the title and author list of our paper. Versions There are two different versions of the source code. The TensorFlow version is newer and more polished, and we generally recommend it as a starting point if you are looking to experiment with our technique, build upon it, or apply it to novel datasets. The original Theano version , on the other hand, is what we used to produce all the results shown in our paper. We recommend using it if – and only if – you are looking to reproduce our exact results for benchmark datasets like CIFAR 10, MNIST RGB, and CelebA. The main differences are summarized in the following table: Feature TensorFlow version Original Theano version : : : : : Branch master (this branch) original theano version Multi GPU support Yes No FP16 mixed precision support Yes No Performance High Low Training time for CelebA HQ 2 days (8 GPUs) 2 weeks (1 GPU) 1–2 months Repro CelebA HQ results Yes – very close Yes – identical Repro LSUN results Yes – very close Yes – identical Repro CIFAR 10 results No Yes – identical Repro MNIST mode recovery No Yes – identical Repro ablation study (Table 1) No Yes – identical Dataset format TFRecords HDF5 Backwards compatibility Can import networks trained with Theano N/A Code quality Reasonable Somewhat messy Code status In active use No longer maintained System requirements Both Linux and Windows are supported, but we strongly recommend Linux for performance and compatibility reasons. 64 bit Python 3.6 installation with numpy 1.13.3 or newer. We recommend Anaconda3. One or more high end NVIDIA Pascal or Volta GPUs with 16GB of DRAM. 
We recommend NVIDIA DGX 1 with 8 Tesla V100 GPUs. NVIDIA driver 391.25 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.1.2 or newer. Additional Python packages listed in requirements pip.txt Importing and using pre trained networks All pre trained networks found on Google Drive, as well as ones produced by the training script, are stored as Python PKL files. They can be imported using the standard pickle mechanism as long as two conditions are met: (1) The directory containing the Progressive GAN code repository must be included in the PYTHONPATH environment variable, and (2) a tf.Session() object must have been created beforehand and set as default. Each PKL file contains 3 instances of tfutil.Network : Import official CelebA HQ networks. with open('karras2018iclr celebahq 1024x1024.pkl', 'rb') as file: G, D, Gs pickle.load(file) G Instantaneous snapshot of the generator, mainly useful for resuming a previous training run. D Instantaneous snapshot of the discriminator, mainly useful for resuming a previous training run. Gs Long term average of the generator, yielding higher quality results than the instantaneous snapshot. It is also possible to import networks that were produced using the Theano implementation, as long as they do not employ any features that are not natively supported by the TensorFlow version (minibatch discrimination, batch normalization, etc.). To enable Theano network import, however, you must use misc.load_pkl() in place of pickle.load() : Import Theano versions of the official CelebA HQ networks. import misc G, D, Gs misc.load_pkl('200 celebahq 1024x1024/network final.pkl') Once you have imported the networks, you can call Gs.run() to produce a set of images for given latent vectors, or Gs.get_output_for() to include the generator network in a larger TensorFlow expression. For further details, please consult the example script found on Google Drive. Instructions: 1. Pull the Progressive GAN code repository and add it to your PYTHONPATH environment variable. 2. Install the required Python packages with pip install r requirements pip.txt 2. Download import_example.py from networks/tensorflow version/example_import_script 3. Download karras2018iclr celebahq 1024x1024.pkl from networks/tensorflow version and place it in the same directory as the script. 5. Run the script with python import_example.py 6. If everything goes well, the script should generate 10 PNG images ( img0.png – img9.png ) that match the ones found in networks/tensorflow version/example_import_script exactly. Preparing datasets for training The Progressive GAN code repository contains a command line tool for recreating bit exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets: usage: dataset_tool.py h ... display Display images in dataset. extract Extract images from dataset. compare Compare two datasets. create_mnist Create dataset for MNIST. create_mnistrgb Create dataset for MNIST RGB. create_cifar10 Create dataset for CIFAR 10. create_cifar100 Create dataset for CIFAR 100. create_svhn Create dataset for SVHN. create_lsun Create dataset for single LSUN category. create_celeba Create dataset for CelebA. create_celebahq Create dataset for CelebA HQ. create_from_images Create dataset from a directory full of images. create_from_hdf5 Create dataset from legacy HDF5 archive. Type dataset_tool.py h for more information. 
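Going back to the pre-trained network section above, a minimal sketch of that import-and-sample flow might look like the following. Only what the README states is relied on (a pickle file containing G, D, Gs; a default tf.Session(); Gs.run() mapping latent vectors to images); the input_shapes attribute, the labels argument, and the hyphenation of the file name are assumptions beyond the text.

```python
# Minimal sketch, assuming the Progressive GAN repo is on PYTHONPATH so the
# pickled tfutil.Network class resolves; shapes and the labels argument are
# assumptions inferred from the README, not a verified interface.
import pickle
import numpy as np
import tensorflow as tf

with tf.Session() as sess:  # entering the context also sets the default session
    with open('karras2018iclr-celebahq-1024x1024.pkl', 'rb') as f:
        G, D, Gs = pickle.load(f)
    latents = np.random.randn(8, *Gs.input_shapes[0][1:])           # batch of random latent vectors
    labels = np.zeros([latents.shape[0]] + Gs.input_shapes[1][1:])  # no conditioning labels
    images = Gs.run(latents, labels)                                 # generated images for the batch
```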
The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate .tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well: > python dataset_tool.py create_cifar10 datasets/cifar10 /downloads/cifar10 > ls la datasets/cifar10 drwxr xr x 2 user user 7 Feb 21 10:07 . drwxrwxr x 10 user user 62 Apr 3 15:10 .. rw r r 1 user user 4900000 Feb 19 13:17 cifar10 r02.tfrecords rw r r 1 user user 12350000 Feb 19 13:17 cifar10 r03.tfrecords rw r r 1 user user 41150000 Feb 19 13:17 cifar10 r04.tfrecords rw r r 1 user user 156350000 Feb 19 13:17 cifar10 r05.tfrecords rw r r 1 user user 2000080 Feb 19 13:17 cifar10 rxx.labels The create_ commands take the standard version of a given dataset as input and produce the corresponding .tfrecords files as output. Additionally, the create_celebahq command requires a set of data files representing deltas with respect to the original CelebA dataset. These deltas (27.6GB) can be downloaded from datasets/celeba hq deltas . Note about module versions : Some of the dataset commands require specific versions of Python modules and system libraries (e.g. pillow, libjpeg), and they will give an error if the versions do not match. Please heed the error messages – there is no way to get the commands to work other than installing these specific versions. Training networks Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows: 1. Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines. 2. Run the training script with python train.py . 3. The results are written into a newly created subdirectory under config.result_dir 4. Wait several days (or weeks) for the training to converge, and analyze the results. By default, config.py is configured to train a 1024x1024 network for CelebA HQ using a single GPU. This is expected to take about two weeks even on the highest end NVIDIA GPUs. The key to enabling faster training is to employ multiple GPUs and/or go for a lower resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of configuration presets for multi GPU training. All of the presets are expected to yield roughly the same image quality for CelebA HQ, but their total training time can vary considerably: preset v1 1gpu : Original config that was used to produce the CelebA HQ and LSUN results shown in the paper. Expected to take about 1 month on NVIDIA Tesla V100. preset v2 1gpu : Optimized config that converges considerably faster than the original one. Expected to take about 2 weeks on 1xV100. preset v2 2gpus : Optimized config for 2 GPUs. Takes about 1 week on 2xV100. preset v2 4gpus : Optimized config for 4 GPUs. Takes about 3 days on 4xV100. preset v2 8gpus : Optimized config for 8 GPUs. Takes about 2 days on 8xV100. For reference, the expected output of each configuration preset for CelebA HQ can be found in networks/tensorflow version/example_training_runs Other noteworthy config options: fp16 : Enable FP16 mixed precision training to reduce the training times even further. The actual speedup is heavily dependent on GPU architecture and cuDNN version, and it can be expected to increase considerably in the future. BENCHMARK : Quickly iterate through the resolutions to measure the raw training performance. 
BENCHMARK0 : Same as BENCHMARK , but only use the highest resolution. syn1024rgb : Synthetic 1024x1024 dataset consisting of just black images. Useful for benchmarking. VERBOSE : Save image and network snapshots very frequently to facilitate debugging. GRAPH and HIST : Include additional data in the TensorBoard report. Analyzing results Training results can be analyzed in several ways: Manual inspection : The training script saves a snapshot of randomly generated images at regular intervals in fakes .png and reports the overall progress in log.txt . TensorBoard : The training script also exports various running statistics in a .tfevents file that can be visualized in TensorBoard with tensorboard logdir . Generating images and videos : At the end of config.py , there are several pre defined configs to launch utility scripts ( generate_ ). For example: Suppose you have an ongoing training run titled 010 pgan celebahq preset v1 1gpu fp32 , and you want to generate a video of random interpolations for the latest snapshot. Uncomment the generate_interpolation_video line in config.py , replace run_id 10 , and run python train.py The script will automatically locate the latest network snapshot and create a new result directory containing a single MP4 file. Quality metrics : Similar to the previous example, config.py also contains pre defined configs to compute various quality metrics (Sliced Wasserstein distance, Fréchet inception distance, etc.) for an existing training run. The metrics are computed for each network snapshot in succession and stored in metric .txt in the original result directory.",Image Generation,Image Generation 2574,Computer Vision,Computer Vision,Computer Vision,"EDSR PyTorch About PyTorch 1.1.0 There have been minor changes with the 1.1.0 update. Now we support PyTorch 1.1.0 by default, and please use the legacy branch if you prefer older version. ! (/figs/main.png) This repository is an official PyTorch implementation of the paper Enhanced Deep Residual Networks for Single Image Super Resolution from CVPRW 2017, 2nd NTIRE . You can find the original code and more information from here . If you find our work useful in your research or publication, please cite our work: 1 Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee, Enhanced Deep Residual Networks for Single Image Super Resolution, 2nd NTIRE: New Trends in Image Restoration and Enhancement workshop and challenge on image super resolution in conjunction with CVPR 2017 . PDF arXiv Slide .pptx) @InProceedings{Lim_2017_CVPR_Workshops, author {Lim, Bee and Son, Sanghyun and Kim, Heewon and Nah, Seungjun and Lee, Kyoung Mu}, title {Enhanced Deep Residual Networks for Single Image Super Resolution}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month {July}, year {2017} } We provide scripts for reproducing all the results from our paper. You can train your own model from scratch, or use pre trained model to enlarge your images. Differences between Torch version Codes are much more compact. (Removed all unnecessary parts.) Models are smaller. (About half.) Slightly better performances. Training and evaluation requires less memory. Python based. Dependencies Python 3.6 PyTorch > 1.0.0 numpy skimage imageio matplotlib tqdm cv2 > 3.xx (Only if you want to use video input/output) Code Clone this repository into any place you want. bash git clone cd EDSR PyTorch Quick start (Demo) You can test our super resolution algorithm with your own images. 
Place your images in test folder. (like test/ ) We support png and jpeg files. Run the script in src folder. Before you run the demo, please uncomment the appropriate line in demo.sh that you want to execute. bash cd src You are now in /EDSR PyTorch/src sh demo.sh You can find the result images from experiment/test/results folder. Model Scale File name (.pt) Parameters PSNR EDSR 2 EDSR_baseline_x2 1.37 M 34.61 dB EDSR_x2 40.7 M 35.03 dB 3 EDSR_baseline_x3 1.55 M 30.92 dB EDSR_x3 43.7 M 31.26 dB 4 EDSR_baseline_x4 1.52 M 28.95 dB EDSR_x4 43.1 M 29.25 dB MDSR 2 MDSR_baseline 3.23 M 34.63 dB MDSR 7.95 M 34.92 dB 3 MDSR_baseline 30.94 dB MDSR 31.22 dB 4 MDSR_baseline 28.97 dB MDSR 29.24 dB Baseline models are in experiment/model . Please download our final models from here (542MB) We measured PSNR using DIV2K 0801 0900, RGB channels, without self ensemble. (scale + 2) pixels from the image boundary are ignored. You can evaluate your models with widely used benchmark datasets: Set5 Bevilacqua et al. BMVC 2012 , Set14 Zeyde et al. LNCS 2010 , B100 Martin et al. ICCV 2001 , Urban100 Huang et al. CVPR 2015 . For these datasets, we first convert the result images to YCbCr color space and evaluate PSNR on the Y channel only. You can download benchmark datasets (250MB). Set dir_data to evaluate the EDSR and MDSR with the benchmarks. You can download some results from here . The link contains EDSR+_baseline_x4 and EDSR+_x4 . Otherwise, you can easily generate result images with demo.sh scripts. How to train EDSR and MDSR We used DIV2K dataset to train our model. Please download it from here (7.1GB). Unpack the tar file to any place you want. Then, change the dir_data argument in src/option.py to the place where DIV2K images are located. We recommend you to pre process the images before training. This step will decode all png files and save them as binaries. Use ext sep_reset argument on your first run. You can skip the decoding part and use saved binaries with ext sep argument. If you have enough RAM (> 32GB), you can use ext bin argument to pack all DIV2K images in one binary file. You can train EDSR and MDSR by yourself. All scripts are provided in the src/demo.sh . Note that EDSR (x3, x4) requires pre trained EDSR (x2). You can ignore this constraint by removing pre_train argument. bash cd src You are now in /EDSR PyTorch/src sh demo.sh Update log Jan 04, 2018 Many parts are re written. You cannot use previous scripts and models directly. Pre trained MDSR is temporarily disabled. Training details are included. Jan 09, 2018 Missing files are included ( src/data/MyImage.py ). Some links are fixed. Jan 16, 2018 Memory efficient forward function is implemented. Add chop_forward argument to your script to enable it. Basically, this function first split a large image to small patches. Those images are merged after super resolution. I checked this function with 12GB memory, 4000 x 2000 input image in scale 4. (Therefore, the output will be 16000 x 8000.) Feb 21, 2018 Fixed the problem when loading pre trained multi gpu model. Added pre trained scale 2 baseline model. This code now only saves the best performing model by default. For MDSR, 'the best' can be ambiguous. Use save_models argument to save all the intermediate models. PyTorch 0.3.1 changed their implementation of DataLoader function. Therefore, I also changed my implementation of MSDataLoader. You can find it on feature/dataloader branch. Feb 23, 2018 Now PyTorch 0.3.1 is default. Use legacy/0.3.0 branch if you use the old version. 
With a new src/data/DIV2K.py code, one can easily create new data class for super resolution. New binary data pack. (Please remove the DIV2K_decoded folder from your dataset if you have.) With ext bin , this code will automatically generates and saves the binary data pack that corresponds to previous DIV2K_decoded . (This requires huge RAM (45GB, Swap can be used.), so please be careful.) If you cannot make the binary pack, just use the default setting ( ext img ). Fixed a bug that PSNR in the log and PSNR calculated from the saved images does not match. Now saved images have better quality! (PSNR is 0.1dB higher than the original code.) Added performance comparison between Torch7 model and PyTorch models. Mar 5, 2018 All baseline models are uploaded. Now supports half precision at test time. Use precision half to enable it. This does not degrade the output images. Mar 11, 2018 Fixed some typos in the code and script. Now ext img is default setting. Although we recommend you to use ext bin when training, please use ext img when you use test_only. Skip_batch operation is implemented. Use skip_threshold argument to skip the batch that you want to ignore. Although this function is not exactly same with that of Torch7 version, it will work as you expected. Mar 20, 2018 Use ext sep_reset to pre decode large png files. Those decoded files will be saved to the same directory with DIV2K png files. After the first run, you can use ext sep to save time. Now supports various benchmark datasets. For example, try data_test Set5 to test your model on the Set5 images. Changed the behavior of skip_batch. Mar 29, 2018 We now provide all models from our paper. We also provide MDSR_baseline_jpeg model that suppresses JPEG artifacts in original low resolution image. Please use it if you have any trouble. MyImage dataset is changed to Demo dataset. Also, it works more efficient than before. Some codes and script are re written. Apr 9, 2018 VGG and Adversarial loss is implemented based on SRGAN . WGAN and gradient penalty are also implemented, but they are not tested yet. Many codes are refactored. If there exists a bug, please report it. D DBPN is implemented. Default setting is D DBPN L. Apr 26, 2018 Compatible with PyTorch 0.4.0 Please use the legacy/0.3.1 branch if you are using the old version of PyTorch. Minor bug fixes July 22, 2018 Thanks for recent commits that contains RDN and RCAN. Please see code/demo.sh to train/test those models. Now the dataloader is much stable than the previous version. Please erase DIV2K/bin folder that is created before this commit. Also, please avoid to use ext bin argument. Our code will automatically pre decode png images before training. If you do not have enough spaces(10GB) in your disk, we recommend ext img (But SLOW!). Oct 18, 2018 with pre_train download , pretrained models will be automatically downloaded from server. Supports video input/output (inference only). Try with data_test video dir_demo video file directory .",Image Generation,Image Generation 2587,Computer Vision,Computer Vision,Computer Vision,"Generative Adversarial Interpolative Autoencoding (GAIA) Authors: Tim Sainburg, Marvin Thielk, Tim Gentner (UCSD) The Generative Adversarial Interpolative Autoencoder (GAIA; Paper ; Blog post ) is novel hybrid between the Generative Adversarial Network (GAN) and the Autoencoder (AE). The purpose of GAIA is to address three issues which exist in GANs and AEs: 1. GANs are not bidirectional 2. Autoencoders produce blurry images 3. 
Autoencoder latent spaces are not convex ! Morph Image (images/celeb morph.png) Instructions 1. Download the GAIA dataset with the notebook ' Create_CELEBA HQ.ipynb ' 1. Download the trained weights ' download_weights.ipynb ' 2. Run the notebook ' GAIA simple example.ipynb ' Note : I'm currently in the process of rewriting this code to be cleaner, include more features, etc. For now, this is just the version of the code used in the Arxiv paper. ! Morph Image (images/celeb attrs.png) References Multimodal Unsupervised Image to Image Translation ( Paper ; Author implementation ; Tensorflow implementation ) BEGAN VAEGAN Progressively Growing GANs ( Paper , Code ) python",Image Generation,Image Generation 2594,Computer Vision,Computer Vision,Computer Vision,"Progressive Growing of GANs for Improved Quality, Stability, and Variation – Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University) For business inquiries, please contact researchinquiries@nvidia.com (mailto:researchinquiries@nvidia.com) For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com (mailto:hmarinez@nvidia.com) ! Representative image Picture: Two imaginary celebrities that were dreamed up by a random number generator. Abstract: We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024². We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher quality version of the CelebA dataset. Resources Paper (NVIDIA research) Paper (arXiv) Result video (YouTube) Additional material (Google Drive) ICLR 2018 poster ( karras2018iclr poster.pdf ) ICLR 2018 slides ( karras2018iclr slides.pptx ) Representative images ( images/representative images ) High quality video clips ( videos/high quality video clips ) Huge collection of non curated images for each dataset ( images/100k generated images ) Extensive video of random interpolations for each dataset ( videos/one hour of random interpolations ) Pre trained networks ( networks/tensorflow version ) Minimal example script for importing the pre trained networks ( networks/tensorflow version/example_import_script ) Data files needed to reconstruct the CelebA HQ dataset ( datasets/celeba hq deltas ) Example training logs and progress snapshots ( networks/tensorflow version/example_training_runs ) All the material, including source code, is made freely available for non commercial use under the Creative Commons CC BY NC 4.0 license. Feel free to use any of the material in your own work, as long as you give us appropriate credit by mentioning the title and author list of our paper. Versions There are two different versions of the source code. 
The TensorFlow version is newer and more polished, and we generally recommend it as a starting point if you are looking to experiment with our technique, build upon it, or apply it to novel datasets. The original Theano version , on the other hand, is what we used to produce all the results shown in our paper. We recommend using it if – and only if – you are looking to reproduce our exact results for benchmark datasets like CIFAR 10, MNIST RGB, and CelebA. The main differences are summarized in the following table: Feature TensorFlow version Original Theano version : : : : : Branch master (this branch) original theano version Multi GPU support Yes No FP16 mixed precision support Yes No Performance High Low Training time for CelebA HQ 2 days (8 GPUs) 2 weeks (1 GPU) 1–2 months Repro CelebA HQ results Yes – very close Yes – identical Repro LSUN results Yes – very close Yes – identical Repro CIFAR 10 results No Yes – identical Repro MNIST mode recovery No Yes – identical Repro ablation study (Table 1) No Yes – identical Dataset format TFRecords HDF5 Backwards compatibility Can import networks trained with Theano N/A Code quality Reasonable Somewhat messy Code status In active use No longer maintained System requirements Both Linux and Windows are supported, but we strongly recommend Linux for performance and compatibility reasons. 64 bit Python 3.6 installation with numpy 1.13.3 or newer. We recommend Anaconda3. One or more high end NVIDIA Pascal or Volta GPUs with 16GB of DRAM. We recommend NVIDIA DGX 1 with 8 Tesla V100 GPUs. NVIDIA driver 391.25 or newer, CUDA toolkit 9.0 or newer, cuDNN 7.1.2 or newer. Additional Python packages listed in requirements pip.txt Importing and using pre trained networks All pre trained networks found on Google Drive, as well as ones produced by the training script, are stored as Python PKL files. They can be imported using the standard pickle mechanism as long as two conditions are met: (1) The directory containing the Progressive GAN code repository must be included in the PYTHONPATH environment variable, and (2) a tf.Session() object must have been created beforehand and set as default. Each PKL file contains 3 instances of tfutil.Network : Import official CelebA HQ networks. with open('karras2018iclr celebahq 1024x1024.pkl', 'rb') as file: G, D, Gs pickle.load(file) G Instantaneous snapshot of the generator, mainly useful for resuming a previous training run. D Instantaneous snapshot of the discriminator, mainly useful for resuming a previous training run. Gs Long term average of the generator, yielding higher quality results than the instantaneous snapshot. It is also possible to import networks that were produced using the Theano implementation, as long as they do not employ any features that are not natively supported by the TensorFlow version (minibatch discrimination, batch normalization, etc.). To enable Theano network import, however, you must use misc.load_pkl() in place of pickle.load() : Import Theano versions of the official CelebA HQ networks. import misc G, D, Gs misc.load_pkl('200 celebahq 1024x1024/network final.pkl') Once you have imported the networks, you can call Gs.run() to produce a set of images for given latent vectors, or Gs.get_output_for() to include the generator network in a larger TensorFlow expression. For further details, please consult the example script found on Google Drive. Instructions: 1. Pull the Progressive GAN code repository and add it to your PYTHONPATH environment variable. 2. 
Install the required Python packages with pip install r requirements pip.txt 2. Download import_example.py from networks/tensorflow version/example_import_script 3. Download karras2018iclr celebahq 1024x1024.pkl from networks/tensorflow version and place it in the same directory as the script. 5. Run the script with python import_example.py 6. If everything goes well, the script should generate 10 PNG images ( img0.png – img9.png ) that match the ones found in networks/tensorflow version/example_import_script exactly. Preparing datasets for training The Progressive GAN code repository contains a command line tool for recreating bit exact replicas of the datasets that we used in the paper. The tool also provides various utilities for operating on the datasets: usage: dataset_tool.py h ... display Display images in dataset. extract Extract images from dataset. compare Compare two datasets. create_mnist Create dataset for MNIST. create_mnistrgb Create dataset for MNIST RGB. create_cifar10 Create dataset for CIFAR 10. create_cifar100 Create dataset for CIFAR 100. create_svhn Create dataset for SVHN. create_lsun Create dataset for single LSUN category. create_celeba Create dataset for CelebA. create_celebahq Create dataset for CelebA HQ. create_from_images Create dataset from a directory full of images. create_from_hdf5 Create dataset from legacy HDF5 archive. Type dataset_tool.py h for more information. The datasets are represented by directories containing the same image data in several resolutions to enable efficient streaming. There is a separate .tfrecords file for each resolution, and if the dataset contains labels, they are stored in a separate file as well: > python dataset_tool.py create_cifar10 datasets/cifar10 /downloads/cifar10 > ls la datasets/cifar10 drwxr xr x 2 user user 7 Feb 21 10:07 . drwxrwxr x 10 user user 62 Apr 3 15:10 .. rw r r 1 user user 4900000 Feb 19 13:17 cifar10 r02.tfrecords rw r r 1 user user 12350000 Feb 19 13:17 cifar10 r03.tfrecords rw r r 1 user user 41150000 Feb 19 13:17 cifar10 r04.tfrecords rw r r 1 user user 156350000 Feb 19 13:17 cifar10 r05.tfrecords rw r r 1 user user 2000080 Feb 19 13:17 cifar10 rxx.labels The create_ commands take the standard version of a given dataset as input and produce the corresponding .tfrecords files as output. Additionally, the create_celebahq command requires a set of data files representing deltas with respect to the original CelebA dataset. These deltas (27.6GB) can be downloaded from datasets/celeba hq deltas . Note about module versions : Some of the dataset commands require specific versions of Python modules and system libraries (e.g. pillow, libjpeg), and they will give an error if the versions do not match. Please heed the error messages – there is no way to get the commands to work other than installing these specific versions. Training networks Once the necessary datasets are set up, you can proceed to train your own networks. The general procedure is as follows: 1. Edit config.py to specify the dataset and training configuration by uncommenting/editing specific lines. 2. Run the training script with python train.py . 3. The results are written into a newly created subdirectory under config.result_dir 4. Wait several days (or weeks) for the training to converge, and analyze the results. By default, config.py is configured to train a 1024x1024 network for CelebA HQ using a single GPU. This is expected to take about two weeks even on the highest end NVIDIA GPUs. 
The key to enabling faster training is to employ multiple GPUs and/or go for a lower resolution dataset. To this end, config.py contains several examples for commonly used datasets, as well as a set of configuration presets for multi GPU training. All of the presets are expected to yield roughly the same image quality for CelebA HQ, but their total training time can vary considerably: preset v1 1gpu : Original config that was used to produce the CelebA HQ and LSUN results shown in the paper. Expected to take about 1 month on NVIDIA Tesla V100. preset v2 1gpu : Optimized config that converges considerably faster than the original one. Expected to take about 2 weeks on 1xV100. preset v2 2gpus : Optimized config for 2 GPUs. Takes about 1 week on 2xV100. preset v2 4gpus : Optimized config for 4 GPUs. Takes about 3 days on 4xV100. preset v2 8gpus : Optimized config for 8 GPUs. Takes about 2 days on 8xV100. For reference, the expected output of each configuration preset for CelebA HQ can be found in networks/tensorflow version/example_training_runs Other noteworthy config options: fp16 : Enable FP16 mixed precision training to reduce the training times even further. The actual speedup is heavily dependent on GPU architecture and cuDNN version, and it can be expected to increase considerably in the future. BENCHMARK : Quickly iterate through the resolutions to measure the raw training performance. BENCHMARK0 : Same as BENCHMARK , but only use the highest resolution. syn1024rgb : Synthetic 1024x1024 dataset consisting of just black images. Useful for benchmarking. VERBOSE : Save image and network snapshots very frequently to facilitate debugging. GRAPH and HIST : Include additional data in the TensorBoard report. Analyzing results Training results can be analyzed in several ways: Manual inspection : The training script saves a snapshot of randomly generated images at regular intervals in fakes .png and reports the overall progress in log.txt . TensorBoard : The training script also exports various running statistics in a .tfevents file that can be visualized in TensorBoard with tensorboard logdir . Generating images and videos : At the end of config.py , there are several pre defined configs to launch utility scripts ( generate_ ). For example: Suppose you have an ongoing training run titled 010 pgan celebahq preset v1 1gpu fp32 , and you want to generate a video of random interpolations for the latest snapshot. Uncomment the generate_interpolation_video line in config.py , replace run_id 10 , and run python train.py The script will automatically locate the latest network snapshot and create a new result directory containing a single MP4 file. Quality metrics : Similar to the previous example, config.py also contains pre defined configs to compute various quality metrics (Sliced Wasserstein distance, Fréchet inception distance, etc.) for an existing training run. The metrics are computed for each network snapshot in succession and stored in metric .txt in the original result directory.",Image Generation,Image Generation 2603,Computer Vision,Computer Vision,Computer Vision,"DeOldify Get more updates on Twitter Simply put, the mission of this project is to colorize and restore old images. I'll get into the details in a bit, but first let's get to the pictures! BTW – most of these source images originally came from the TheWayWeWere subreddit, so credit to them for finding such great photos. Some of many results These are pretty typical! 
Maria Anderson as the Fairy Fleur de farine and Lyubov Rabtsova as her page in the ballet “Sleeping Beauty” at the Imperial Theater, St. Petersburg, Russia, 1890. ! Ballerinas (result_images/Ballerinas.jpg) Woman relaxing in her livingroom (1920, Sweden) ! SwedenLivingRoom (result_images/SweedishLivingRoom1920.jpg) Medical Students pose with a cadaver around 1890 ! MedStudents (result_images/MedStudentsCards.jpg) Surfer in Hawaii, 1890 ! 1890Surfer (result_images/1890Surfer.jpg) Whirling Horse, 1898 ! WhirlingHorse (result_images/WhirlingHorse.jpg) Interior of Miller and Shoemaker Soda Fountain, 1899 ! SodaFountain (result_images/SodaShop.jpg) Paris in the 1880s ! Paris1880s (result_images/Paris1880s.jpg) Edinburgh from the sky in the 1920s ! Edinburgh (result_images/FlyingOverEdinburgh.jpg) Texas Woman in 1938 ! TexasWoman (result_images/TexasWoman.jpg) People watching a television set for the first time at Waterloo station, London, 1936 ! Television (result_images/FirstTV1930s.jpg) Geography Lessons in 1850 ! Geography (result_images/GeographyLessons.jpg) Chinese Opium Smokers in 1880 ! OpiumReal (result_images/ChineseOpium1880s.jpg) Note that even really old and/or poor quality photos will still turn out looking pretty cool: Deadwood, South Dakota, 1877 ! Deadwood (result_images/OldWest.jpg) Siblings in 1877 ! Deadwood (result_images/Olds1875.jpg) Portsmouth Square in San Franscisco, 1851 ! PortsmouthSquare (result_images/SanFran1850sRetry.jpg) Samurais, circa 1860s ! Samurais (result_images/Samurais.jpg) Granted, the model isn't always perfect. This one's red hand drives me nuts because it's otherwise fantastic: Seneca Native in 1908 ! Samurais (result_images/SenecaNative1908.jpg) It can also colorize b&w line drawings: ! OpiumDrawing (result_images/OpiumSmokersDrawing.jpg) The Technical Details This is a deep learning based model. More specifically, what I've done is combined the following approaches: Self Attention Generative Adversarial Network . Except the generator is a pretrained U Net , and I've just modified it to have the spectral normalization and self attention. It's a pretty straightforward translation. I'll tell you what though – it made all the difference when I switched to this after trying desperately to get a Wasserstein GAN version to work. I liked the theory of Wasserstein GANs but it just didn't pan out in practice. But I'm in love with Self Attention GANs. Training structure inspired by (but not the same as) Progressive Growing of GANs . The difference here is the number of layers remains constant – I just changed the size of the input progressively and adjusted learning rates to make sure that the transitions between sizes happened successfully. It seems to have the same basic end result – training is faster, more stable, and generalizes better. Two Time Scale Update Rule . This is also very straightforward – it's just one to one generator/critic iterations and higher critic learning rate. Generator Loss is two parts: One is a basic Perceptual Loss (or Feature Loss) based on VGG16 – this just biases the generator model to replicate the input image. The second is the loss score from the critic. For the curious – Perceptual Loss isn't sufficient by itself to produce good results. It tends to just encourage a bunch of brown/green/blue – you know, cheating to the test, basically, which neural networks are really good at doing! 
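A rough sketch of that two-part generator loss might look like the following (illustrative PyTorch-style code, not DeOldify's actual implementation; the VGG16 layer cutoff, the weighting factor, and the sign convention of the critic term are assumptions):

```python
# Illustrative two-part generator loss: a VGG16 feature ("perceptual") loss
# plus the critic's score. Input normalization for VGG is omitted for brevity.
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG16 feature extractor up to an arbitrary mid-level conv layer.
vgg_feat = nn.Sequential(*list(vgg16(pretrained=True).features)[:23]).eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def generator_loss(fake_rgb, target_rgb, critic, feat_weight=1.0):
    feat_loss = F.l1_loss(vgg_feat(fake_rgb), vgg_feat(target_rgb))  # replicate the input image
    adv_loss = -critic(fake_rgb).mean()                              # score from the critic
    return feat_weight * feat_loss + adv_loss
```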
Key thing to realize here is that GANs essentially are learning the loss function for you – which is really one big step closer toward the ideal that we're shooting for in machine learning. And of course you generally get much better results when you get the machine to learn something you were previously hand coding. That's certainly the case here. The beauty of this model is that it should be generally useful for all sorts of image modification, and it should do it quite well. What you're seeing above are the results of the colorization model, but that's just one component in a pipeline that I'm looking to develop here with the exact same model. What I develop next with this model will be based on trying to solve the problem of making these old images look great, so the next item on the agenda for me is the defade model. I've committed initial efforts on that and it's in the early stages of training as I write this. Basically it's just training the same model to reconstruct images that are augmented with ridiculous contrast/brightness adjustments, as a simulation of fading photos and photos taken with old/bad equipment. I've already seen some promising results on that as well: ! DeloresTwoChanges (result_images/DeloresTwoChanges.jpg) This Project, Going Forward So that's the gist of this project – I'm looking to make old photos look reeeeaaally good with GANs, and more importantly, make the project useful . And yes, I'm definitely interested in doing video, but first I need to sort out how to get this model under control with memory (it's a beast). It'd be nice if the models didn't take two to three days to train on a 1080TI as well (typical of GANs, unfortunately). In the meantime though this is going to be my baby and I'll be actively updating and improving the code over the foreseeable future. I'll try to make this as user friendly as possible, but I'm sure there's going to be hiccups along the way. Oh and I swear I'll document the code properly...eventually. Admittedly I'm one of those people who believes in self documenting code (LOL). Getting Started Yourself The easiest way to get started is to simply try out colorization here on Colab: This was contributed by Matt Robinson, and it's simply awesome. Hardware and Operating System Requirements (Training Only) BEEFY Graphics card . I'd really like to have more memory than the 11 GB in my GeForce 1080TI. You'll have a tough time with less. The Unet and Critic are ridiculously large but honestly I just kept getting better results the bigger I made them. (Colorization Alone) A decent graphics card . You'll benefit from having more memory in a graphics card in terms of the quality of the output you can achieve. Now what the term decent means exactly...I'm going to say 6GB +. I haven't tried it but in my head the math works.... Linux (or maybe Windows 10) I'm using Ubuntu 16.04, but nothing about this precludes Windows 10 support as far as I know. I just haven't tested it and am not going to make it a priority for now. Easy Install You should now be able to do a simple install with Anaconda. Here are the steps: Open the command line and navigate to the root folder you wish to install. Then type the following commands console git clone DeOldify cd DeOldify conda env create f environment.yml Then start running with these commands: console source activate deoldify jupyter lab From there you can start running the notebooks in Jupyter Lab, via the url they provide you in the console.
Disclaimer : This conda install process is new. I did test it locally, but the classic developer's excuse is, well, it works on my machine! I'm keeping in mind that there's a good chance it doesn't necessarily work on others' machines! I probably, most definitely did something wrong here. Definitely, in fact. Please let me know by opening an issue. Pobody's nerfect. More Details for Those So Inclined This project is built around the wonderful Fast.AI library. Unfortunately, it's the old version and I have yet to upgrade it to the new version. (That's definitely update 11/18/2018: maybe on the agenda.) So prereqs, in summary: Old Fast.AI library (version 0.7) UPDATE 11/18/2018 A forked version is now bundled with the project, for ease of deployment and independence from whatever happens to the old version from here on out. Python 3.6 Pytorch 0.4.1 (needs spectral_norm, so latest stable release is needed). Jupyter Lab conda install c conda forge jupyterlab Tensorboard (i.e. install Tensorflow) and TensorboardX . I guess you don't have to but man, life is so much better with it. And I've conveniently provided hooks/callbacks to automatically write all kinds of stuff to tensorboard for you already! The notebooks have examples of these being instantiated (or commented out since I didn't really need the ones doing histograms of the model weights). Notably, progress images will be written to Tensorboard every 200 iterations by default, so you get a constant and convenient look at what the model is doing. conda install c anaconda tensorflow gpu ImageNet – Only if training of course. It proved to be a great dataset. Pretrained Weights To start right away with your own images without training the model yourself, download the weights here (right click and download from this link). Then open the ColorizeVisualization.ipynb (ColorizeVisualization.ipynb) in Jupyter Lab. Make sure that there's this sort of line in the notebook referencing the weights: python colorizer_path = IMAGENET.parent/('colorize_gen_192.h5') Then you simply pass it to this (all this should be in the notebooks already): python filters = Colorizer(gpu=0, weights_path=colorizer_path) Which then feeds into this: python vis = ModelImageVisualizer(filters, render_factor=render_factor, results_dir='result_images') Colorizing Your Own Photos Just drop whatever images you want to run this against into the /test_images/ folder and you can visualize the results inside the notebook with lines like this: python vis.plot_transformed_image("test_images/derp.jpg") The result images will automatically go into that result_dir defined above, in addition to being displayed in Jupyter. There's a render_factor variable that basically determines the quality of the rendered colors (but not the resolution of the output image). The higher it is, the better, but you'll also need more GPU memory to accommodate this. The max I've been able to have my GeForce 1080TI use is 42. Lower the number if you get a CUDA_OUT_OF_MEMORY error. You can customize this render_factor per image like this, overriding the default: python vis.plot_transformed_image("test_images/Chief.jpg", render_factor=17) For older and low quality images in particular, this seems to improve the colorization pretty reliably. In contrast, more detailed and higher quality images tend to do better with a higher render_factor.
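If you want to colorize a whole folder instead of one file at a time, a small loop over the same calls works. This is just a convenience sketch: the glob pattern and the render_factor value are examples, and vis is the ModelImageVisualizer built above in the notebook.

```python
# Hypothetical convenience loop over the test_images folder, reusing the
# vis.plot_transformed_image API and the per-image render_factor override
# shown above; adjust render_factor to fit your GPU memory.
from pathlib import Path

for path in sorted(Path('test_images').glob('*.jpg')):
    vis.plot_transformed_image(str(path), render_factor=36)
```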
Additional Things to Know Model weight saves are also done automatically during the training runs by the GANTrainer – defaulting to saving every 1000 iterations (it's an expensive operation). They're stored in the root training data folder you provide, and the name goes by the save_base_name you provide to the training schedule. Weights are saved for each training size separately. I'd recommend navigating the code top down – the Jupyter notebooks are the place to start. I treat them just as a convenient interface to prototype and visualize – everything else goes into .py files (and therefore a proper IDE) as soon as I can find a place for them. I already have visualization examples conveniently included – just open the xVisualization notebooks to run these – they point to test images already included in the project so you can start right away (in test_images). The GAN Schedules you'll see in the notebooks are probably the ugliest looking thing I've put in the code, but they're just my version of implementing progressive GAN training, suited to a Unet generator. That's all that's going on there really. Pretrained weights for the colorizer generator again are here (right click and download from this link). The DeFade stuff is still a work in progress so I'll try to get good weights for those up in a few days. Generally with training, you'll start seeing good results when you get midway through size 192px (assuming you're following the progressive training examples I laid out in the notebooks). Note that this training regime is still a work in progress I'm stil trying to figure out what exactly is optimal. In other words, there's a good chance you'll find something to improve upon there. I'm sure I screwed up something putting this up, so please let me know if that's the case. Known Issues Getting the best images really boils down to the art of selection . You'll mostly get good results the first go, but playing around with the render_factor a bit may make a difference. Thus, I'd consider this tool at this point fit for the AI artist but not something I'd deploy as a general purpose tool for all consumers. It's just not there yet. The model loves blue clothing. Not quite sure what the answer is yet, but I'll be on the lookout for a solution! Want More? I'll be posting more results on Twitter. UPDATE 11/15/2018 I just put up a bunch of significant improvements! I'll just repeat what I put in Twitter, here: So first, this image should really help visualize what is going on under the hood. Notice the smallified square image in the center. ! BeforeAfterChief (result_images/BeforeAfterChief.jpg) Squarification That small square center image is what the deep learning generator actually generates now. Before I was just shrinking the images keeping the same aspect ratio. It turns out, the model does better with squares even if they're distorted in the process! Note that I tried other things like keeping the core image's aspect ratio the same and doing various types of padding to make a square (reflect, symmetric, 0, etc). None of this worked as well. Two reasons why I think this works. One model was trained on squares; Two at smaller resolutions I think this is particularly significant you're giving the model more real image to work with if you just stretch it as opposed to padding. And padding wasn't something the model trained on anyway. Chrominance Optimization It turns out that the human eye doesn't perceive color (chrominance) with nearly as much sensitivity as it does intensity (luminance). 
Hence, we can render the color part at much lower resolution compared to the desired target res. Before, I was having the model render the image at the same size as the end result image that you saw. So you maxed out around 550px (maybe) because the GPU couldn't handle anymore. Now? Colors can be rendered at say a tiny 272x272 (as the image above), then the color part of the model output is simply resized and stretched to map over the much higher resolution original images's luminance portion (we already have that!). So the end result looks fantastic, because your eyes can't tell the difference with the color anyway! Graceful Rendering Degradation With the above, we're now able to generate much more consistently good looking images, even at different color gpu rendering sizes. Basically, you do generally get a better image if you have the model take up more memory with a bigger render. BUT if you reduce that memory footprint even in half with having the model render a smaller image, the difference in image quality of the end result is often pretty negligible. This effectively means the colorization is usable on a wide variety of machines now! i.e. You don't need a GeForce 1080TI to do it anymore. You can get by with much less. Consistent Rendering Quality Finally With the above, I was finally able to narrow down a scheme to make it so that the hunt to find the best version of what the model can render is a lot less tedious. Basically, it amounts to providing a render_factor (int) by the user and multiplying it by a base size multiplier of 16. This, combined with the square rendering, plays well together. It means that you get predictable behavior of rendering as you increase and decrease render_factor, without too many surprise glitches. Increase render_factor: Get more details right. Decrease: Still looks good but might miss some details. Simple! So you're no longer going to deal with a clumsy sz factor. Bonus: The memory usage is consistent and predictable so you just have to figure out the render_factor that works for your gpu once and forget about it. I'll probably try to make that render_factor determination automatic eventually but this should be a big improvement in the meantime. P.S You're not losing any image anymore with padding issues. That's solved as a byproduct. Also Also I added a new generic filter interface that replaces the visualizer dealing with models directly. The visualizer loops through these filters that you provide as a list. They don't have to be backed by deep learning models they can be any image modification you want!",Image Generation,Image Generation 2680,Computer Vision,Computer Vision,Computer Vision,"Torch implementation of DRAW: A Recurrent Neural Network For Image Generation Watch Deep Learning Lecture 14: Karol Gregor on Variational Autoencoders and Image Generation Run th draw_attention.lua in Terminal.app, it generates x_prediction , which you can plot by running plot_results .lua in zbs torch with QLua LuaJit interpreter selected from 'Project' tab. Adjust the running time of the script by changing: 1. n_data (the number of MNIST examples to train on) 2. number of iterations 3. n_z, dimension of the hidden layer z 4. rnn_size, dimension of h_dec and h_enc draw_attention.lua works with 28x28 MNIST dataset. You can adjust it to other datasets by changing A, N and replacing number '28' everywhere in the script. I haven't done it but it is possible. draw_no_attention .lua scripts implement DRAW without attention. 
In draw_attention_read.lua only read is attentive, while write is without attention. draw_no_attention .lua scripts print arrays at the end, which helps to quickly estimate the quality of the results without plotting. Example output by plot_results.lua ! th visualize_word_vectors.lua Example output by plot_results_no_binarization.lua ! th visualize_word_vectors.lua",Image Generation,Image Generation 2684,Computer Vision,Computer Vision,Computer Vision,"Gated PixelCNN A TensorFlow implementation of the gated variant of PixelCNN (Gated PixelCNN) from Conditional Image Generation with PixelCNN Decoders . The Gated PixelCNN matches the log likelihood of PixelRNN on both CIFAR and ImageNet while requiring less than half the training time. Training the Network MNIST (default) python main.py Color MNIST python main.py data color mnist gated_conv_num_layers 7 gated_conv_num_feature_maps 48 output_conv_num_feature_maps 96 q_levels 4 CIFAR 10 python main.py data cifar gated_conv_num_layers 15 gated_conv_num_feature_maps 126 output_conv_num_feature_maps 1020 q_levels 256 Configuration Parameter Quick MNIST MNIST COLOR MNIST CIFAR 10 Description batch_size 100 100 100 100 Size of a batch. gated_conv_num_layers 1 7 7 15 The number of gated conv layers. gated_conv_num_feature_maps 4 16 48 126 (128 in paper) The number of input / output feature maps in gated conv layers. Must be multiple of two, should be multiple of two times num_channels. output_conv_num_feature_maps 16 32 96 1020 (1024 in paper) The number of output feature maps in output conv layers. Must be multiple of two, should be multiple of two times num_channels. q_levels 4 4 4 256 The number of quantization levels in the output. data mnist mnist color mnist cifar Name of dataset. Gated Convolutional Layers ! architecture (./assets/gated_pixel_cnn_architecture.png) ! activation_unit (./assets/gated_activation_unit.png) References PixelCNN++ paper with Code Conditional Image Generation with PixelCNN Decoders Pixel Recurrent Neural Networks Review by Kyle Kastner carpedm20/pixel rnn tensorflow igul222/pixel_rnn kundan2510/pixelCNN",Image Generation,Image Generation 2702,Computer Vision,Computer Vision,Computer Vision,"GAN_Lib_Tensorflow Tensorflow implementation of various GANs. Please refer to corresponding folder for more details. Prerequisite Python 3.5.4 TensorFlow 1.5 Numpy Scipy Quantitative evaluation Method Inception (This repo) Inception (Official) FID Real data 12.0 11.24 3.2 (train vs test) PGGAN 8.80 ± 0.05 ( , Unsupervised) SNGAN 8.43 ± 0.12 (ResNet, Supervised) 8.24 ± 0.08 (ResNet, Unsupervised) ACGAN 7.86 ± 0.09 (ResNet, Supervised) 8.25 ± 0.07 ( , Supervised) Inception scores are calculated as the average of 10 evaluations with 5000 samples. TODO FID MS SSIM Generated images SNGAN ! sample ACGAN ! sample Reference",Image Generation,Image Generation 2707,Computer Vision,Computer Vision,Computer Vision,"SAGAN with relativistic A pytorch implementation of SAGAN with relativistic loss . The main difference here is that we replace the hinge loss used in the SAGAN with relativistic loss. Sample Results Dependencies Python 3.5+ Pytorch 0.4.1 Torchvision 0.2.1 Usage Train bash python main.py View sample results bash cd images Acknowledgement/reference",Image Generation,Image Generation 2723,Computer Vision,Computer Vision,Computer Vision,"pytorch Real NVP simple Real NVP code prior : Multivariate Normal Distribution Data : Sklearn moon datasets paper : ! (./resource/exam1.png) Layer ! (./resource/layer.png) Loss !
(./resource/loss.png) Result Inference p(x) > P(z) P(x) ! (./resource/inference1.png) P(z) ! (./resource/inference2.png) Generate P(z) > P(x) P(z) ! (./resource/generate1.png) P(x) ! (./resource/generate2.png),Image Generation,Image Generation 2790,Computer Vision,Computer Vision,Computer Vision,"Self Attention GAN Han Zhang, Ian Goodfellow, Dimitris Metaxas and Augustus Odena, Self Attention Generative Adversarial Networks. arXiv preprint arXiv:1805.08318 (2018) . Meta overview This repository provides a PyTorch implementation of SAGAN . Both wgan gp and wgan hinge loss are ready, but note that wgan gp is somehow not compatible with the spectral normalization. Remove all the spectral normalization at the model for the adoption of wgan gp. Self attentions are applied to later two layers of both discriminator and generator. Current update status Supervised setting Tensorboard loggings x 20180608 updated the self attention module. Thanks to my colleague Cheonbok Park ! see 'sagan_models.py' for the update. Should be efficient, and run on large sized images x Attention visualization (LSUN Church outdoor) x Unsupervised setting (use no label yet) x Applied: Spectral Normalization , code from here x Implemented: self attention module, two timescale update rule (TTUR), wgan hinge loss, wgan gp loss Results Attention result on LSUN (epoch 8) Per pixel attention result of SAGAN on LSUN church outdoor dataset. It shows that unsupervised training of self attention module still works, although it is not interpretable with the attention map itself. Better results with regard to the generated images will be added. These are the visualization of self attention in generator layer3 and layer4, which are in the size of 16 x 16 and 32 x 32 respectively, each for 64 images. To visualize the per pixel attentions, only a number of pixels are chosen, as shown on the leftmost and the rightmost numbers indicate. CelebA dataset (epoch on the left, still under training) LSUN church outdoor dataset (epoch on the left, still under training) Prerequisites Python 3.5+ PyTorch 0.3.0 Usage 1. Clone the repository bash $ git clone $ cd Self Attention GAN 2. Install datasets (CelebA or LSUN) bash $ bash download.sh CelebA or $ bash download.sh LSUN 3. Train (i) Train bash $ python python main.py batch_size 64 imsize 64 dataset celeb adv_loss hinge version sagan_celeb or $ python python main.py batch_size 64 imsize 64 dataset lsun adv_loss hinge version sagan_lsun 4. Enjoy the results bash $ cd samples/sagan_celeb or $ cd samples/sagan_lsun Samples generated every 100 iterations are located. The rate of sampling could be controlled via sample_step (ex, sample_step 100).",Image Generation,Image Generation 2796,Computer Vision,Computer Vision,Computer Vision,"DRAW This is a reimplementation of a paper called DRAW (Deep Recurrent Attention Writer) in Tensorflow, the original paper can be found at This paper uses a bunch of technologies namely a Variational Autoencoder, a Recurrent Neural Network (lstm cell) and Gaussian Attention. This model can run with or without attention and my code contains both the versions of it.",Image Generation,Image Generation 2798,Computer Vision,Computer Vision,Computer Vision,"Transferring GANs generating images from limited data Abstract: Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. 
To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pretrained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when the target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pretrained model was trained without conditioning. Our results also suggest that density may be more important than diversity and a dataset with one or few densely sampled classes may be a better source model than more diverse datasets such as ImageNet or Places. Overview Dependencies ( dependencies) Installation ( installation) Instructions ( instructions) Results ( results) References ( references) Contact ( contact) Dependencies Python 2.7, NumPy, SciPy, NVIDIA GPU Tensorflow: the version should be 1.0 or higher Dataset: lsun bedroom or your dataset Installation Install tensorflow and OpenCV Instructions Using 'git clone' you will get a new folder named 'Transferring GANs' in your current path, then use 'cd Transferring GANs' to enter the downloaded folder Download pretrained models: Google Drive ; Tencent qcloud Uncompress the downloaded folder into the current folder; you then have a new folder 'transfer_model' which contains two folders: 'conditional' and 'unconditional', each of which has four folders: 'imagenet', 'places', 'celebA', 'bedroom' Download the dataset or use your own dataset. I have shown one example and you can prepare yours in the same form. Run 'python transfer_gan.py' to run the code with the default settings. The pretrained model can be selected by changing the parameter 'TARGET_DOMAIN' Conditional GAN If you are interested in using the conditional model, just set the parameter 'ACGAN True' Results Using pretrained models not only gives high performance but also reaches convergence faster. In the following figure, we show conditional and unconditional settings. References \ 1\ 'Improved Training of Wasserstein GANs' by Ishaan Gulrajani et al., code \ 2\ 'GANs Trained by a Two Time Scale Update Rule Converge to a Local Nash Equilibrium' by Martin Heusel et al., Contact If you run into any problems with this code, please submit a bug report on the Github site of the project. For other inquiries please contact me: yaxing@cvc.uab.es",Image Generation,Image Generation 2806,Computer Vision,Computer Vision,Computer Vision,"FID score in PyTorch Requirements: pytorch torchvision Usage To compute the FID score between two datasets and get the gradient for the first dataset, where images of each dataset are contained in an individual folder: python ./fid_score.py path/to/dataset1 path/to/dataset2 Example python ./fid_score.py cifar/dev1 cifar/dev2 Using different layers for feature maps In contrast to the official implementation, you can choose to use a different feature layer of the Inception network instead of the default pool3 layer. As the lower layer features still have spatial extent, the features are first global average pooled to a vector before estimating mean and covariance. This might be useful if the datasets you want to compare have less than the otherwise required 2048 images.
Note that this changes the magnitude of the FID score and you cannot compare them against scores calculated at another dimensionality. The resulting scores might also no longer correlate with visual quality. You can select the dimensionality of features to use with the flag dims N, where N is the dimensionality of features. The choices are: 64: first max pooling features 192: second max pooling features 768: pre aux classifier features 2048: final average pooling features (this is the default) Disclaimer This implementation is heavily based on this License This implementation is licensed under the Apache License 2.0. FID was introduced by Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and Sepp Hochreiter in GANs Trained by a Two Time Scale Update Rule Converge to a Local Nash Equilibrium , see The original implementation is by the Institute of Bioinformatics, JKU Linz, licensed under the Apache License 2.0. See",Image Generation,Image Generation 2808,Computer Vision,Computer Vision,Computer Vision,convs Conv techniques in dnn simple implementations tf.1.12 1. spectral conv Spectral Normalization for Generative Adversarial Networks tf.1.12 2. coord conv An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution,Image Generation,Image Generation 2854,Computer Vision,Computer Vision,Computer Vision,"First Order Divergence for training GANs This repository contains code accompanying the paper First Order Generative Adversarial Networks The majority of the code was copied from the repository First Order Wasserstein Divergence GAN The key added value of this code is its implementation of two GANs that minimize not the KL divergence or the WGAN GP divergence, but the First Order Wasserstein Divergence, leading to better stability and performance. Frechet Inception Distance (FID) The FID is the performance measure used to evaluate the experiments in the paper. There, a detailed description can be found in the experiment section as well as in the appendix in section A1. In short: The Frechet distance between two multivariate Gaussians X_1 ~ N(mu_1, C_1) and X_2 ~ N(mu_2, C_2) is d^2 = ||mu_1 - mu_2||^2 + Tr(C_1 + C_2 - 2*sqrt(C_1*C_2)). The FID is calculated by assuming that X_1 and X_2 are the activations of the pool_3 layer of the inception model (see below) for generated samples and real world samples respectively (a short illustrative NumPy sketch of this distance is included below). Compatibility notice Previous versions of this repository contained two implementations to calculate the FID, an unbatched and a batched version. The unbatched version should not be used anymore. If you've downloaded this code previously, please update it immediately to the new version. The old version included a bug! Provided Code Requirements: TF 1.1, Python 3.x, for faster JSD estimation in language model, compile the language model code. fid.py This file contains the implementation of all necessary functions to calculate the FID. It can be used either as a python module imported into your own code, or as a standalone script to calculate the FID between precalculated (training set) statistics and a directory full of images, or between two directories of images. To compare directories with pre calculated statistics (e.g. the ones from ), use: fid.py /path/to/images /path/to/precalculated_stats.npz To compare two directories, use fid.py /path/to/images /path/to/other_images See fid.py help for more details. fid_example.py Example code to show the usage of fid.py in your own Python scripts.
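As a purely illustrative aside (this is not the repository's fid.py, and the variable names are assumptions), the Frechet distance given above can be sketched in a few lines of NumPy/SciPy once you have two arrays of Inception activations:

import numpy as np
from scipy import linalg

def frechet_distance(act1, act2):
    # act1, act2: (num_samples, num_features) arrays of pool_3 activations (assumed shape)
    mu1, mu2 = act1.mean(axis=0), act2.mean(axis=0)
    c1 = np.cov(act1, rowvar=False)
    c2 = np.cov(act2, rowvar=False)
    covmean, _ = linalg.sqrtm(c1.dot(c2), disp=False)  # matrix square root of C_1 C_2
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts caused by numerical error
    diff = mu1 - mu2
    return diff.dot(diff) + np.trace(c1 + c2 - 2.0 * covmean)

# toy usage with random activations, just to show the call
rng = np.random.RandomState(0)
print(frechet_distance(rng.randn(500, 64), rng.randn(500, 64) + 0.5))

In practice you would feed it the pool_3 activations of real and generated images; the batched implementation in fid.py remains the one to use for actual evaluations.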
precalc_stats_example.py Example code to show how to calculate and save training set statistics. WGAN_GP Improved WGAN (WGAN GP) implementation forked from with added FID evaluation for the image model and switchable TTUR/orig settings. Language model with JSD Tensorboard logging and switchable TTUR/orig settings. Precalculated Statistics for FID calculation Precalculated statistics for datasets cropped CelebA (calculated on all samples) LSUN bedroom (calculated on all training samples) CIFAR 10 (calculated on all training samples) SVHN (calculated on all training samples) ImageNet Train (calculated on all training samples) ImageNet Valid (calculated on all validation samples) are provided at: Additional Links For FID evaluation download the Inception model from The cropped CelebA dataset can be downloaded here To download the LSUN bedroom dataset go to: The 64x64 downsampled ImageNet training and validation datasets can be found here",Image Generation,Image Generation 2875,Computer Vision,Computer Vision,Computer Vision,"Graphical Generative Adversarial Networks (Graphical GAN) Chongxuan Li , Max Welling, Jun Zhu and Bo Zhang Code for reproducing most of the results in the paper . The results of our method are called LOCAL_EP in the code. We also provide implementations of a lot of recent papers, which is of independent interest. The papers include VEGAN , ALI , ALICE . We also try some combinations of these methods, while the most direct competitor of our method is ALI. Warning: the code is still under development. If you have any problem with the code, please send an email to chongxuanli1991@gmail.com. Any feedback will be appreciated! We thank the authors of wgan gp for providing their code. Our code is widely adapted from their repositories. You may need to download the datasets and save them to the dataset folder, except in the MNIST case. See details in the corresponding files of the dataset. If you find the code useful, please cite our paper! @article{li2018graphical, title {Graphical Generative Adversarial Networks}, author {Li, Chongxuan and Welling, Max and Zhu, Jun and Zhang, Bo}, journal {arXiv preprint arXiv:1804.03429}, year {2018} }",Image Generation,Image Generation 2876,Computer Vision,Computer Vision,Computer Vision,"Triple Generative Adversarial Nets (Triple GAN) Chongxuan Li , Kun Xu , Jun Zhu and Bo Zhang Code for reproducing most of the results in the paper . Triple GAN: a unified GAN model for classification and class conditional generation in semi supervised learning. Warning: the code is still under development. Environment settings and libs we used in our experiments This project is tested under the following environment setting. OS: Ubuntu 16.04.3 GPU: Geforce 1080 Ti or Titan X (Pascal or Maxwell) Cuda: 8.0, Cudnn: v5.1 or v7.03 Python: 2.7.14 (setup with Miniconda2) Theano: 0.9.0.dev c697eeab84e5b8a74908da654b66ec9eca4f1291 Lasagne: 0.2.dev1 Parmesan: 0.1.dev1 > Python > Numpy > Scipy > Theano > Lasagne (version 0.2.dev1) > Parmesan Thanks to the authors of these libs. We also thank the authors of Improved GAN and Temporal Ensemble for providing their code. Our code is widely adapted from their repositories. Results Triple GAN can achieve excellent classification results on MNIST, SVHN and CIFAR10 datasets, see the paper for a comparison with the previous state of the art.
See generated images as follows: Comparing Triple GAN (right) with GAN trained with feature matching (left) Generating images in four specific classes (airplane, automobile, bird, horse) Disentangling styles from classes (left: data, right: Triple GAN) Class conditional linear interpolation on latent space",Image Generation,Image Generation 2890,Computer Vision,Computer Vision,Computer Vision,cegan_iclr2017 Code release for the paper Calibrating Energy based Generative Adversarial Networks,Image Generation,Image Generation 2896,Computer Vision,Computer Vision,Computer Vision,"NNCodesAndProjects Codes for studying and projects from Neural Networks Implementation in python of many popular neural networks, their improvements based on research and a few projects. Each project will contain own readme explaining how it works. Prerequisites All dependencies can be installed by running pip install r requirements.txt Hand drawn graph recognition (Master's thesis) My Master's thesis written in 2017 on Jagiellonian University. Purpose of project is to develop an algorithm that given a photo of a graph (as vertices and edges), parses it and outputs computer friendly representation (list of edges). To achieve that, multiple computer vision techniques are used, such as: blob detection edges and corners detection morphological operations convolutions with different kernels To find out more about running the software and the thesis itself, please refer to readme file in Graph Recognition folder. You can also find the original text of the thesis. Generative Adversarial Networks CIFAR10 Own implementation of GANs for generating CIFAR10 images. python GAN/gan.py NOTE cifar_input.py reads CIFAR10 data, downloaded to a separate folder donwloaded from: Results: Measuring result of GANs is usually subjective. I decided to measure the performance of this network by training new Convolutional NN only on generated images (output from GAN) and testing what how well will such CNN perform on real CIFAR10 dataset. Result is around 65% of accuracy on test set (remember that training and validation set are not touched). Resources: Generative Adversarial Nets Improved Techniques for Training GANs GAN hacks CIFAR 10 classification Convolutional NN is used to classify CIFAR10 dataset. Deep, residual network with additional improvements Adam optimizer, using exponential moving averages, learnt variance and bias when normalizing batch. Code reaches accuracy of 93.5% python CIFAR10/convolutional.py This trains and tests the network. Trained parameters are saved using saver and can be reused. Plenty of state of art improvements to standard CNN are implemented in this project. I found those to be most influential Residual networks, wide vs deep networks, replacing pooling with convolutions with stride 2. Resources: Deep Residual Learning for Image Recognition (Microsoft Research) Striving for simplicity Text generation LSTM Implementation of recurrent neural network long short term memory network. Trained on a very famous Polish book Pan Tadeusz. Tries to predict next letter of the text based on last read letters (how many exactly is a hyper parameter). After such training (after each epoch) new text is generated, which starts with only beginning of one sentence and the network tries to recreate the text. The result are very exciting text has a lot features of Polish language and would be recognised by any Polish speaker as an attempt to write some actual text. 
python LSTM/lstm2.py Resources:",Image Generation,Image Generation 2904,Computer Vision,Computer Vision,Computer Vision,"PGGAN PyTorch implementation of PROGRESSIVE GROWING OF GANS FOR IMPROVED QUALITY, STABILITY, AND VARIATION arxiv Official TF Project Authors : Jihyeong Yoo , Daewoong Ahn How to use: python3 main.py h usage: main.py h gpus GPUS cpus CPUS save_dir SAVE_DIR img_num IMG_NUM optim_G {adam,sgd} optim_D {adam,sgd} loss {wgangp,lsgan} start_resl START_RESL end_resl END_RESL beta BETA BETA ... momentum MOMENTUM decay DECAY gp_lambda GP_LAMBDA PGGAN optional arguments: h, help show this help message and exit gpus GPUS Select GPU Numbering 0,1,2,3 cpus CPUS The number of CPU workers save_dir SAVE_DIR Directory which models will be saved in img_num IMG_NUM The number of images to be used for each phase optim_G {adam,sgd} optim_D {adam,sgd} loss {wgangp,lsgan} start_resl START_RESL end_resl END_RESL beta BETA BETA ... Beta for Adam optimizer momentum MOMENTUM Momentum for SGD optimizer decay DECAY Weight decay for optimizers gp_lambda GP_LAMBDA Lambda as a weight of Gradient Panelty in WGAN GP loss TODO Evaluation Metric Upload Results Reference:",Image Generation,Image Generation 2048,Playing Games,Playing Games,Other,"Introduction This repo trains a Reinforcement Learning Neural Network so that it's able to play Pong from raw pixel input. I've written up a blog post which walks through the code here and the basic principles of Reinforcement Learning, with Pong as the guiding example. It is largely based on a Gist by Andrej Karpathy , which in turn is based on the Playing Atari with Deep Reinforcement Learning paper by Mnih et al. This script uses the Open AI Gym environments in order to run the Atari emulator and environments, and currently uses no external ML framework & only numpy. The AI Agent Pong in action Prior to training (mostly random actions) ! Prior to training (mostly random actions) After training base repo + learning rate modification ! After training The agent that played this game was trained for 12000 episodes (basically 12000 episodes of 'best of 21' rounds) over a period of 15 hours, on a Macbook Pro 2018 with 2.6GHz i7 (6 cores). The running mean score per episode, over the trailing 100 episodes, at the point I stopped training was 5, i.e. the CPU would win each episode 21 16 on average. Hyperparameters: Default except for learning rate 1e 3 After training base repo + learning rate modification + a bugfix A minor fix was added which crops more of the image vs the base repo, by removing noisy parts of the image where we can safely ignore the ball motion. This boosted the observed performance and speed at which the AI beat the CPU on average (i.e. when the average reward for an episode exceeded 0) Hyperparameters: Default except for learning rate 1e 3 The agent that played this game was trained for 10000 episodes (basically 10000 episodes of 'best of 21' rounds) over a period of 13 hours, on a Macbook Pro 2018 with 2.6GHz i7 (6 cores). The running mean score per episode, over the trailing 100 episodes, at the point I stopped training was 2.5, i.e. the trained AI Agent would win each episode 21 points to 18.5. Training for another 10 hours & another 5000 episodes allowed the trained AI Agent to reach a running mean score per epsisode of 5, i.e. the trained AI Agent would win each episode 21 points to 16. Graph of reward over time first 10000 episodes of training ! Reward over time with bugfix Graph of reward over time 10000 to 15000 episodes of training ! 
Reward over time after 10000 episodes Modifications vs Source Gist Records output video of the play Modified learning rate from 1e 4 to 1e 3 Comments for clarity Minor fix which crops more of the image vs the base repo Installation Requirements The instructions below are for Mac OS & assume you have Homebrew installed. You'll need to run the code with Python 2.7 I recommend the use of conda to manage python environments Install Open AI Gym brew install gym Install Cmake brew install cmake Install ffmpeg brew install ffmpeg Required for monitoring / videos",Atari Games,Playing Games 2056,Playing Games,Playing Games,Other,"Status: Archive (code is provided as is, no updates expected) Distributed evolution This is a distributed implementation of the algorithm described in Evolution Strategies as a Scalable Alternative to Reinforcement Learning (Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever). The implementation here uses a master worker architecture: at each iteration, the master broadcasts parameters to the workers, and the workers send returns back to the master. The humanoid scaling experiment in the paper was generated with an implementation similar to this one. The code here runs on EC2, so you need an AWS account. It's resilient to worker termination, so it's safe to run the workers on spot instances. Instructions Build AMI The humanoid experiment depends on Mujoco. Provide your own Mujoco license and binary in scripts/dependency.sh . Install Packer , and then build images by running (you can optionally configure scripts/packer.json to choose build instance or AWS regions) cd scripts && packer build packer.json Packer should return you a list of AMI ids, which you should place in AMI_MAP in scripts/launch.py . Launching Use scripts/launch.py along with an experiment JSON file. An example JSON file is provided in the configurations directory. You must fill in all command line arguments to scripts/launch.py .",Atari Games,Playing Games 2063,Playing Games,Playing Games,Other,"This repository has been deprecated in favor of the Retro library. See our Retro Contest blog post for detalis. universe starter agent The codebase implements a starter agent that can solve a number of universe environments. It contains a basic implementation of the A3C algorithm , adapted for real time environments. Dependencies Python 2.7 or 3.5 Golang six (for py2/3 compatibility) TensorFlow 0.12 tmux (the start script opens up a tmux session with multiple windows) htop (shown in one of the tmux windows) gym gym atari libjpeg turbo ( brew install libjpeg turbo ) universe opencv python numpy scipy Getting Started conda create name universe starter agent python 3.5 source activate universe starter agent brew install tmux htop cmake golang libjpeg turbo On Linux use sudo apt get install y tmux htop cmake golang libjpeg dev pip install gym atari pip install universe pip install six pip install tensorflow conda install y c opencv3 conda install y numpy conda install y scipy Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells source activate universe starter agent Atari Pong python train.py num workers 2 env id PongDeterministic v3 log dir /tmp/pong The command above will train an agent on Atari Pong using ALE simulator. It will see two workers that will be learning in parallel ( num workers flag) and will output intermediate results into given directory. 
The code will launch the following processes: worker 0 a process that runs policy gradient worker 1 a process identical to process 1, that uses different random noise from the environment ps the parameter server, which synchronizes the parameters among the different workers tb a tensorboard process for convenient display of the statistics of learning Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console. Once in the tmux session, you can see all your windows with ctrl b w . To switch to window number 0, type: ctrl b 0 . Look up tmux documentation for more commands. To access TensorBoard to see various monitoring metrics of the agent, open in a browser. Using 16 workers, the agent should be able to solve PongDeterministic v3 (not VNC) within 30 minutes (often less) on an m4.10xlarge instance. Using 32 workers, the agent is able to solve the same environment in 10 minutes on an m4.16xlarge instance. If you run this experiment on a high end MacBook Pro, the above job will take just under 2 hours to solve Pong. Add ' visualise' toggle if you want to visualise the worker using env.render() as follows: python train.py num workers 2 env id PongDeterministic v3 log dir /tmp/pong visualise ! pong For best performance, it is recommended for the number of workers to not exceed available number of CPU cores. You can stop the experiment with tmux kill session command. Playing games over remote desktop The main difference with the previous experiment is that now we are going to play the game through VNC protocol. The VNC environments are hosted on the EC2 cloud and have an interface that's different from a conventional Atari Gym environment; luckily, with the help of several wrappers (which are used within envs.py file) the experience should be similar to the agent as if it was played locally. The problem itself is more difficult because the observations and actions are delayed due to the latency induced by the network. More interestingly, you can also peek at what the agent is doing with a VNCViewer. Note that the default behavior of train.py is to start the remotes on a local machine. Take a look at for documentation on managing your remotes. Pass additional r flag to point to pre existing instances. VNC Pong python train.py num workers 2 env id gym core.PongDeterministic v3 log dir /tmp/vncpong _Peeking into the agent's environment with TurboVNC_ You can use your system viewer as open vnc://localhost:5900 (or open vnc://${docker_ip}:5900 ) or connect TurboVNC to that ip/port. VNC password is openai . ! pong Important caveats One of the novel challenges in using Universe environments is that they operate in real time , and in addition, it takes time for the environment to transmit the observation to the agent. This time creates a lag: where the greater the lag, the harder it is to solve environment with today's RL algorithms. Thus, to get the best possible results it is necessary to reduce the lag, which can be achieved by having both the environments and the agent live on the same high speed computer network. So for example, if you have a fast local network, you could host the environments on one set of machines, and the agent on another machine that can speak to the environments with low latency. Alternatively, you can run the environments and the agent on the same EC2/Azure region. Other configurations tend to have greater lag. 
To keep track of your lag, look for the phrase reaction_time in stderr. If you run both the agent and the environment on nearby machines on the cloud, your reaction_time should be as low as 40ms. The reaction_time statistic is printed to stderr because we wrap our environment with the Logger wrapper, as done in here ( ). Generally speaking, environments that are most affected by lag are games that place a lot of emphasis on reaction time. For example, this agent is able to solve VNC Pong ( gym core.PongDeterministic v3 ) in under 2 hours when both the agent and the environment are co located on the cloud, but this agent had difficulty solving VNC Pong when the environment was on the cloud while the agent was not. This issue affects environments that place great emphasis on reaction time. A note on tuning This implementation has been tuned to do well on VNC Pong, and we do not guarantee its performance on other tasks. It is meant as a starting point. Playing flash games You may run the following command to launch the agent on the game Neon Race: python train.py num workers 2 env id flashgames.NeonRace v0 log dir /tmp/neonrace _What agent sees when playing Neon Race_ (you can connect to this view via note ( vnc pong) above) ! neon Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores. Next steps Now that you have seen an example agent, develop agents of your own. We hope that you will find doing so to be an exciting and an enjoyable task.",Atari Games,Playing Games 2065,Playing Games,Playing Games,Other,AI vs Pong Deep Reinforcement Learning Youtube Result AI learned and played Atari Pong. Deep Learning framework: Pytorch NN architecture: Dueling DQN Min/Max scores : 21 / 21 ! Graph Result,Atari Games,Playing Games 2078,Playing Games,Playing Games,Other,OpenAI Gym Bipedal Walker v2 solution using A Distributional Perspective on Reinforcement Learning git clone repo cd experiments python run_server_tf.py hparams config_bipedal_walker.yml logdir logs/bipedal_walker see errors and solve dependences run it again here you have output of RL server app get new therminal cd experiments python run_many_agents.py hparams config_bipedal_walker.yml logdir logs/asd again solve dependences here you have output of 8 agents It runs 8 agents in parallel. 2 of them will be visible. One with exploration (added normal noise of 0.1) another for validation (without exploration noise). Near to 200K server train ops you probably get working bipedal walker,Atari Games,Playing Games 2079,Playing Games,Playing Games,Other,RL Server Quadrotor 2D RL video NIPS 2017 Learning To Run video Reinforcement Learning Server. Includes: DQN: Deep Q Networks arXiv:1312.5602 DDPG: Deep Deterministic Policy Gradient arXiv:1509.02971 DQN and DDPG implemetations are taken from Thank you Szymon Sidor ;) Running Currently works with Quadrotor 2D Simulator and Learning To Run . So you need it up nd running to see how it works. You need Python 3. It communicates with simulator with websockets and json. git clone this repository python main_..._.py Solve python deps ;) then run again,Atari Games,Playing Games 2081,Playing Games,Playing Games,Other,"AlphaXos Project status experimental! Self play with Deep Reinforcement Learning: Deep Q Learning using board games in an Open AI Gym like Environment What? 
Concise working example of self play Deep Q Learning You may find it a useful example in discussing and understanding Alpha Zero / AlphaGo Zero (AZ/AGZ) Environment similar to those in OpenAI gym at gym/envs/toy_text/ General approach to piece placement board games Agents: ChaosAgent: Same as DQNAgent, but Epsilon greedy during play (not just during training) DQNAgent: Double Deep Q Learning agent trained with keras rl RandomAgent: always plays a random (but valid) move HumanAgent: takes keyboard input Comparison with AlphaZero / AlphaGo Zero Similar to AZ/AGZ: reinforcement learning for a binary board game game state represented via board input matrix uses single neural network (aside from the fact it uses double DQN), instead of separate policy and value networks like earlier AlphaGos learns entirely from self play (in the case of AlphaXos, also learns from play against purely random player, as well as self play) no human engineered features or logic Different from AZ/AGZ: AX uses Double Deep Q Learning (via keras rl), as opposed to the novel Monte Carlo Tree Search variation of Policy Improvement used by AZ/AGZ, which I think was the meat of their contribution AGZ used rotated/reflected board positions to increase sample efficiency. AZ did not do this. AlphaXos does not currently do this. uses a simple shallow keras FF network (instead of a deep residual convolutional network in the case of AGZ) uses single 2D matrix for representing board including both players, instead of a multi layer matrix like AZ/AGZ. The games we consider here do not require previous timesteps in order to completely capture game state. Ie. here the current board state is sufficient to satisfy the Markhov assumption for an MDP. adjusts representation of board depending on turn side, as opposed to AGZ which provides turn side as input to the network probably many other things! Next steps lots References Alpha Zero: AlphaGo Zero: OpenAI gym: Keras RL: Keras: DQNs: Double DQN: Copyright (c) 2018 Robin Chauhan License: The MIT License",Atari Games,Playing Games 2082,Playing Games,Playing Games,Other,How can I improve the convergence of this genetic algorithm in training an auto encoder on MNIST? autoencoder ga.py A classic GA implementation with selection based on fitness. autoencoder es.py An evolutionary strategy that uses a sum of the entire population weighted by fitness,Atari Games,Playing Games 2086,Playing Games,Playing Games,Other,"Playing custom games using Deep Learning Implementation of Google's paper on playing atari games using deep learning in python. Paper Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller Paper Link: This project presents an implementation of a model (based on the above linked paper) that successfully learns control policies directly from high dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q learning, whose input is raw pixels and whose output is a value function estimating future rewards. This model is tested on variety of Atari and custom made games and its performance is compared with human players. Dependencies: Python 2.7 numpy Lasagne Theano matplotlib scipy Arcade Learning environment (for Atari games) pygame (for flappy bird and shooter) GPU with CC score of greater than or equal to 3 (refer and Atari Games The model is trained for 2 Atari games Space Invaders and Breakout. 
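Since the model is described above as being trained with a variant of Q learning on a value function that estimates future rewards, a minimal sketch of the one step Q learning target that such training typically bootstraps from might look like this (illustrative only; none of these names come from the repository):

import numpy as np

def q_learning_target(reward, q_next, done, gamma=0.99):
    # y = r if the episode ended, otherwise y = r + gamma * max_a Q(s', a)
    return reward + (0.0 if done else gamma * float(np.max(q_next)))

# toy call: reward of 1, three Q value estimates for the next state
print(q_learning_target(reward=1.0, q_next=np.array([0.2, 0.7, 0.1]), done=False))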
The model was trained for about 12 13 hrs and has achieved good performance that is consistent with the paper. To run the agent: Breakout: python AtariGame Breakout/tester.py rom_file breakout.bin play_games 10 display_screen load_weights breakout_models/dep q rmsprop breakout99 epoch.pkl Space Invaders: python AtariGame SpaceInvaders/tester.py rom_file space_invaders.bin play_games 10 display_screen load_weights spaceinvaders_models/dep q rmsprop space_invaders99 epoch.pkl Custom Games Flappy Bird Q Learning I have trained a plain vanilla Q learning based agent (based on ), where the agent gets information such as the x and y distance from the pipes, to compare the performance of this game specific model to a generalized model as described in Google's paper. Training time is about 2 3 hrs. To run the agent: python FlappyQ/run_qvflappy.py Flappy Bird DQN Similar to the Atari games, I have trained the same model with only minor modifications to the parameters to play Flappy Bird; although the performance is not as good as the Q learning model, which had explicit game data, it still gets a decent average score of about 20 30. To run the agent: python FlappyBirdDQN/ftester.py play_games 10 display_screen load_weights flappy_models/dep q flappy 60 epoch.pkl Shooter game This is a very simple game I made using pygame where the player controls a spaceship tasked to dodge the incoming meteoroids and stay alive as long as possible. I also tried a (silly?) experiment where I trained different models wherein each model had agents with different degrees of control over the spaceship and compared their performance. To run the agent with the 2 control setting (left and right): python ShooterDQN/stester2.py play_games 10 display_screen load_weights shooter_models/dep q shooter nipscuda 8movectrl 99 epoch.pkl To run the agent with the 4 control setting (left, right, top and bottom): python ShooterDQN/stester4.py play_games 10 display_screen load_weights shooter_models/dep q shooter nipscuda 4movectrl 99 epoch.pkl To run the agent with the 8 control setting (all directions): python ShooterDQN/stester8.py play_games 10 display_screen load_weights shooter_models/dep q shooter nipscuda 2movectrl 80 epoch.pkl Statistics For all the below graphs, the X axis is the training timeline and the Y axis is the score function for each game. (Note: scores in Shooter and Flappy Bird have been modified (reward amplified) because the original +1 or 1 is not applicable since the player does not have lives here and rewards are also very sparse in these 2 games.) Atari Breakout: Atari Space Invaders: Flappy Q Learning: Flappy DQN: Shooter (4 control): Pics: Atari Breakout: Atari Space Invaders: Flappy Bird DQN: Flappy Bird Q Learning: Shooter (custom game): Note: The number of epochs and train cycles has been adjusted such that all the above code, when used for training, takes only about 12 15 hrs max. depending on your CPU and GPU (My CPU: i5 3.4 GHz and GPU: nVidia GeForce 660). Also, do not expect super human level performance (as said in Google's paper) from the models, as I have trained them only for 12 15 hrs; more training with further parameter tuning can improve the scores of all the above games. Resources used: The deep Q network used in this project is a modified version of spragunr's dqn code.
1 Deep Learning in Neural Networks: An Overview 2 The Arcade Learning Environment: 3 ImageNet Classification with Deep Convolutional Neural Networks: 4 Lasagne: 5 Theano: 6 CUDA: 7 Pygame: 8 General:",Atari Games,Playing Games 2118,Playing Games,Playing Games,Other,"Distributional Reinforcement Learning This repository is by Paul Ambroise D. and Pierre Alexandre K. and contains the PyTorch source code to reproduce the results of Bellemare et al. A Distributional Perspective on Reinforcement Learning . Requirements Python 3.6 Torch OpenAI gym Results We used the categorical algorithm to solve CartPole v0 . The following results were not optimized over different hyperparameters, so there is room for improvement. ! (/results/figs/test_score.png) The evolution of the distribution for the 0, 0, 0, 0 state is the following: ! (/results/figs/gifs/seed 1.gif) Discussion We want to extend the work of Bellemare et al. to continuous actions using either ICNN, CEM or NAF. An ICNN implementation is already available but still needs optimization. Implicit: extend to continuous actions QUOTA : Quantile regression : c51 qrdqn DISTRIBUTED DISTRIBUTIONAL DETERMINISTIC POLICY GRADIENTS:",Atari Games,Playing Games 2123,Playing Games,Playing Games,Other,"AlphaZero_Chess From scratch implementation of AlphaZero for Chess This repo demonstrates an implementation of the AlphaZero framework for Chess, using python and PyTorch. We all know that AlphaGo, created by DeepMind, caused a big stir when it defeated reigning world champion Lee Sedol 4 1 in the game of Go in 2016, hence becoming the first computer program to achieve superhuman performance in an ultra complicated game. However, AlphaGoZero, published a year later in 2017, pushed boundaries one big step further by achieving a similar feat without any human data inputs. A subsequent paper released by the same group DeepMind successfully applied the same reinforcement learning + supervised learning framework to chess, outperforming the previous best chess program Stockfish after just 4 hours of training. Inspired by the power of such supervised reinforcement learning models, I created a repository to build my own chess AI program from scratch, closely following the methods as described in the papers above. Contents In this repository, you will find the following core scripts: 1) MCTS_chess.py implements the Monte Carlo Tree Search (MCTS) algorithm based on the Polynomial Upper Confidence Trees (PUCT) method for leaf traversal. This generates datasets (state, policy, value) for neural network training 2) alpha_net.py PyTorch implementation of the AlphaGoZero neural network architecture, with a slightly reduced number of residual blocks (19) and convolution channels (256) for faster computation. The network consists of, in order: A convolution block with batch normalization 19 residual blocks with each block consisting of two convolutional layers with batch normalization An output block with two heads: a policy output head that consists of a convolutional layer with batch normalization followed by logsoftmax, and a value head that consists of a convolutional layer with relu and tanh activation. A compact, illustrative PyTorch sketch of this layout is shown right below.
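The sketch below is only an illustration of the layout just described (a convolution block, 19 residual blocks with 256 channels, and policy/value heads); the input plane count and the policy output size are placeholders chosen for the example, not values taken from the repository:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection

class AlphaNetSketch(nn.Module):
    def __init__(self, in_planes=22, channels=256, board=8, policy_size=4672, blocks=19):
        # in_planes and policy_size are assumptions for illustration only
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(*[ResBlock(channels) for _ in range(blocks)])
        # policy head: small conv with batch normalization, then a log softmax over moves
        self.p_conv = nn.Sequential(nn.Conv2d(channels, 2, 1, bias=False),
                                    nn.BatchNorm2d(2), nn.ReLU())
        self.p_fc = nn.Linear(2 * board * board, policy_size)
        # value head: small conv with batch normalization, relu, then a scalar squashed with tanh
        self.v_conv = nn.Sequential(nn.Conv2d(channels, 1, 1, bias=False),
                                    nn.BatchNorm2d(1), nn.ReLU())
        self.v_fc = nn.Sequential(nn.Linear(board * board, 256), nn.ReLU(),
                                  nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):
        x = self.tower(self.stem(x))
        p = F.log_softmax(self.p_fc(self.p_conv(x).flatten(1)), dim=1)
        v = self.v_fc(self.v_conv(x).flatten(1))
        return p, v

# quick shape check on a dummy board encoding
policy, value = AlphaNetSketch()(torch.zeros(1, 22, 8, 8))
print(policy.shape, value.shape)

The real alpha_net.py may differ in layer sizes and in how moves are encoded; this sketch is only meant to make the written description concrete.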
3) chess_board.py – Implementation of a chess board python class with all game rules and possible moves 4) encoder_decoder.py – list of functions to encode/decode chess board class for input/interpretation into neural network, as well as encode/decode the action policy output from neural network 5) evaluator.py – arena class to pit current neural net against the neural net from previous iteration, and keeps the neural net that wins the most games 6) train.py – function to start the neural network training process 7) train_multiprocessing.py – multiprocessing version of train.py 8) pipeline.py – script to starts a sequential iteration pipeline consisting of MCTS search to generate data and neural network training. The evaluator arena function is temporarily excluded here during the early stages of training the neural network. 9) visualize_board.py – miscellaneous function to visualize the chessboard in a more attractive way 10) analyze_games.py – miscellaneous script to visualize and save the chess games Iteration pipeline A full iteration pipeline consists of: 1) Self play using MCTS (MCTS_chess.py) to generate game datasets (game state, policy, value), with the neural net guiding the search by providing the prior probabilities in the PUCT algorithm 2) Train the neural network (train.py) using the (game state, policy, value) datasets generated from MCTS self play 3) Evaluate (evaluator.py) the trained neural net (at predefined checkpoints) by pitting it against the neural net from the previous iteration, again using MCTS guided by the respective neural nets, and keep only the neural net that performs better. 4) Rinse and repeat. Note that in the paper, all these processes are running simultaneously in parallel, subject to available computing resources one has. How to play 1) Run pipeline.py to start the MCTS search and neural net training process. Change the folder and net saved names accordingly. Note that for the first time, you will need to create and save a random, initialized alpha_net for loading. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. OR 1) Run the MCTS_chess.py to generate self play datasets. Note that for the first time, you will need to create and save a random, initialized alpha_net for loading. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. 2) Run train.py to train the alpha_net with the datasets. 3) At predetermined checkpoints, run evaluator.py to evaluate the trained net against the neural net from previous iteration. Saves the neural net that performs better. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. 4) Repeat for next iteration.",Game of Go,Playing Games 2124,Playing Games,Playing Games,Other,"AlphaZero_Chess From scratch implementation of AlphaZero for Chess This repo demonstrates an implementation of AlphaZero framework for Chess, using python and PyTorch. We all know that AlphaGo, created by DeepMind, created a big stir when it defeated reigning world champion Lee Sedol 4 1 in the game of Go in 2016, hence becoming the first computer program to achieve superhuman performance in an ultra complicated game. However, AlphaGoZero, published a year later in 2017, push boundaries one big step further by achieving a similar feat without any human data inputs. 
A subsequent paper released by the same group DeepMind successfully applied the same reinforcement learning + supervised learning framework to chess, outperforming the previous best chess program Stockfish after just 4 hours of training. Inspired by the power of such supervised reinforcement learning models, I created a repository to build my own chess AI program from scratch, closely following the methods as described in the papers above. Contents In this repository, you will find the following core scripts: 1) MCTS_chess.py implements the Monte Carlo Tree Search (MCTS) algorithm based on Polynomial Upper Confidence Trees (PUCT) method for leaf transversal. This generates datasets (state, policy, value) for neural network training 2) alpha_net.py PyTorch implementation of the AlphaGoZero neural network architecture, with slightly reduced number of residual blocks (19) and convolution channels (256) for faster computation. The network consists of, in order: A convolution block with batch normalization 19 residual blocks with each block consisting of two convolutional layers with batch normalization An output block with two heads: a policy output head that consists of convolutional layer with batch normalization followed by logsoftmax, and a value head that consists of a convolutional layer with relu and tanh activation. 3) chess_board.py – Implementation of a chess board python class with all game rules and possible moves 4) encoder_decoder.py – list of functions to encode/decode chess board class for input/interpretation into neural network, as well as encode/decode the action policy output from neural network 5) evaluator.py – arena class to pit current neural net against the neural net from previous iteration, and keeps the neural net that wins the most games 6) train.py – function to start the neural network training process 7) train_multiprocessing.py – multiprocessing version of train.py 8) pipeline.py – script to starts a sequential iteration pipeline consisting of MCTS search to generate data and neural network training. The evaluator arena function is temporarily excluded here during the early stages of training the neural network. 9) visualize_board.py – miscellaneous function to visualize the chessboard in a more attractive way 10) analyze_games.py – miscellaneous script to visualize and save the chess games Iteration pipeline A full iteration pipeline consists of: 1) Self play using MCTS (MCTS_chess.py) to generate game datasets (game state, policy, value), with the neural net guiding the search by providing the prior probabilities in the PUCT algorithm 2) Train the neural network (train.py) using the (game state, policy, value) datasets generated from MCTS self play 3) Evaluate (evaluator.py) the trained neural net (at predefined checkpoints) by pitting it against the neural net from the previous iteration, again using MCTS guided by the respective neural nets, and keep only the neural net that performs better. 4) Rinse and repeat. Note that in the paper, all these processes are running simultaneously in parallel, subject to available computing resources one has. How to play 1) Run pipeline.py to start the MCTS search and neural net training process. Change the folder and net saved names accordingly. Note that for the first time, you will need to create and save a random, initialized alpha_net for loading. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. 
OR 1) Run the MCTS_chess.py to generate self play datasets. Note that for the first time, you will need to create and save a random, initialized alpha_net for loading. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. 2) Run train.py to train the alpha_net with the datasets. 3) At predetermined checkpoints, run evaluator.py to evaluate the trained net against the neural net from previous iteration. Saves the neural net that performs better. Multiprocessing is enabled, which shares the PyTorch net model in a single CUDA GPU across 6 CPUs workers each running a MCTS self play. 4) Repeat for next iteration.",Game of Go,Playing Games 2125,Playing Games,Playing Games,Other,"AlphaZero Connect4 From scratch implementation of AlphaZero for Connect4 This repo demonstrates an implementation of AlphaZero framework for Connect4, using python and PyTorch. We all know that AlphaGo, created by DeepMind, created a big stir when it defeated reigning world champion Lee Sedol 4 1 in the game of Go in 2016, hence becoming the first computer program to achieve superhuman performance in an ultra complicated game. However, AlphaGoZero, published a year later in 2017, push boundaries one big step further by achieving a similar feat without any human data inputs. A subsequent paper released by the same group DeepMind successfully applied the same reinforcement learning + supervised learning framework to chess, outperforming the previous best chess program Stockfish after just 4 hours of training. Inspired by the power of such supervised reinforcement learning models, I initially created a repository to build my own chess AI program from scratch, closely following the methods as described in the papers above. However, I quickly realized that the cost/computational power of training the chess AI would be too much to bear, thus I decided to try to implement AlphaZero on Connect4, which has much reduced moves complexity and hence would be more gentle on computational power. The point here, is to demonstrate that the AlphaZero algorithm works well to create a powerful Connect4 AI. For more implementation details, please see my published article: Contents In this repository, you will find the following core scripts: 1) MCTS_c4.py implements the Monte Carlo Tree Search (MCTS) algorithm based on Polynomial Upper Confidence Trees (PUCT) method for leaf transversal. This generates datasets (state, policy, value) for neural network training 2) alpha_net_c4.py PyTorch implementation of the AlphaZero neural network architecture, with slightly reduced number of residual blocks (19) and convolution channels (128) for faster computation. The network consists of, in order: A convolution block with batch normalization 19 residual blocks with each block consisting of two convolutional layers with batch normalization An output block with two heads: a policy output head that consists of convolutional layer with batch normalization followed by logsoftmax, and a value head that consists of a convolutional layer with relu and tanh activation. 
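As a side note on the PUCT selection rule that MCTS_c4.py above is described as using, the per child score during tree traversal is often computed as in the following minimal sketch (illustrative only; the exploration constant and names are assumptions, not taken from the repository):

import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.0):
    # q: mean value of the child, prior: network policy probability for the move,
    # parent_visits / child_visits: visit counts, c_puct: assumed exploration constant
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u  # the child with the highest score is followed next

print(puct_score(q=0.1, prior=0.25, parent_visits=50, child_visits=3))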
3) connect_board.py – Implementation of a Connect4 board python class with all game rules and possible moves 4) encoder_decoder_c4.py – list of functions to encode/decode Connect4 board class for input/interpretation into neural network 5) evaluator_c4.py – arena class to pit current neural net against the neural net from previous iteration, and keeps the neural net that wins the most games 6) train_c4.py – function to start the neural network training process 7) visualize_board_c4.py – miscellaneous function to visualize the board in a more attractive way 8) analyze_games_c4.py – miscellaneous script to visualize and save the Connect4 games 9) play_against_c4.py run it to play a Connect4 game against AlphaZero! (change best_net to the alpha net you've trained) Iteration pipeline A full iteration pipeline consists of: 1) Self play using MCTS (MCTS_c4.py) to generate game datasets (game state, policy, value), with the neural net guiding the search by providing the prior probabilities in the PUCT algorithm 2) Train the neural network (train_c4.py) using the (game state, policy, value) datasets generated from MCTS self play 3) Evaluate (evaluator_c4.py) the trained neural net (at predefined checkpoints) by pitting it against the neural net from the previous iteration, again using MCTS guided by the respective neural nets, and keep only the neural net that performs better. 4) Rinse and repeat. Note that in the paper, all these processes are running simultaneously in parallel, subject to available computing resources one has. How to play 1) Run the MCTS_c4.py to generate self play datasets. Note that for the first time, you will need to create and save a random, initialized alpha_net for loading. 2) Run train_c4.py to train the alpha_net with the datasets. 3) At predetermined checkpoints, run evaluator_c4.py to evaluate the trained net against the neural net from previous iteration. Saves the neural net that performs better. 4) Repeat for next iteration. Results Iteration 0: alpha_net_0 (Initialized with random weights) 151 games of MCTS self play generated Iteration 1: alpha_net_1 (trained from iteration 0) 148 games of MCTS self play generated Iteration 2: alpha_net_2 (trained from iteration 1) 310 games of MCTS self play generated Evaluation 1: After Iteration 2, alpha_net_2 is pitted against alpha_net_0 to check if the neural net is improving in terms of policy and value estimate. Indeed, out of 100 games played, alpha_net_2 won 83. Iteration 3: alpha_net_3 (trained from iteration 2) 584 games of MCTS self play generated Iteration 4: alpha_net_4 (trained from iteration 3) 753 games of MCTS self play generated Iteration 5: alpha_net_5 (trained from iteration 4) 1286 games of MCTS self play generated Iteration 6: alpha_net_6 (trained from iteration 5) 1670 games of MCTS self play generated ! alt text Typical Loss vs Epoch when training neural net (alpha_net_0)",Game of Go,Playing Games 2140,Playing Games,Playing Games,Other,"Description ! In this environment, a double jointed arm can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible. The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. 
Every entry in the action vector should be a number between 1 and 1. This project implements DDPG for continuous control in the Reacher environment. Learning Algorithm Used The reinforcement learning agent implementation follows the ideas of the arXiv:1509.02971 paper, implementing a DDPG agent. It is an Actor Critic method. The algorithm lets the agent act in the environment with the goal of solving the task defined by the environment, while also exploring the environment in order to improve the agent's behaviour. The algorithm is also augmented with the fixed Q target, double network, soft updates and experience replay. The agent exploits the initial lack of knowledge as well as Ornstein–Uhlenbeck process generated noise to explore the environment. The hyperparameters selected for the demonstration are: Actor learning rate: 0.0001 Critic learning rate: 0.0001 Update rate: 1 Memory size: 100000 Batch size: 64 Gamma: 0.99 Tau: 0.001 Adam weight decay: 0 Number of episodes: 200 It took the network 114 episodes to reach an average score of at least 30 over 100 episodes. Training it multiple times sometimes solves the environment in fewer episodes. The number of episodes can also be reduced by tuning the hyperparameters. Plot of Rewards ! The saved weights of the Actor and Critic networks can be found here. Follow setup here. Train the network here. Ideas for Future Work Search for better hyperparameters of the algorithm as well as the neural network Implement a state to state predictor to improve the explorative capabilities of the agent",Atari Games,Playing Games 2141,Playing Games,Playing Games,Other,"Description image1 : Trained Agent ! Trained Agent image1 The RL agent is allowed to traverse across a two dimensional grid with blue and yellow bananas placed across it. The agent is expected to collect the yellow bananas while avoiding the blue ones. The agent receives a positive reward for every yellow banana it collects and a negative reward for every blue banana collected. The size of the state space is 37. The agent is able to move forwards and backwards as well as turn left and right, thus the size of the action space is 4. The minimal expected performance of the agent after training is a score of +13 over 100 consecutive episodes. Algorithm Used The current solution implements a Dueling DDQN algorithm with Prioritized Experience Replay as described in the Dueling Network Architectures for Deep Reinforcement Learning paper (arXiv:1511.06581). The network's architecture looks like: DuelQNet( (input): Linear(in_features 37, out_features 100, bias True) (dropout): Dropout(p 0.3) (hidden): ModuleList( (0): Linear(in_features 100, out_features 64, bias True) (1): Linear(in_features 64, out_features 64, bias True) (2): Linear(in_features 64, out_features 64, bias True) (3): Linear(in_features 64, out_features 64, bias True) ) (value): ModuleList( (0): Linear(in_features 64, out_features 64, bias True) (1): Linear(in_features 64, out_features 1, bias True) ) (advantage): ModuleList( (0): Linear(in_features 64, out_features 64, bias True) (1): Linear(in_features 64, out_features 4, bias True) ) ) The hyperparameters selected for the demonstration are: Learning Rate: 0.0005 Batch size : 64 Learns after every 4 steps Gamma : 0.99 Tau : 0.003 It took the agent about 471 episodes to reach an average score of at least 13 over 100 episodes. Plot of Rewards ! The goal set for the agent is to reach +13 points as the average reward over the last 100 episodes. A short, illustrative sketch of how the dueling value and advantage streams above are combined into Q values follows.
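A minimal sketch, assuming PyTorch and the stream sizes shown in the DuelQNet printout above (this is not the repository's code):

import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, features=64, actions=4):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(features, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(features, 64), nn.ReLU(), nn.Linear(64, actions))

    def forward(self, x):
        v = self.value(x)        # V(s): how good the state is
        a = self.advantage(x)    # A(s, a): how much better each action is than average
        return v + a - a.mean(dim=1, keepdim=True)  # Q(s, a) with a mean advantage baseline

# toy forward pass on a zero feature vector
print(DuelingHead()(torch.zeros(1, 64)).shape)  # torch.Size([1, 4])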
The current solution manages to reach the goal after 400 500 episodes and keep improving over 17 points. The saved weights can be found here. Follow the training notebook here. Some ideas for future work Search for better hyperparameters of algorithm as well as neural network Implement prioritized experience replay mechanism",Atari Games,Playing Games 2178,Playing Games,Playing Games,Other,"Linux Build Status Windows Build Status What A Go program with no human provided knowledge. Using MCTS (but without Monte Carlo playouts) and a deep residual convolutional neural network stack. This is a fairly faithful reimplementation of the system described in the Alpha Go Zero paper Mastering the Game of Go without Human Knowledge . For all intents and purposes, it is an open source AlphaGo Zero. Wait, what? If you are wondering what the catch is: you still need the network weights. No network weights are in this repository. If you manage to obtain the AlphaGo Zero weights, this program will be about as strong, provided you also obtain a few Tensor Processing Units. Lacking those TPUs, I'd recommend a top of the line GPU it's not exactly the same, but the result would still be an engine that is far stronger than the top humans. Gimme the weights Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware . One reason for publishing this program is that we are running a public, distributed effort to repeat the work. Working together, and especially when starting on a smaller scale, it will take less than 1700 years to get a good network (which you can feed into this program, suddenly making it strong). I want to help You need a PC with a GPU, i.e. a discrete graphics card made by NVIDIA or AMD, preferably not too old, and with the most recent drivers installed. It is possible to run the program without a GPU, but performance will be much lower. If your CPU is not very recent (Haswell or newer, Ryzen or newer), performance will be outright bad, and it's probably of no use trying to join the distributed effort. But you can still play, especially if you are patient. Running Leela Zero client on a Tesla K80 GPU for free (Google Colaboratory) (COLAB.md) Windows Head to the Github releases page at download the latest release, unzip, and launch autogtp.exe. It will connect to the server automatically and do its work in the background, uploading results after each game. You can just close the autogtp window to stop it. macOS and Linux Follow the instructions below to compile the leelaz binary, then go into the autogtp subdirectory and follow the instructions there (autogtp/README.md) to build the autogtp binary. Copy the leelaz binary into the autogtp dir, and launch autogtp. I just want to play right now Download the best known network weights file from: And head to the Usage ( usage) section of this README. If you prefer a more human style, a network trained from human games is available here: Compiling Requirements GCC, Clang or MSVC, any C++14 compiler Boost 1.58.x or later, headers and program_options library (libboost dev & libboost program options dev on Debian/Ubuntu) BLAS Library: OpenBLAS (libopenblas dev) or (optionally) Intel MKL zlib library (zlib1g & zlib1g dev on Debian/Ubuntu) Standard OpenCL C headers (opencl headers on Debian/Ubuntu, or at OpenCL ICD loader (ocl icd libopencl1 on Debian/Ubuntu, or reference implementation at An OpenCL capable device, preferably a very, very fast GPU, with recent drivers is strongly recommended (OpenCL 1.1 support is enough). 
If you do not have a GPU, modify config.h in the source and remove the line that says define USE_OPENCL . The program has been tested on Windows, Linux and macOS. Example of compiling and running Ubuntu Test for OpenCL support & compatibility sudo apt install clinfo && clinfo Clone github repo git clone cd leela zero/src sudo apt install libboost dev libboost program options dev libopenblas dev opencl headers ocl icd libopencl1 ocl icd opencl dev zlib1g dev make cd .. wget src/leelaz weights best network Example of compiling and running macOS Clone github repo git clone cd leela zero/src brew install boost make cd .. curl O src/leelaz weights best network Example of compiling and running Windows Clone github repo git clone cd leela zero cd msvc Double click the leela zero2015.sln or leela zero2017.sln corresponding to the Visual Studio version you have. Build from Visual Studio 2015 or 2017 Download to msvc\x64\Release msvc\x64\Release\leelaz.exe weights best network Example of compiling and running CMake (macOS/Ubuntu) Clone github repo git clone cd leela zero git submodule update init recursive Use stand alone directory to keep source dir clean mkdir build && cd build cmake .. make leelaz make tests ./tests curl O ./leelaz weights best network Usage The engine supports the GTP protocol, version 2 . Leela Zero is not meant to be used directly. You need a graphical interface for it, which will interface with Leela Zero through the GTP protocol. Sabaki is a very nice looking GUI with GTP 2 capability. It should work with this engine. A lot of go software can interface to an engine via GTP, so look around. Add the gtp commandline option on the engine command line to enable Leela Zero's GTP support. You will need a weights file, specify that with the w option. All required commands are supported, as well as the tournament subset, and loadsgf . The full set can be seen with list_commands . The time control can be specified over GTP via the time\_settings command. The kgs time\_settings extension is also supported. These have to be supplied by the GTP 2 interface, not via the command line! Weights format The weights file is a text file with each line containing a row of coefficients. The layout of the network is as in the AlphaGo Zero paper, but any number of residual blocks is allowed, and any number of outputs (filters) per layer, as long as the latter is the same for all layers. The program will autodetect the amounts on startup. The first line contains a version number. Convolutional layers have 2 weight rows: 1) convolution weights 2) channel biases Batchnorm layers have 2 weight rows: 1) batchnorm means 2) batchnorm variances Innerproduct (fully connected) layers have 2 weight rows: 1) layer weights 2) output biases The convolution weights are in output, input, filter\_size, filter\_size order, the fully connected layer weights are in output, input order. The residual tower is first, followed by the policy head, and then the value head. All convolution filters are 3x3 except for the ones at the start of the policy and value head, which are 1x1 (as in the paper). There are 18 inputs to the first layer, instead of 17 as in the paper. The original AlphaGo Zero design has a slight imbalance in that it is easier for the black player to see the board edge (due to how padding works in neural networks). This has been fixed in Leela Zero. The inputs are: 1) Side to move stones at time T 0 2) Side to move stones at time T 1 (0 if T 0) ... 
8) Side to move stones at time T 7 (0 if T< 6) 9) Other side stones at time T 0 10) Other side stones at time T 1 (0 if T 0) ... 16) Other side stones at time T 7 (0 if T< 6) 17) All 1 if black is to move, 0 otherwise 18) All 1 if white is to move, 0 otherwise Each of these forms a 19 x 19 bit plane. In the training/caffe directory there is a zero.prototxt file which contains a description of the full 40 residual block design, in (NVIDIA) Caffe protobuff format. It can be used to set up nv caffe for training a suitable network. The zero\_mini.prototxt file describes a smaller 12 residual block case. The training/tf directory contains the network construction in TensorFlow format, in the tfprocess.py file. Expert note: the channel biases seem redundant in the network topology because they are followed by a batchnorm layer, which is supposed to normalize the mean. In reality, they encode beta parameters from a center/scale operation in the batchnorm layer, corrected for the effect of the batchnorm mean/variance adjustment. At inference time, Leela Zero will fuse the channel bias into the batchnorm mean, thereby offsetting it and performing the center operation. This roundabout construction exists solely for backwards compatibility. If this paragraph does not make any sense to you, ignore its existence and just add the channel bias layer as you normally would, output will be correct. Training Getting the data At the end of the game, you can send Leela Zero a dump\_training command, followed by the winner of the game (either white or black ) and a filename, e.g: dump_training white train.txt This will save (append) the training data to disk, in the format described below, and compressed with gzip. Training data is reset on a new game. Supervised learning Leela can convert a database of concatenated SGF games into a datafile suitable for learning: dump_supervised sgffile.sgf train.txt This will cause a sequence of gzip compressed files to be generated, starting with the name train.txt and containing training data generated from the specified SGF, suitable for use in a Deep Learning framework. Training data format The training data consists of files with the following data, all in text format: 16 lines of hexadecimal strings, each 361 bits longs, corresponding to the first 16 input planes from the previous section 1 line with 1 number indicating who is to move, 0 black, 1 white, from which the last 2 input planes can be reconstructed 1 line with 362 (19x19 + 1) floating point numbers, indicating the search probabilities (visit counts) at the end of the search for the move in question. The last number is the probability of passing. 1 line with either 1 or 1, corresponding to the outcome of the game for the player to move Running the training For training a new network, you can use an existing framework (Caffe, TensorFlow, PyTorch, Theano), with a set of training data as described above. You still need to contruct a model description (2 examples are provided for Caffe), parse the input file format, and outputs weights in the proper format. There is a complete implementation for TensorFlow in the training/tf directory. Supervised learning with TensorFlow This requires a working installation of TensorFlow 1.4 or later: src/leelaz w weights.txt dump_supervised bigsgf.sgf train.out exit training/tf/parse.py train.out This will run and regularly dump Leela Zero weight files to disk, as well as snapshots of the learning state numbered by the batch number. 
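If you want to inspect the generated chunks yourself before feeding them to parse.py, the text format described under Training data format above can be read with a short script. A sketch, assuming Python 3 and whitespace-separated fields as documented; it is not part of the repository:

```python
import gzip

def read_training_records(path):
    # Yields (planes, to_move, probabilities, outcome) per training position.
    with gzip.open(path, 'rt') as f:
        lines = [line.strip() for line in f]
    record_len = 16 + 1 + 1 + 1  # 16 hex planes, side to move, probs, outcome
    for i in range(0, len(lines) - record_len + 1, record_len):
        planes = [int(h, 16) for h in lines[i:i + 16]]      # 361-bit planes as integers
        to_move = int(lines[i + 16])                        # 0 = black, 1 = white
        probs = [float(x) for x in lines[i + 17].split()]   # 362 search probabilities
        outcome = int(lines[i + 18])                        # +1 or -1 for the player to move
        yield planes, to_move, probs, outcome
```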
If interrupted, training can be resumed with: training/tf/parse.py train.out leelaz model batchnumber Todo List of package names for more distros Multi GPU support for training Optimize Winograd transformations CUDA specific version using cuDNN AMD specific version using MIOpen Related links Status page of the distributed effort: Watch Leela Zero's training games live in a GUI: GUI and study tool for Leela Zero: Stockfish chess engine ported to Leela Zero framework: Original Alpha Go (Lee Sedol) paper: Newer Alpha Zero (Go, Chess, Shogi) paper: AlphaGo Zero Explained In One Diagram: License The code is released under the GPLv3 or later, except for ThreadPool.h, cl2.hpp, half.hpp and the clblast_level3 subdirs, which have specific licenses (compatible with GPLv3) mentioned in those files.",Game of Go,Playing Games 2215,Playing Games,Playing Games,Other,"CLTrainer a command line tool that compares the deep q learning and NEAT algorithms in environments such as Ping Pong and Flappy Bird At command line: 1) pip install r .\requirements.txt 2) python .\__main__.py game INSERT GAME mode INSERT TRAINING MODE fps INSERT FRAMES PER SEC. name INSERT SESSION NAME Argument currently available: game: pong, flappybird mode: human: you playing the game dqn: deep q learning (Google's DeepMind paper: neat: Neuroevolution of Augmented Topologies (Kenneth Stanley paper: Project is still early in development, you may see bugs when running a startup command. Cleaner Interface is in the works to allow for easier integration with games and training modes.",Atari Games,Playing Games 2220,Playing Games,Playing Games,Other,"Minitaur Pybullet Minitaur with Distributional Policy Gradients I implemented distributional policy gradient in pybullet's Minitaur environment. I added priority replay buffer to the base implementation by Maxim Lapan, 1 which uses architecture described in A Distributional Perspective on Reinforcement Learning . 2 Priority replay buffer is inspired by Rainbow: Combining Improvements in Deep Reinforcement Learning 3 and is applied to the critic. 1 2 3",Atari Games,Playing Games 2269,Playing Games,Playing Games,Other,"Chess AI Developed by Ryan Pope for EECS 649 (Intro to Artificial Intelligence). Overview Here is a link to a youtube video explaining my code: This repo contains several Jupyter notebooks to explore chess algorithms using python. I utilized the python library python chess in order to execute and manage all of the chess backend features. I began with simple random move generation and slowly upgraded to using a minimax algorithm with basic material evaluation to determine best moves, After researching about AlphaZero and Leela Chess Zero, I tried to implement some similar approachs utilizing UCT Monte Carlo Tree Search to generate move sequences and a neural network that was trained to evaluate different positions using previous stockfish evaluations from the lichess dataset. Data I downloaded the May 2015 Lichess dataset from . I then filtered games by Elo of both players greater than 1800 to generate better play and I used only games with the stockfish evaluations. I iterated over the moves of the game and created frames from the PGNs where I utilized an array of size 64 to store the encoded board. Files filtered_games.pgn is the filtered PGN dataset that I utilized for training. preprocessed.csv is a CSV file containing the data as arrays of 64 values plus the evaluation for the state. ChessExploration.ipynb is a Jupyter notebook with basic algorithms such as Minimax. 
Preprocessing.ipynb converted the PGN dataset into the arrays stored in the csv. NeuralNetTraining.ipynb trained the neural network. MCTSChess.ipynb implements the Monte Carlo Tree Search and utilizes the Neural Net to evaluate positions. Depenedencies Keras Tensorflow Numpy Scikit learn python chess Key Resources Papers: Articles: Code Influences: Other:",Game of Go,Playing Games 2282,Playing Games,Playing Games,Other,"Linux Build Status Windows Build Status What A Go program with no human provided knowledge. Using MCTS (but without Monte Carlo playouts) and a deep residual convolutional neural network stack. This is a fairly faithful reimplementation of the system described in the Alpha Go Zero paper Mastering the Game of Go without Human Knowledge . For all intents and purposes, it is an open source AlphaGo Zero. Wait, what? If you are wondering what the catch is: you still need the network weights. No network weights are in this repository. If you manage to obtain the AlphaGo Zero weights, this program will be about as strong, provided you also obtain a few Tensor Processing Units. Lacking those TPUs, I'd recommend a top of the line GPU it's not exactly the same, but the result would still be an engine that is far stronger than the top humans. Gimme the weights Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware . One reason for publishing this program is that we are running a public, distributed effort to repeat the work. Working together, and especially when starting on a smaller scale, it will take less than 1700 years to get a good network (which you can feed into this program, suddenly making it strong). I want to help You need a PC with a GPU, i.e. a discrete graphics card made by NVIDIA or AMD, preferably not too old, and with the most recent drivers installed. It is possible to run the program without a GPU, but performance will be much lower. If your CPU is not very recent (Haswell or newer, Ryzen or newer), performance will be outright bad, and it's probably of no use trying to join the distributed effort. But you can still play, especially if you are patient. Running Leela Zero client on a Tesla K80 GPU for free (Google Colaboratory) (COLAB.md) Windows Head to the Github releases page at download the latest release, unzip, and launch autogtp.exe. It will connect to the server automatically and do its work in the background, uploading results after each game. You can just close the autogtp window to stop it. macOS and Linux Follow the instructions below to compile the leelaz binary, then go into the autogtp subdirectory and follow the instructions there (autogtp/README.md) to build the autogtp binary. Copy the leelaz binary into the autogtp dir, and launch autogtp. I just want to play right now Download the best known network weights file from: And head to the Usage ( usage) section of this README. 
If you prefer a more human style, a network trained from human games is available here: Compiling Requirements GCC, Clang or MSVC, any C++14 compiler Boost 1.58.x or later, headers and program_options library (libboost dev & libboost program options dev on Debian/Ubuntu) BLAS Library: OpenBLAS (libopenblas dev) or (optionally) Intel MKL zlib library (zlib1g & zlib1g dev on Debian/Ubuntu) Standard OpenCL C headers (opencl headers on Debian/Ubuntu, or at OpenCL ICD loader (ocl icd libopencl1 on Debian/Ubuntu, or reference implementation at An OpenCL capable device, preferably a very, very fast GPU, with recent drivers is strongly recommended (OpenCL 1.1 support is enough). If you do not have a GPU, modify config.h in the source and remove the line that says define USE_OPENCL . The program has been tested on Windows, Linux and macOS. Example of compiling and running Ubuntu Test for OpenCL support & compatibility sudo apt install clinfo && clinfo Clone github repo git clone cd leela zero/src sudo apt install libboost dev libboost program options dev libopenblas dev opencl headers ocl icd libopencl1 ocl icd opencl dev zlib1g dev make cd .. wget src/leelaz weights best network Example of compiling and running macOS Clone github repo git clone cd leela zero/src brew install boost make cd .. curl O src/leelaz weights best network Example of compiling and running Windows Clone github repo git clone cd leela zero cd msvc Double click the leela zero2015.sln or leela zero2017.sln corresponding to the Visual Studio version you have. Build from Visual Studio 2015 or 2017 Download to msvc\x64\Release msvc\x64\Release\leelaz.exe weights best network Example of compiling and running CMake (macOS/Ubuntu) Clone github repo git clone cd leela zero git submodule update init recursive Use stand alone directory to keep source dir clean mkdir build && cd build cmake .. make leelaz make tests ./tests curl O ./leelaz weights best network Usage The engine supports the GTP protocol, version 2 . Leela Zero is not meant to be used directly. You need a graphical interface for it, which will interface with Leela Zero through the GTP protocol. Sabaki is a very nice looking GUI with GTP 2 capability. It should work with this engine. A lot of go software can interface to an engine via GTP, so look around. Add the gtp commandline option on the engine command line to enable Leela Zero's GTP support. You will need a weights file, specify that with the w option. All required commands are supported, as well as the tournament subset, and loadsgf . The full set can be seen with list_commands . The time control can be specified over GTP via the time\_settings command. The kgs time\_settings extension is also supported. These have to be supplied by the GTP 2 interface, not via the command line! Weights format The weights file is a text file with each line containing a row of coefficients. The layout of the network is as in the AlphaGo Zero paper, but any number of residual blocks is allowed, and any number of outputs (filters) per layer, as long as the latter is the same for all layers. The program will autodetect the amounts on startup. The first line contains a version number. 
Convolutional layers have 2 weight rows: 1) convolution weights 2) channel biases Batchnorm layers have 2 weight rows: 1) batchnorm means 2) batchnorm variances Innerproduct (fully connected) layers have 2 weight rows: 1) layer weights 2) output biases The convolution weights are in output, input, filter\_size, filter\_size order, the fully connected layer weights are in output, input order. The residual tower is first, followed by the policy head, and then the value head. All convolution filters are 3x3 except for the ones at the start of the policy and value head, which are 1x1 (as in the paper). There are 18 inputs to the first layer, instead of 17 as in the paper. The original AlphaGo Zero design has a slight imbalance in that it is easier for the black player to see the board edge (due to how padding works in neural networks). This has been fixed in Leela Zero. The inputs are: 1) Side to move stones at time T 0 2) Side to move stones at time T 1 (0 if T 0) ... 8) Side to move stones at time T 7 (0 if T< 6) 9) Other side stones at time T 0 10) Other side stones at time T 1 (0 if T 0) ... 16) Other side stones at time T 7 (0 if T< 6) 17) All 1 if black is to move, 0 otherwise 18) All 1 if white is to move, 0 otherwise Each of these forms a 19 x 19 bit plane. In the training/caffe directory there is a zero.prototxt file which contains a description of the full 40 residual block design, in (NVIDIA) Caffe protobuff format. It can be used to set up nv caffe for training a suitable network. The zero\_mini.prototxt file describes a smaller 12 residual block case. The training/tf directory contains the network construction in TensorFlow format, in the tfprocess.py file. Expert note: the channel biases seem redundant in the network topology because they are followed by a batchnorm layer, which is supposed to normalize the mean. In reality, they encode beta parameters from a center/scale operation in the batchnorm layer, corrected for the effect of the batchnorm mean/variance adjustment. At inference time, Leela Zero will fuse the channel bias into the batchnorm mean, thereby offsetting it and performing the center operation. This roundabout construction exists solely for backwards compatibility. If this paragraph does not make any sense to you, ignore its existence and just add the channel bias layer as you normally would, output will be correct. Training Getting the data At the end of the game, you can send Leela Zero a dump\_training command, followed by the winner of the game (either white or black ) and a filename, e.g: dump_training white train.txt This will save (append) the training data to disk, in the format described below, and compressed with gzip. Training data is reset on a new game. Supervised learning Leela can convert a database of concatenated SGF games into a datafile suitable for learning: dump_supervised sgffile.sgf train.txt This will cause a sequence of gzip compressed files to be generated, starting with the name train.txt and containing training data generated from the specified SGF, suitable for use in a Deep Learning framework. 
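Going back to the weights format described above, the autodetection of filter and block counts mentioned there amounts to counting rows. A rough sketch, assuming the layout stated above (one version line, 4 rows for the input convolution and its batchnorm, 8 rows per residual block, 14 rows for the two heads); it is not the program's actual loader:

```python
def inspect_weights(path):
    # Returns (version, filters, residual_blocks) inferred from a weights file.
    with open(path) as f:
        lines = f.readlines()
    version = int(lines[0])
    filters = len(lines[2].split())              # channel biases of the first convolution
    residual_blocks = (len(lines) - 1 - 4 - 14) // 8
    return version, filters, residual_blocks
```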
Training data format The training data consists of files with the following data, all in text format: 16 lines of hexadecimal strings, each 361 bits longs, corresponding to the first 16 input planes from the previous section 1 line with 1 number indicating who is to move, 0 black, 1 white, from which the last 2 input planes can be reconstructed 1 line with 362 (19x19 + 1) floating point numbers, indicating the search probabilities (visit counts) at the end of the search for the move in question. The last number is the probability of passing. 1 line with either 1 or 1, corresponding to the outcome of the game for the player to move Running the training For training a new network, you can use an existing framework (Caffe, TensorFlow, PyTorch, Theano), with a set of training data as described above. You still need to contruct a model description (2 examples are provided for Caffe), parse the input file format, and outputs weights in the proper format. There is a complete implementation for TensorFlow in the training/tf directory. Supervised learning with TensorFlow This requires a working installation of TensorFlow 1.4 or later: src/leelaz w weights.txt dump_supervised bigsgf.sgf train.out exit training/tf/parse.py train.out This will run and regularly dump Leela Zero weight files to disk, as well as snapshots of the learning state numbered by the batch number. If interrupted, training can be resumed with: training/tf/parse.py train.out leelaz model batchnumber Todo List of package names for more distros Multi GPU support for training Optimize Winograd transformations CUDA specific version using cuDNN AMD specific version using MIOpen Related links Status page of the distributed effort: Watch Leela Zero's training games live in a GUI: GUI and study tool for Leela Zero: Stockfish chess engine ported to Leela Zero framework: Original Alpha Go (Lee Sedol) paper: Newer Alpha Zero (Go, Chess, Shogi) paper: AlphaGo Zero Explained In One Diagram: License The code is released under the GPLv3 or later, except for ThreadPool.h, cl2.hpp, half.hpp and the clblast_level3 subdirs, which have specific licenses (compatible with GPLv3) mentioned in those files.",Game of Go,Playing Games 2283,Playing Games,Playing Games,Other,"Deep Pepper MCTS based algorithm for parallel training of a chess engine. Adapted from existing deep learning game engines such as Giraffe and AlphaZero, Deep Pepper is a clean room implementation of a chess engine that leverages Stockfish for the opening and closing book, and learns a policy entirely through self play. Technologies Used We use the following technologies to train the model and interface with the Stockfish Chess engine. python chess For handling the chess environment and gameplay. pytorch For training and inference. Stockfish For value function and endgame evaluation. Tensorboard For visualizing training progress. Setup Instructions 1. Run pip install r requirements.txt to install the necessary dependencies. 2. Run python launch_script.py to start training the Chess Engine. Acknowledgements Giraffe Alpha Zero StockFish",Game of Go,Playing Games 2294,Playing Games,Playing Games,Other,agent.py is a q learning implementation using tensorflow. It doesn't really work. See mcts.py is an AlphaZero style MCTS based learning system also using tensorflow. I have no idea if it works. 
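For context, AlphaZero-style MCTS implementations such as the one sketched in mcts.py typically expand the tree with the PUCT rule, picking the child that maximizes Q + U, where U scales the network prior by the parent visit count. A minimal sketch with illustrative attribute names (not taken from mcts.py):

```python
import math

def puct_select(children, c_puct=1.5):
    # children are assumed to expose visit_count, total_value and prior.
    parent_visits = sum(child.visit_count for child in children)

    def score(child):
        q = child.total_value / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visit_count)
        return q + u

    return max(children, key=score)
```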
See and,Game of Go,Playing Games 2360,Playing Games,Playing Games,Other,Distributed DQN Distributed DQN to play CartPole game using chainer (the Distributed version of DQN: without Prioritized part) Paper: Distributed Prioritized Experience Replay:,Atari Games,Playing Games 2385,Playing Games,Playing Games,Other,"APO Automatic Program Optimizer APO is a toy project to explore reinforcement learning for program optimization pattern rewriting on compute DAGs in the search for the shortest program. The project is inspired by AlphaZero . The toy language This toy program computes the expression 7 ((b + a) b) : :: 0: add %b %a 1: sub %0 %b 2: 7 3: mul %2 %1 4: ret %3 A toy program consists of a sequence of operations on unsigned, 32bit integers. Every statement has the form :: : ( Identifiers Identifiers can refer to line numbers or arguments, for example: %2 is the result of the statement in line 2 (general case %n with n being an integer > 0 ). %c is the value of the third parameter ( %a to %z ). \ Binary Operators :: 3: add %0 %1 adds the values of variables %0 and %1 and store the result in %3 . available operators are add / sub / mul / and / or / xor . \ Constant yielding :: 2: 42 Stores 42 in variable %2 . \ pipe (fake use) :: 3: %2 Assign to %3 the contents of %2 . \ return value :: 6: ret %5 %5 is the return value of the program. configuration files devices.conf Purpose: Tensorflow tower configuration file Syntax: :: , Interpretation: create a tower on with the op prefix /. The tower is instantiated for the tasks specified in . The interpretation of depends on the task. It is possible to create multiple towers per device. infer task: Inference tower used during reinforcement learning. translates to the number of concurrent search threads using this inference device. An infer tower uses a StagingArea (dev/put_stage is required to pull data into the model before dev/infer_X_dist can be used). loss task: Inference tower used for loss computation (logging). This device is unbuffered (directly feeded from tf.placeholders). does not have any meaning here. train task: There must be only a single train device at the moment. may in the future be used to automatically distribute gradient computation to all train devices. server.conf SampleServer configuration (mostly the training sample queue). train.conf Soft model parameters. Does not affect the MetaGraph so no rebuilding necessary. model.conf Hard model parameters. If any entry changes, the Tensorflow MetaGraph needs to be rebuild (which takes forever).",Game of Go,Playing Games 2393,Playing Games,Playing Games,Other,RL_example Implicit quantile network for GYM cartpole and other RL problem Hindsight experiance replay for the 2d arm Reference 1. IQN 2. HER 3. 4.,Atari Games,Playing Games 2394,Playing Games,Playing Games,Other,RL_example Implicit quantile network for GYM cartpole and other RL problem Hindsight experiance replay for the 2d arm Reference 1. IQN 2. HER 3. 4.,Atari Games,Playing Games 2407,Playing Games,Playing Games,Other,"Project 1: Navigation Deep Reinforcement Learning for Banana Collecting This project was one of the requirements for completing the Deep Reinforcement Learning Nanodegree (DRLND) course at Udacity.com. The preceding lessons focused on deep Q networks. Project Details: The Environment A learning agent is trained to navigate and collect bananas in a finite square world shown in the clip below. Collecting a yellow banana results in a reward of +1 while collecting a blue banana results in a negative reward of 1. 
The environment was pre built for the project using the Unity ML agents toolkit. ! (environment.gif) ​ (From the Udacity course project introduction) State Space The state space has 37 dimensions. Parameters characterize the agent's velocity, along with ray based perception of objects around the agent's forward direction. Given this information, the agent ideally learns how to select actions that increase the score Action Space There are 4 possible actions for the agent to choose from: 0 move forward. 1 move backward. 2 turn left. 3 turn right. Specified Project Goal for a Solution The environment is episodic. The stated goal of the project is to have the learning agent achieve a score of at least +13 averaged over 100 consecutive episodes. Getting Started: Installation The installation of the software is accomplished with the package manager, conda. Installing Anaconda will include conda as well as facilitate the installation of other data science software packages. The Jupyter Notebook App is also required for running this project and is installed automatically with Anaconda. The dependencies for this project can be installed by following the instructions at Required components include but are are not limited to Python 3.6 (I specifically used 3.6.6), and PyTorch v0.4, and a version of the Unity ML Agents toolkit. Note that ML Agents are only supported on Microsoft Windows 10. I only used Windows 10, so cannot vouch for the accuracy of the instructions for other operating systems. 1. After installing anaconda, create (and activate) an environment Linux or Mac : In a terminal window, perform the following commands: conda create name drlnd python 3.6 source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. conda create name drlnd python 3.6 activate drlnd 2. Clone the Udacity Deep Reinforcement Learning Nanodegree repository and install dependencies. The instructions at indicate that you should enter the following on the command line to clone the the repository and install the dependencies: git clone cd deep reinforcement learning/python pip install . However, for Windows 10, this did not work for me. The pip command fails when it tries to install torch 0.4.0. This version may no longer be available. I edited the dependencies shown in the requirements.txt file in the directory and changed the line for torch from torch 0.4.0 to torch 0.4.1 . The pip command worked after the change. Otherwise you can install the required packages in the requirements folder manually. Sometimes these software packages change and you may need to refer to the specific instructions for an individual package. For example, may be helpful for installing PyTorch. If you clone the DRLND repository, the original files from the project can be found in the folder deep reinforcement learning\p1_navigation 3. Clone or copy my repository or folder for this project The folder is named p1_navigation_SNH. 4. Download the Unity environment for this project Use one of the following links: Linux: click here Mac OSX: click here Windows (32 bit): click here Windows (64 bit): click here Place the file in the DRLND GitHub repository, in the p1_navigation_SNH/ folder, and unzip (or decompress) the file. Copy the file into the folder p1_navigation_SNH 5. Prepare and use Jupyter Notebooks for training the agent and for running the software. 
Create an IPython kernel for the drlnd environment: python m ipykernel install user name drlnd display name drlnd These steps only need to be performed once. Instructions 1. In a terminal window, specifically an Anaconda terminal window for Microsoft Windows, activate the conda environment if not already done: Linux or Mac : source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. activate drlnd 2. Change directory to the p1_navigate_SNH folder. Run Jupyter Notebook: jupyter notebook 3. Open the notebook Navigation_SNH.ipynb . Before running code in a notebook, change the kernel to match the drlnd environment by using the drop down Kernel menu: Kernel 4. To train the the deep Q network with the provided parameters, just run all under the Cell drop down menu of the jupyter notebook. The parameters of the learning agent can be changed in Section 4 of the notebook. The parameters for running the simulation and training the agent can be modified in Section 5. The notebook can then be run again. The parameters are described below. During training, a checkpoint named checkpoint13.pth is saved after it achieves a score of greater than 13 averaged over 100 episodes. After all the training is completed (currently set at 5000 episodes), a checkpoint named checkpoint_final.pth is saved. Run the notebook named Navigation_run_saved.ipynb to read in the save checkpoint for the trained agent to watch it play the game without further learning. The name of the notebook can be changed in section 3 of the notebook. It is currently set up to run the agent through 100 episodes end provide scores and the final average score. The final parameter is the number of episodes to run and can also be changed: load_and_run_agent(agent, env, 'checkpoint_5000_not_prioritized.pth', 100) Files Navigation_SNH.ipynb: Jupyter notebook to train the agent and to save the trained agent as a checkpoint. Navigation_run_saved.ipynb: Notebook to read in a saved checkpoint and run the agent without additional learning. model.py: The neural networks agent.py: The deep Q learning agent (python class Agent) Parameters These parameters and the implementation are discussed more in the file Report.md. Agent parameters state_size (int): Number of parameters in the environment state action_size (int): Number of different actions seed (int): random seed learning_rate (float): initial learning rate batch_normalize (boolean): Flag for using batch normalization in the neural network error_clipping (boolean): Flag for limiting the TD error to between 1 and 1 reward_clipping (boolean): Flag for limiting the reward to between 1 and 1 gradient_clipping (boolean): Flag for clipping the norm of the gradient to 1 target_update_interval (int): Set negative to use soft updating. The number of learning steps between updating the neural network for fixed Q targets. double_dqn (boolean): Flag for using double Q learning dueling_dqn (boolean): Flag for using dueling Q networks prioritized_replay (boolean): Flag for using prioritized replay memory sampling Training parameters n_episodes (int): Maximum number of training episodes max_t (int): Maximum number of timesteps per episode epsilon_initial (float): Initial value of epsilon for epsilon greedy selection of an action epsilon_final (float): Final value of epsilon epsilon_rate (float): A rate (0.0 to 1.0) for decreasing epsilon for each episode. Higher is faster decay. gamma_initial (float): Initial gamma discount factor (0 to 1). 
Higher values favor long term over current rewards. gamma_final (float): Final gamma discount factor (0 to 1). gammma_rate (float): A rate (0 to 1) for increasing gamma. beta_initial (float): For prioritized replay. Corrects bias induced by weighted sampling of stored experiences. The beta parameters have no effect if the agent unless prioritized experience replay is used. beta_rate (float): Rate (0 to 1) for increasing beta to 1 as per Schauel et al. tau_initial (float): Initial value for tau, the weighting factor for soft updating the neural network. The tau parameters have no effect if the agent uses fixed Q targets instead of soft updating. tau_final (float): Final value of tau. tau_rate (float): Rate (0 to 1) for increasing tau each episode.",Atari Games,Playing Games 2408,Playing Games,Playing Games,Other,"Project 2: Continuous Control Scott Hwang snhwang@alum.mit.edu This project was one of the requirements for completing the Deep Reinforcement Learning Nanodegree (DRLND) course at Udacity.com. Project Details: The Environment This project uitlized the Reacher environment. In this environment, an agent, represented by a double jointed arm, moves its hand to target locations. The figure below shows a clip of 10 agents following their targets (green balls) with their hands (small blue balls). The agent must control its joints to maintain its hand at the target. A reward of +0.1 is provided for each step that the agent's hand is in the target location. Thus, the goal of each agent is to maintain its position at the target location for as many time steps as possible. ! (reacher.gif) ​ Taken from the Udacity project introduction page State Space The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Action Space Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between 1 and 1. Specified Project Goal We were given 2 choices with instructions to solve one: 1. A single agent following a single target 2. Twenty agents, each following a different single target. A score of >30 averaged over 100 episodes (and averaged over all of the agents for the multi agent task) is required to successfully complete the project. Although we were instructed to choose one of the above. I focused on the second option of 20 agents but the same code can be used for both. Getting Started: Installation The installation of the software is accomplished with the package manager, conda. Installing Anaconda will include conda as well as facilitate the installation of other data science software packages. The Jupyter Notebook App is also required for running this project and is installed automatically with Anaconda. The dependencies for this project can be installed by following the instructions at Required components include but are are not limited to Python 3.6 (I specifically used 3.6.6), and PyTorch v0.4, and a version of the Unity ML Agents toolkit. Note that ML Agents are only supported on Microsoft Windows 10. I only used Windows 10, so cannot vouch for the accuracy of the instructions for other operating systems. 1. After installing anaconda, create (and activate) an environment Linux or Mac : In a terminal window, perform the following commands: conda create name drlnd python 3.6 source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. 
conda create name drlnd python 3.6 activate drlnd 2. Clone the Udacity Deep Reinforcement Learning Nanodegree repository and install dependencies. The instructions at indicate that you should enter the following on the command line to clone the the repository and install the dependencies: git clone cd deep reinforcement learning/python pip install . However, for Windows 10, this did not work for me. The pip command fails when it tries to install torch 0.4.0. This version may no longer be available. I edited the dependencies shown in the requirements.txt file in the directory and changed the line for torch from torch 0.4.0 to torch 0.4.1 . The pip command worked after the change. Otherwise you can install the required packages in the requirements folder manually. Sometimes these software packages change and you may need to refer to the specific instructions for an individual package. For example, may be helpful for installing PyTorch. If you clone the DRLND repository, the original files from the project can be found in the folder deep reinforcement learning\p1_navigation 3. Clone or copy my repository or folder for this project The folder is named p2_continuous control_SNH. 4. Download the Unity environment for this project Download the environment from one of the links below. You need only select the environment that matches your operating system: _Version 1: One (1) Agent_ Linux: click here Mac OSX: click here Windows (32 bit): click here Windows (64 bit): click here _Version 2: Twenty (20) Agents_ Linux: click here Mac OSX: click here Windows (32 bit): click here Windows (64 bit): click here Unzip (or decompress) the file which provides a folder. Copy folder into the folder p2_continuous control_SNH. If you want to run both, you must change the names of the decompressed folders since they have the same name. For example, I named the folders the following for versions 1 and 2, respectively: 1. Reacher_Windows_x86_64_v1 2. Reacher_Windows_x86_64_v2 The Jupyter notebook for running the code is called Continuous_Control_SNH.ipynb . The folder name indicated in Section1 of the notebook for starting the environment must match one of these. 5. Prepare and use Jupyter Notebooks for training the agent and for running the software. Create an IPython kernel for the drlnd environment: python m ipykernel install user name drlnd display name drlnd These steps only need to be performed once. Instructions 1. In a terminal window, specifically an Anaconda terminal window for Microsoft Windows, activate the conda environment if not already done: Linux or Mac : source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. activate drlnd 1. Change directory to the p1_navigate_SNH folder. Run Jupyter Notebook: jupyter notebook 1. Open the notebook Continuous_Control_SNH.ipynb to train with the multi agent environment. The single agent model can be run with Continuous_Control_SNH SELU.ipynb , but this is slower. Before running code in a notebook, change the kernel to match the drlnd environment by using the drop down Kernel menu: Kernel (taken from the Udacity instructions) 1. To train the the agent(s) with the provided parameters, just run all under the Cell drop down menu of the Jupyter notebook. The parameters of the learning agent can be changed in Section 4 of the notebook. The parameters for running the simulation and training the agent can be modified in Section 5. The parameters are described below. 
During training, multiple checkpoints are saved for running the trained agent later. One of the parameters for training is a prefix string for the checkpoints. This must be provided. I initially included a default name but I kept forgetting to change it and would overwrite my previous files, so I made it a required parameter to specify. For example, if checkpoint_name is specified as 'v2,' the following checkpoint will be generated: v2_actor_first.pth and v2_critic_first.pth : The first time the agent(s) achieve an agent average score of >30, averaged over all of the agents for the multi agent version. For a single agent, the agent average score is the same as the individual score of the agent. v2_actor.pth and v2_critic.pth : The first time the agent(s) achieve a 100 episode average score of >30, meaning the agent average score averaged over 100 episodes. Keep in mind, that the agent was changing during training during those 100 episodes. After training, a run of 100 episodes without any training is performed to see how well it performs. v2_actor_best_agent_average.pth and v2_critic_best_agent_average.pth : The actor and critic network weights for the model that achieves the best agent average. This is only saved. v2_actor_final.pth and v2_critic_final.pth : The most recent version of the agent, trained by the last episode in the training run. Run the notebook named Continous_Control pretrained.ipynb to read in a saved checkpoint and run the environment without further training. Make sure the the network type and the weights stored in the checkpoints match. The agent(s) are defined in Section 3. Please make sure the network name ('SELU' or 'RELU') matches the type of neural network weights stored in the checkpoint, e.g.: agent Agent( ​ state_size state_size, ​ action_size action_size, ​ num_agents num_agents, ​ network 'RELU' ) The name of the checkpoint can be changed in Section 4 of the notebook. It is currently set up to run the agent through 100 episodes end provide scores and the final average score. The final parameter is the number of episodes to run and can also be changed: load_and_run( ​ agent, ​ env, ​ v2_RELU_actor_best_agent_average.pth , ​ v2_RELU_critic_best_agent_average.pth , ​ 100) Files 1. Continuous_Control_SNH.ipynb: Jupyter notebook to train the agent(s) and to save the trained the neural network weights as checkpoints. This notebook is set up for version 2 with multiple agents. 2. Continuous_Control_SNH SELU.ipynb: Nearly identical, except setup to read in the single agent environment. 3. Continuous_Control_SNH pretrained.ipynb: Notebook to read in a saved checkpoint and run the agent without additional learning. 4. model.py: The neural networks 5. agent.py: Defines the learning agent based on DDPG (python class Agent) Parameters These parameters and the implementation are discussed more in the file Report.md. Agent parameters batch_size: Batch size for neural network training lr_actor: Learning rate for the actor neural network lr_critic: Learning rate for the critic neural network noise_theta (float): theta for Ornstein Uhlenbeck noise process noise_sigma (float): sigma for Ornstein Uhlenbeck noise process actor_fc1 (int): Number of hidden units in the first fully connected layer of the actor network actor_fc2: Units in second layer actor_fc3: Units in third fully connected layer. 
This parameter does nothing for the RELU network critic_fc1: Number of hidden units in the first fully connected layer of the critic network critic_fc2: Units in second layer critic_fc3: Units in third layer. This parameter does nothing for the RELU network update_every: The number of time steps between each updating of the neural networks num_updates: The number of times to update the networks at every update_every interval buffer_size: Buffer size for experience replay. Default 2e6. network (string): The name of the neural networks that are used for learning. There are only 2 choices, one with only 2 fully connected layers and RELU activations and one with 3 fully connected layers with SELU activations. Their names are RELU and SELU, respectively. Default is RELU. Training parameters n_episodes (int): Maximum number of training episodes max_t (int): Maximum number of timesteps per episode epsilon_initial (float): Initial value of epsilon for epsilon greedy selection of an action epsilon_final (float): Final value of epsilon epsilon_rate (float): A rate (0.0 to 1.0) for decreasing epsilon for each episode. Higher is faster decay. gamma_initial (float): Initial gamma discount factor (0 to 1). Higher values favor long term over current rewards. gamma_final (float): Final gamma discount factor (0 to 1). gammma_rate (float): A rate (0 to 1) for increasing gamma. beta_initial (float): For prioritized replay. Corrects bias induced by weighted sampling of stored experiences. The beta parameters have no effect if the agent unless prioritized experience replay is used. beta_rate (float): Rate (0 to 1) for increasing beta to 1 as per Schauel et al. tau_initial (float): Initial value for tau, the weighting factor for soft updating the neural network. The tau parameters have no effect if the agent uses fixed Q targets instead of soft updating. tau_final (float): Final value of tau. tau_rate (float): Rate (0 to 1) for increasing tau each episode.",Atari Games,Playing Games 2409,Playing Games,Playing Games,Other,"Project 2: Continuous Control Scott Hwang snhwang@alum.mit.edu This project was one of the requirements for completing the Deep Reinforcement Learning Nanodegree (DRLND) course at Udacity.com. Project Details: The Environment ​ This project uitlizes the Tennis environment. In this environment, two agents play tennis with each other. Each agent is represented by a racket and they try to learn to hit a ball back and forth between each other across a net. ! (tennis.gif) ​ During one episode of play, an agent earns a reward of +0.1 every time it hits the ball over the net. A negative reward of 0.01 is given if the ball hits the ground or goes out of bounds. Ideally, the two agents should learn how to keep the ball in play to earn a high total reward. State Space ​ The introduction to the course project indicates the state space consists of 8 variables corresponding to the position and velocity of the ball and racket. Each agent receives its own, local observation. On examination of the environment, it indicates that it's state size is 24 for each agent, so there must be other parameters in the state space. Action Space ​ The action space consists of two possible continuous actions, corresponding to movement towards (or away from) the net and jumping. Specified Project Goal The environment is episodic. Each agent earns a score in one episode and the episode is then characterized by the maximum score between the agents. 
The maximum score between the two agents during one episode is average over 100 consecutive episodes. This 100 episode average of the maximum agent score must exceed +0.5 in order for the environment to be considered solved. Getting Started: Installation ​ The installation of the software is accomplished with the package manager, conda. Installing Anaconda will include conda as well as facilitate the installation of other data science software packages. The Jupyter Notebook App is also required for running this project and is installed automatically with Anaconda. The dependencies for this project can be installed by following the instructions at Required components include but are are not limited to Python 3.6 (I specifically used 3.6.6), and PyTorch v0.4, and a version of the Unity ML Agents toolkit. Note that ML Agents are only supported on Microsoft Windows 10. I only used Windows 10, so cannot vouch for the accuracy of the instructions for other operating systems. 1. After installing anaconda, create (and activate) an environment Linux or Mac : In a terminal window, perform the following commands: conda create name drlnd python 3.6 source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. conda create name drlnd python 3.6 activate drlnd 2. Clone the Udacity Deep Reinforcement Learning Nanodegree repository and install dependencies. ​ The instructions at indicate that you should enter the following on the command line to clone the the repository and install the dependencies: git clone cd deep reinforcement learning/python pip install . However, for Windows 10, this did not work for me. The pip command fails when it tries to install torch 0.4.0. This version may no longer be available. I edited the dependencies shown in the requirements.txt file in the directory and changed the line for torch from torch 0.4.0 to torch 0.4.1 . The pip command worked after the change. Otherwise you can install the required packages in the requirements folder manually. Sometimes these software packages change and you may need to refer to the specific instructions for an individual package. For example, may be helpful for installing PyTorch. If you clone the DRLND repository, the original files from the project can be found in the folder deep reinforcement learning/p3_collab compet 3. Clone or copy my repository or folder for this project The github repository is 4. Download the Unity environment for this project Download the environment from one of the links below. You need only select the environment that matches your operating system: _Version 1: One (1) Agent_ Linux: click here Mac OSX: click here Windows (32 bit): click here Windows (64 bit): click here _Version 2: Twenty (20) Agents_ Linux: click here Mac OSX: click here Windows (32 bit): click here Windows (64 bit): click here Unzip (or decompress) the file which provides a folder. Copy folder into the folder p3_collab compet SNH. The Jupyter notebook for running the code is called Tennis SNH.ipynb . The folder name indicated in Section1 of the notebook for starting the environment must match one of the folder you copied the environment into. 5. Prepare and use Jupyter Notebooks for training the agent and for running the software. Create an IPython kernel for the drlnd environment: python m ipykernel install user name drlnd display name drlnd These steps only need to be performed once. Instructions 1. 
In a terminal window, specifically an Anaconda terminal window for Microsoft Windows, activate the conda environment if not already done: Linux or Mac : source activate drlnd Windows : Make sure you are using the anaconda command line rather than the usual windows cmd.exe. activate drlnd 1. Change directory to the p3_collab compet SNH folder. Run Jupyter Notebook: jupyter notebook 1. Open the notebook Tennis SNH.ipynb to train with the multi agent environment. Before running code in a notebook, change the kernel to match the drlnd environment by using the drop down Kernel menu: Kernel (taken from the Udacity instructions) 1. To train the agent(s) with the provided parameters, just run all under the Cell drop down menu of the Jupyter notebook. The parameters of the learning agent can be changed in Section 4 of the notebook. The parameters for running the simulation and training the agent can be modified in Section 5. The parameters are described below. During training, multiple checkpoints are saved for running the trained agent later: checkpoint_actor_first.pth and checkpoint_critic_first.pth : The first time an agent achieves a score >0.5 checkpoint_actor.pth and checkpoint_critic.pth : The first time the agents achieve a 100 episode average maximum score of >0.5. Keep in mind that the agents' neural networks were still changing during those 100 training episodes. After training, a run of 100 episodes without any training can be performed using one of the checkpoints to see how well it performs. checkpoint_actor_best_agent_max.pth and checkpoint_critic_best_agent_max.pth : The actor and critic network weights for the model that achieves the highest maximum episode score. checkpoint_actor_best_avg_max.pth and checkpoint_critic_best_avg_max.pth : The actor and critic network weights for the model that achieves the highest 100 episode average of the maximum episode score. checkpoint_actor_final.pth and checkpoint_critic_final.pth : The most recent version of the neural networks, trained up to the last episode in the training run. Run the notebook named Tennis SNH pretrained.ipynb to read in a saved checkpoint and run the environment without further training. Make sure the network type and the weights stored in the checkpoints match. The agent(s) are defined in Section 3. Please make sure the network name ('SELU' or 'RELU') matches the type of neural network weights stored in the checkpoint, e.g.: agent = Agent( state_size=state_size, action_size=action_size, num_agents=num_agents, network='RELU' ) The default is 'RELU' if not specified, and I did not get 'SELU' to work. I recommend just using 'RELU', or not specifying it, so that it always uses the default. The name of the checkpoint can be changed in Section 4 of the notebook. The following example shows how to run the agents using the checkpoint files for the neural networks which achieved the highest maximum agent score in a single episode. It runs the agents through 100 episodes and provides the scores as well as the final average score. The final parameter is the number of episodes to run and can also be changed: load_and_run( agent, env, checkpoint_actor_best_agent_max.pth , checkpoint_critic_best_agent_max.pth , 100 ) Files 1. Tennis SNH.ipynb: Jupyter notebook to train the agent(s) and to save the trained neural network weights as checkpoints. This notebook is set up for the multi agent Tennis environment. 2. Tennis SNH pretrained.ipynb: Notebook to read in a saved checkpoint and run the agent without additional learning. 3. model.py: The neural networks 4.
agent.py: Defines the learning agent based on DDPG (python class Agent) 5. Multiple files with the extension .pth : Checkpoint files containing the weights of previously trained neural networks. Parameters These parameters and the implementation are discussed more in the file Report.md. Agent parameters batch_size: Batch size for neural network training lr_actor: Learning rate for the actor neural network lr_critic: Learning rate for the critic neural network noise_theta (float): theta for the Ornstein Uhlenbeck noise process noise_sigma (float): sigma for the Ornstein Uhlenbeck noise process actor_fc1 (int): Number of hidden units in the first fully connected layer of the actor network actor_fc2: Units in the second layer actor_fc3: Units in the third fully connected layer. This parameter does nothing for the RELU network critic_fc1: Number of hidden units in the first fully connected layer of the critic network critic_fc2: Units in the second layer critic_fc3: Units in the third layer. This parameter does nothing for the RELU network update_every: The number of time steps between each updating of the neural networks num_updates: The number of times to update the networks at every update_every interval buffer_size: Buffer size for experience replay. Default 2e6. network (string): The name of the neural networks that are used for learning. There are only 2 choices, one with only 2 fully connected layers and RELU activations and one with 3 fully connected layers with SELU activations. Their names are RELU and SELU, respectively. Default is RELU. Training parameters n_episodes (int): Maximum number of training episodes max_t (int): Maximum number of timesteps per episode epsilon_initial (float): Initial value of epsilon for epsilon greedy selection of an action epsilon_final (float): Final value of epsilon epsilon_rate (float): A rate (0.0 to 1.0) for decreasing epsilon for each episode. Higher is faster decay. gamma_initial (float): Initial gamma discount factor (0 to 1). Higher values favor long term over current rewards. gamma_final (float): Final gamma discount factor (0 to 1). gammma_rate (float): A rate (0 to 1) for increasing gamma. beta_initial (float): For prioritized replay. Corrects bias induced by weighted sampling of stored experiences. The beta parameters have no effect unless prioritized experience replay is used. beta_rate (float): Rate (0 to 1) for increasing beta to 1 as per Schaul et al. tau_initial (float): Initial value for tau, the weighting factor for soft updating the neural network. The tau parameters have no effect if the agent uses fixed Q targets instead of soft updating. tau_final (float): Final value of tau. tau_rate (float): Rate (0 to 1) for increasing tau each episode. Please refer to report.pdf for more details about the algorithm and for some results: ! (plot.png)",Atari Games,Playing Games 2430,Playing Games,Playing Games,Other,"Introduction This package provides a Lasagne/Theano based implementation of the deep Q learning algorithm described in: Playing Atari with Deep Reinforcement Learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller and Mnih, Volodymyr, et al. Human level control through deep reinforcement learning. Nature 518.7540 (2015): 529 533.
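For orientation, the quantity those two papers train the network to match is the one-step Q-learning target. The sketch below is illustrative NumPy only, not code from this package, and all names are made up.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, terminals, gamma=0.99):
    # y_i = r_i + gamma * max_a Q_target(s'_i, a), with bootstrapping
    # switched off on terminal transitions, as in the DQN papers.
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * (1.0 - terminals) * max_next_q

# Toy batch of 3 transitions with 4 actions each.
rewards = np.array([1.0, 0.0, -1.0])
next_q_values = np.random.randn(3, 4)     # stand-in for target-network outputs
terminals = np.array([0.0, 0.0, 1.0])     # the last transition ends an episode
print(dqn_targets(rewards, next_q_values, terminals))
```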
Here is a video showing a trained network playing breakout (using an earlier version of the code): Dependencies A reasonably modern NVIDIA GPU OpenCV Theano ( Lasagne ( Pylearn2 ( Arcade Learning Environment ( The script dep_script.sh can be used to install all dependencies under Ubuntu. Running Use the scripts run_nips.py or run_nature.py to start all the necessary processes: $ ./run_nips.py rom breakout $ ./run_nature.py rom breakout The run_nips.py script uses parameters consistent with the original NIPS workshop paper. This code should take 2 4 days to complete. The run_nature.py script uses parameters consistent with the Nature paper. The final policies should be better, but it will take 6 10 days to finish training. Either script will store output files in a folder prefixed with the name of the ROM. Pickled versions of the network objects are stored after every epoch. The file results.csv will contain the testing output. You can plot the progress by executing plot_results.py : $ python plot_results.py breakout_05 28 17 09_0p00025_0p99/results.csv After training completes, you can watch the network play using the ale_run_watch.py script: $ python ale_run_watch.py breakout_05 28 17 09_0p00025_0p99/network_file_99.pkl Performance Tuning Theano Configuration Setting allow_gc False in THEANO_FLAGS or in the .theanorc file significantly improves performance at the expense of a slight increase in memory usage on the GPU. Getting Help The deep Q learning web forum can be used for discussion and advice related to deep Q learning in general and this package in particular. See Also This is the code DeepMind used for the Nature paper. The license only permits the code to be used for evaluating and reviewing the claims made in the paper. Working Caffe based implementation. (I haven't tried it, but there is a video of the agent playing Pong successfully.) Defunct? As far as I know, this package was never fully functional. The project is described here: This is an almost working implementation developed during Spring 2014 by my student Brian Brown. I haven't reused his code, but Brian and I worked together to puzzle through some of the blank areas of the original paper.",Atari Games,Playing Games 2435,Playing Games,Playing Games,Other,reinforcement\_learning\_algorithms Catalog of reinforcement learning algorithms on Frozen Lake from gym: 1. Monte Carlo Policy Gradient 2. TD 3. Baseline Policy Gradient 4. DQN 5. DDQN 6. Actor Critic,Atari Games,Playing Games 2461,Playing Games,Playing Games,Other,"Deep Q Network using Tensorflow This repository contains Deep Q Network and Double Deep Q Network implementations in tensorflow for OpenAI Gym environments such as the CartPole and MountainCar problems. It's plug and play code with no preprocessing required; just the main code has to be run. Different hyperparameters can be changed as needed by the user. The Deep Q Networks algorithm implemented in the code is a direct implementation of the paper Playing Atari with Deep Reinforcement Learning by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller.
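As background for how such an agent typically interacts with a Gym environment, here is a minimal, generic epsilon-greedy loop with a replay buffer. It is a sketch only, using the classic Gym API of that era and a random stub in place of the trained Q network; it is not this repository's code.

```python
import random
from collections import deque

import gym
import numpy as np

env = gym.make('CartPole-v0')
replay_buffer = deque(maxlen=10000)   # experience replay memory
epsilon = 0.1

def q_values(state):
    # Stand-in for a trained network; returns one value per action.
    return np.random.randn(env.action_space.n)

state = env.reset()
for _ in range(200):
    if random.random() < epsilon:
        action = env.action_space.sample()          # explore
    else:
        action = int(np.argmax(q_values(state)))    # exploit
    next_state, reward, done, _ = env.step(action)
    replay_buffer.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state
```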
Archive Link: The code is currently by default set for running in tensorflow gpu and thus requires a tensorflow gpu installation, but can be easily modified to run on CPU. Libraries Required: Tensorflow GPU OpenAI Gym (full package installation) Note: If someone can write some code to save the trained model it would be great and I would love to add it to the main branch.",Atari Games,Playing Games 2471,Playing Games,Playing Games,Other,"Binder Demo Notebook About Chess reinforcement learning by AlphaGo Zero methods. This project is based on these main resources: 1) DeepMind's Oct 19th publication: Mastering the Game of Go without Human Knowledge . 2) The great Reversi development of the DeepMind ideas that @mokemokechicken did in his repo: 3) DeepMind just released a new version of AlphaGo Zero (named now AlphaZero) where they master chess from scratch: In fact, in chess AlphaZero outperformed Stockfish after just 4 hours (300k steps) Wow! See the wiki for more details. Note I'm the creator of this repo. I (and some other collaborators) did our best, but we found that self play is too costly for a single machine. Supervised learning worked fine but we never tried the self play by itself. Anyway I want to mention we have moved to a new repo where a lot of people are working on a distributed version of AZ for chess (MCTS in C++): The project is almost done and everybody will be able to participate just by executing a pre compiled Windows (or Linux) application. A really great job and effort has been put into this project and I'm pretty sure we'll be able to replicate the DeepMind results before too long through distributed cooperation. So, I ask everybody who wishes to see a UCI engine running a neural network beat Stockfish to go to that repo and help with their machine power. Environment Python 3.6.3 tensorflow gpu: 1.3.0 Keras: 2.0.8 New results (after a great number of modifications due to @Akababa) Using supervised learning on about 10k games, I trained a model (7 residual blocks of 256 filters) to a guesstimate of 1200 elo with 1200 sims/move. One of the strengths of MCTS is that it scales quite well with computing power. Here you can see an example where I (black) played against the model in the repo (white): ! img Here you can see an example of a game where I (white, 2000 elo) played against the model in this repo (black): ! img First good results Using the new supervised learning step I created, I've been able to train a model to the point that it seems to be learning the openings of chess. Also it seems the model is starting to avoid naively losing pieces. Here you can see an example of a game I played against this model (AI plays black): ! partida1 Here we have a game trained by @bame55 (AI plays white): ! partida3 This model plays this way after only 5 epoch iterations of the 'opt' worker; the 'eval' worker replaced the best model 4 times (4 of 5). At this moment the loss of the 'opt' worker is 5.1 (and still seems to be converging very well). Modules Supervised Learning I've added a new supervised learning pipeline step (to use the human game PGN files we can find on the internet as a play data generator). This SL step was also used in the first and original version of AlphaGo, and maybe chess is such a complex game that we have to pre train the policy model first before starting the self play process (i.e., maybe chess is too complicated for self training alone). Using the new SL process is as simple as running, at the beginning, the new worker sl instead of the worker self .
Once the model converges enough with SL play data we just stop the worker sl and start the worker self , so the model will now start improving thanks to self play data. bash python src/chess_zero/run.py sl If you want to use this new SL step you will have to download big PGN files (chess files) and paste them into the data/play_data folder ( FICS is a good source of data). You can also use the SCID program to filter by headers like player ELO, game result and more. To avoid overfitting, I recommend using data sets of at least 3000 games and running at most 3 4 epochs. Reinforcement Learning This AlphaGo Zero implementation consists of three workers: self , opt and eval . self is Self Play to generate training data by self play using BestModel. opt is Trainer to train the model and generate next generation models. eval is Evaluator to evaluate whether the next generation model is better than BestModel. If better, replace BestModel. Distributed Training Now it's possible to train the model in a distributed way. The only thing needed is to use the new parameter: type distributed : use mini config for testing, (see src/chess_zero/configs/distributed.py ) So, in order to contribute to the distributed team you just need to run the three workers locally like this: bash python src/chess_zero/run.py self type distributed (or python src/chess_zero/run.py sl type distributed) python src/chess_zero/run.py opt type distributed python src/chess_zero/run.py eval type distributed GUI uci launches the Universal Chess Interface, for use in a GUI. To set up ChessZero with a GUI, point it to C0uci.bat (or rename to .sh). For example, this is a screenshot of the random model using Arena's self play feature: ! capture Data data/model/model_best_ : BestModel. data/model/next_generation/ : next generation models. data/play_data/play_ .json : generated training data. logs/main.log : log file. If you want to train the model from the beginning, delete the above directories. How to use Setup install libraries bash pip install r requirements.txt If you want to use GPU, follow these instructions to install with pip3. Make sure Keras is using Tensorflow and you have Python 3.6.3+. Depending on your environment, you may have to run python3/pip3 instead of python/pip. Basic Usage For training the model, execute Self Play , Trainer and Evaluator . Note : Make sure you are running the scripts from the top level directory of this repo, i.e. python src/chess_zero/run.py opt , not python run.py opt . Self Play bash python src/chess_zero/run.py self When executed, Self Play will start using BestModel. If the BestModel does not exist, a new random model will be created and become BestModel. options new : create new BestModel type mini : use mini config for testing, (see src/chess_zero/configs/mini.py ) Trainer bash python src/chess_zero/run.py opt When executed, Training will start. A base model will be loaded from the latest saved next generation model. If none exists, BestModel is used. The trained model will be saved every epoch. options type mini : use mini config for testing, (see src/chess_zero/configs/mini.py ) total step : specify the total number of steps (mini batches). The total step count affects the learning rate of training. Evaluator bash python src/chess_zero/run.py eval When executed, Evaluation will start. It evaluates BestModel and the latest next generation model by playing about 200 games. If the next generation model wins, it becomes BestModel.
options type mini : use mini config for testing, (see src/chess_zero/configs/mini.py ) Tips and Memory GPU Memory Usually a lack of memory causes warnings, not errors. If an error happens, try changing vram_frac in src/configs/mini.py , python self.vram_frac 1.0 A smaller batch_size will reduce the memory usage of opt . Try changing TrainerConfig batch_size in MiniConfig .",Atari Games,Playing Games 2480,Playing Games,Playing Games,Other,"Introduction This repository is a fork of the Nathan Sprague implementation of the deep Q learning algorithm described in: Playing Atari with Deep Reinforcement Learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller and Mnih, Volodymyr, et al. Human level control through deep reinforcement learning. Nature 518.7540 (2015): 529 533. We use the DQN algorithm to learn the strategies for Atari games using the RAM state of the machine. Dependencies A reasonably modern NVIDIA GPU OpenCV Theano ( Lasagne ( Pylearn2 ( Arcade Learning Environment ( The script dep_script.sh can be used to install all dependencies under Ubuntu. Running We've done a number of experiments with models that use RAM state. They don't fully share the code, so we split them into branches. To re run them, you can use our scripts, which are located in the main directory of the repository. Network types just_ram network that takes only RAM as input, passes it through 2 ReLU layers with 128 nodes each and scales the output to the appropriate size big_ram the analogous network, but with 4 hidden layers mixed_ram network taking both ram and screen as an input big_mixed_ram deeper version of mixed_ram ram_dropout the just_ram network with dropout applied to all layers except the output big_dropout the big_ram network with dropout Frame skip Evaluation of a model using a different frame skip: ./frameskip.sh , e.g: ./frameskip.sh breakout just_ram 8 Dropout We added dropout to the two ram only networks. You can run it as: ./dropout.sh ram_dropout OR ./dropout.sh big_dropout ram_dropout is a network with two dense hidden layers, big_dropout with 4. Weight decay You can try the models with l2 regularization using: ./weight decay.sh , e.g: ./weight decay.sh breakout big_ram Decreasing learning rate The models with the learning rate decreased to 0.001 can be run as: ./learningrate.sh , e.g: ./learningrate.sh breakout big_ram Roms You need to put roms in the roms subdirectory. Their names should be spelled with lowercase letters, e.g. breakout.bin . See Also Original Nathan Sprague implementation of DQN. This is the code DeepMind used for the Nature paper. The license only permits the code to be used for evaluating and reviewing the claims made in the paper. Working Caffe based implementation. (I haven't tried it, but there is a video of the agent playing Pong successfully.) Defunct? As far as I know, this package was never fully functional. The project is described here: This is an almost working implementation developed during Spring 2014 by my student Brian Brown. I haven't reused his code, but Brian and I worked together to puzzle through some of the blank areas of the original paper.",Atari Games,Playing Games 2496,Playing Games,Playing Games,Other,"Deep Learning Project for IEOR 4720 Implementation of Bootstrapped DQN To run this code ensure that you have numpy, gym, tensorflow, sklearn, pandas and matplotlib.pyplot installed. This repository contains 2 files. 1.
Simple_Version_CartPole.ipynb: This is a simple version of our project that only has 2 fully connected layers. It is only for the purpose of running CartPole, which is the simplest possible environment in OpenAI Gym. To run the Bootstrapped DQN in line 2 set model_name Bootstrap To run the DQN in line 2 set model_name DQN 2. Modified_DQN.ipynb: This is a version of our project in which we have made an improvement (a modification) to the Bootstrapped DQN. It is for the purpose of running any Atari game in Gym. You can run the DQN, Modified DQN and Bootstrapped DQN for any Atari game. You can set the game you want to play in line 4: env gym.make( Qbert v0 ). To run the Modified DQN in line 2 set model_name MDQN To run the Bootstrapped DQN in line 2 set model_name Bootstrap To run the DQN in line 2 set model_name DQN",Atari Games,Playing Games 2518,Playing Games,Playing Games,Other,"AI and Deep learning tools for Unity using CNTK Note This project was developed for Aalto University's Computational Intelligence in Games course material. Development has stopped now, because we decided to use Tensorflowsharp with Unity MLAgent instead of CNTK for multiplatform support. The new project will be public soon. Here . Content This repo contains some useful deep learning related tools implemented primarily using the CNTK C library. Current contents: Helper functions to build/train neural network layers. Layers definitions Simple neural network cGAN Universal Style Transfer Reinforcement Learning Proximal Policy Optimization (PPO) Deep Q Learning (DQL) Platform and Installation Currently it only works on Windows. If you need to use a GPU for NN, you also need a proper Nvidia graphics card. Installation steps: 1. Download the repo (Unity project) 2. Download the zip that includes the necessary dlls 3. Put the dlls in the correct places: (Adapted from Put those files/folders into any Plugins folder under /DeepLearningToolsForUnity/Assets. Cntk.Core.Managed 2.4.dll MathNet.Numerics.dll MathNet.Numerics.MKL.dll System.Drawing.dll Accord folder Copy the other dlls (not folders), and put them DIRECTLY under the /DeepLearningToolsForUnity folder, or another place where Windows can find those dlls. 4. Done. Note that the file Assets/UnityCNTK/Tools/UniversalStyleTransfer/Data/UST_combined.bytes uses Git LFS, so be sure you download it correctly (it should be larger than 100MB). Documentation Go to the Wiki to see detailed documentation.",Atari Games,Playing Games 2532,Playing Games,Playing Games,Other,"Minigo: A minimalist Go engine modeled after AlphaGo Zero, built on MuGo This is an implementation of a neural network based Go AI, using TensorFlow. While inspired by DeepMind's AlphaGo algorithm, this project is not a DeepMind project nor is it affiliated with the official AlphaGo project. This is NOT an official version of AlphaGo Repeat, this is not the official AlphaGo program by DeepMind . This is an independent effort by Go enthusiasts to replicate the results of the AlphaGo Zero paper ( Mastering the Game of Go without Human Knowledge, Nature ), with some resources generously made available by Google. Minigo is based off of Brian Lee's MuGo , a pure Python implementation of the first AlphaGo paper Mastering the Game of Go with Deep Neural Networks and Tree Search published in Nature . This implementation adds features and architecture changes present in the more recent AlphaGo Zero paper, Mastering the Game of Go without Human Knowledge .
More recently, this architecture was extended for Chess and Shogi in Mastering Chess and Shogi by Self Play with a General Reinforcement Learning Algorithm . These papers will often be abridged in Minigo documentation as AG (for AlphaGo), AGZ (for AlphaGo Zero), and AZ (for AlphaZero) respectively. Goals of the Project 1. Provide a clear set of learning examples using Tensorflow, Kubernetes, and Google Cloud Platform for establishing Reinforcement Learning pipelines on various hardware accelerators. 2. Reproduce the methods of the original DeepMind AlphaGo papers as faithfully as possible, through an open source implementation and open source pipeline tools. 3. Provide our data, results, and discoveries in the open to benefit the Go, machine learning, and Kubernetes communities. An explicit non goal of the project is to produce a competitive Go program that establishes itself as the top Go AI. Instead, we strive for a readable, understandable implementation that can benefit the community, even if that means our implementation is not as fast or efficient as possible. While this product might produce such a strong model, we hope to focus on the process. Remember, getting there is half the fun. :) We hope this project is an accessible way for interested developers to have access to a strong Go model with an easy to understand platform of python code available for extension, adaptation, etc. If you'd like to read about our experiences training models, see RESULTS.md (RESULTS.md). To see our guidelines for contributing, see CONTRIBUTING.md (CONTRIBUTING.md). Getting Started This project assumes you have the following: virtualenv / virtualenvwrapper Python 3.5+ Docker Cloud SDK The Hitchhiker's guide to python has a good intro to python development and virtualenv usage. The instructions after this point haven't been tested in environments that are not using virtualenv. shell pip3 install virtualenv pip3 install virtualenvwrapper Install Bazel shell wget chmod 755 bazel 0.19.2 installer linux x86_64.sh sudo ./bazel 0.19.2 installer linux x86_64.sh rm bazel 0.19.2 installer linux x86_64.sh Install TensorFlow First set up and enter your virtualenv and then the shared requirements: pip3 install r requirements.txt Then, you'll need to choose to install the GPU or CPU tensorflow requirements: GPU: pip3 install tensorflow gpu> 1.13, 1.13, GCE_VM_NAME GCE_ZONE In this example, we will use the following values: GCE_PROJECT example project GCE_VM_NAME minigo etpu test GCE_ZONE us central1 f Create the Cloud TPU enabled VM. ctpu up \ project ${GCE_PROJECT} \ zone ${GCE_ZONE} \ name ${GCE_VM_NAME} \ tf version 1.13 This will take a few minutes and you should see output similar to the following: ctpu will use the following configuration values: Name: minigo etpu test Zone: us central1 f GCP Project: example project TensorFlow Version: 1.13 OK to create your Cloud TPU resources with the above configuration? Yn : y 2019/04/09 10:50:04 Creating GCE VM minigo etpu test (this may take a minute)... 2019/04/09 10:50:04 Creating TPU minigo etpu test (this may take a few minutes)... 2019/04/09 10:50:11 GCE operation still running... 2019/04/09 10:50:12 TPU operation still running... Once the Cloud TPU is created, ctpu will have SSHed you into the machine. Remember to set the same environment variables on your VM. GCE_PROJECT example project GCE_VM_NAME minigo etpu test GCE_ZONE us central1 f Clone the Minigo Github repository: git clone cd minigo Install virtualenv. 
pip3 install virtualenv virtualenvwrapper Create a virtual environment virtualenv p /usr/bin/python3 system site packages ${HOME}/.venvs/minigo Activate the virtual environment. source ${HOME}/.venvs/minigo/bin/activate Install Minigo dependencies (TensorFlow for Cloud TPU is already installed as part of the VM image). pip install r requirements.txt When training on a Cloud TPU, the training work directory must be on Google Cloud Storage. You'll need to choose your own globally unique bucket name. The bucket location should be close to your VM. GCS_BUCKET_NAME minigo_test_bucket GCE_BUCKET_LOCATION us central1 gsutil mb p ${GCE_PROJECT} l ${GCE_BUCKET_LOCATION} gs://${GCS_BUCKET_NAME} Run the training script and note the location of the training work_dir it reports, e.g. Writing to gs://minigo_test_bucket/train/2019 04 25 18 ./oneoffs/train.sh ${GCS_BUCKET_NAME} Launch tensorboard, pointing it at the work_dir reported by the train.sh script. tensorboard logdir gs://minigo_test_bucket/train/2019 04 25 18 After a few minutes, TensorBoard should start updating. Interesting graphs to look at are value_cost_normalized, policy_cost and policy_entropy. Running Minigo on a Kubernetes Cluster See more at cluster/README.md",Game of Go,Playing Games 2536,Playing Games,Playing Games,Other,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Atari Games,Playing Games 2562,Playing Games,Playing Games,Other,"Introduction Neural Symbolic Machines (NSM) Neural Symbolic Machines is a framework to integrate neural networks and symbolic representations using reinforcement learning. Applications The framework can be used to learn semantic parsing and program synthesis from weak supervision (e.g., question answer pairs), which is easier to collect and more flexible than full supervision (e.g., question program pairs). Applications include virtual assistant, natural language interface to database, human robot interaction, etc. It has been used to learn semantic parsers on Freebase and natural language interfaces to database tables . Memory Augmented Policy Optimization (MAPO) We use Memory Augmented Policy Optimization (MAPO) to train NSM. It is a novel policy optimization formulation that incorporates a memory buffer of promising trajectories to reduce the variance of policy gradient estimates for deterministic environments with discrete actions. 
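To make the buffer weighting idea above concrete, here is a heavily simplified illustration, not the repository's implementation: the expected return is split into an exact sum over the trajectories kept in the memory buffer (weighted by their probability under the current policy) and a sampled estimate for everything outside the buffer. All names and numbers are invented.

```python
import numpy as np

def buffer_weighted_return(buffer_probs, buffer_returns, outside_sample_returns):
    # Exact contribution of the trajectories stored in the buffer.
    inside = np.dot(buffer_probs, buffer_returns)
    # The remaining probability mass is covered by a Monte Carlo estimate
    # over trajectories sampled from outside the buffer.
    outside_weight = 1.0 - buffer_probs.sum()
    outside = outside_weight * outside_sample_returns.mean()
    return inside + outside

buffer_probs = np.array([0.05, 0.02])     # pi(trajectory) for two stored trajectories
buffer_returns = np.array([1.0, 1.0])     # their (high) rewards
outside_sample_returns = np.array([0.0, 1.0, 0.0, 0.0])
print(buffer_weighted_return(buffer_probs, buffer_returns, outside_sample_returns))
```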
We also apply systematic exploration to improve exploration and marginal likelihood constraint to accelerate and stabilize training. Distributed Actor Learner Architecture Our implementation uses a distributed actor learner architecture that utilizes multiple CPUs and GPUs for scalable training, similar to the one introduced in the IMPALA paper from DeepMind . Dependencies Python 2.7 TensorFlow> 1.7 Other required packages are summarized in requirements.txt . Quick start Setup AWS instance Start a g3.8xlarge instance with “Deep Learning AMI (Ubuntu) Version 10.0” image. (The experiments are conducted using this type of instance and image, you will need to adjust the configurations in scripts to run on other instances.) Open port (for example, 6000 6010) in the security group for tensorboard. Instructions: ssh into the instance. Download the data and install the dependencies mkdir /projects cd /projects/ git clone cd /projects/neural symbolic machines/ ./aws_setup.sh Running experiments and monitor with tensorboard Start WikiTable experiment screen S wtq source activate tensorflow_p27 cd /projects/neural symbolic machines/table/wtq/ ./run.sh mapo your_experiment_name This script trains the model for 30k steps and evaluates the checkpoint with the highest dev accuracy on the test set. It takes about 2.5 hrs to finish. All the data about this experiment will be saved in /projects/data/wikitable/output/your_experiment_name , and the evaluation result would be saved in /projects/data/wikitable/output/eval_your_experiment_name . You could also evaluate a trained model on the dev set or test set using ./eval.sh your_experiment_name dev ./eval.sh your_experiment_name test Start tensorboard to monitor WikiTable experiment screen S tb source activate tensorflow_p27 cd /projects/data/wikitable/ tensorboard logdir output To see the tensorboard, in the browser, go to your AWS public DNS :6006 avg_return_1 is the main metric (accuracy). Start WikiSQL experiment. screen S ws source activate tensorflow_p27 cd /projects/neural symbolic machines/table/wikisql/ ./run.sh mapo your_experiment_name This script trains the model for 15k steps and evaluates the checkpoint with the highest dev accuracy on the test set. It takes about 6.5 hrs to finish. All the data about this experiment will be saved in /projects/data/wikisql/output/your_experiment_name , and the evaluation result would be saved in /projects/data/wikisql/output/eval_your_experiment_name . You could also evaluate a trained model on the dev set or test set using ./eval.sh your_experiment_name dev ./eval.sh your_experiment_name test Start tensorboard to monitor WikiSQL experiment screen S tb source activate tensorflow_p27 cd /projects/data/wikisql/ tensorboard logdir output To see the tensorboard, in the browser, go to your AWS public DNS :6006 avg_return_1 is the main metric (accuracy). 
Example outputs Example learning curves for WikiTable (left) and WikiSQL (right) experiments (0.9 smoothing): Citation If you use the code in your research, please cite: @misc{1807.02322, Author {Chen Liang and Mohammad Norouzi and Jonathan Berant and Quoc Le and Ni Lao}, Title {Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing}, Year {2018}, Eprint {arXiv:1807.02322}, } @inproceedings{liang2017neural, title {Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision}, author {Liang, Chen and Berant, Jonathan and Le, Quoc and Forbus, Kenneth D and Lao, Ni}, booktitle {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, volume {1}, pages {23 33}, year {2017} }",Atari Games,Playing Games 2630,Playing Games,Playing Games,Other,"DQN chainer This software is a python implementation of Deep Q Networks for playing ATARI games with the Chainer package. I followed the implementation described in: V. Mnih et al., Playing atari with deep reinforcement learning V. Mnih et al., Human level control through deep reinforcement learning For Japanese instructions on DQN and a historical review, please check: Requirement My implementation is dependent on RL glue, the Arcade Learning Environment, and Chainer. To run the software, you need the following software/packages. Python 2.7+ Numpy Scipy Pillow (PIL) Chainer (1.3.0): RL glue core: RL glue Python codec: Arcade Learning Environment (version ALE 0.4.4): This software was tested on Ubuntu 14.04 LTS. How to run Please check readme.txt",Atari Games,Playing Games 2641,Playing Games,Playing Games,Other,"rl_implementation Implementation of DQNs. Environment : OpenAI Gym Atari 2600 games Papers DQN : Playing Atari with Deep Reinforcement Learning Double DQN : Deep Reinforcement Learning with Double Q learning Prioritized Replay : PRIORITIZED EXPERIENCE REPLAY Dueling Network : Dueling Network Architectures for Deep Reinforcement Learning Ape X DQN : DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY Usage $ python dqn_atari.py prioritized 1 double 1 dueling 1 n_step 3 Prioritized Experience Replay prioritized : 0 or 1 Double Deep Q Learning (DDQN) double : 0 or 1 Dueling Network dueling : 0 or 1 multi step bootstrap target n_step : int (1 : normal TD error) Other arguments are described in dqn_atari.py Ape X DQN See Results After 12,000 episodes (Ape X DQN) ! apex Learning curves",Atari Games,Playing Games 2682,Playing Games,Playing Games,Other,"universe starter agent The codebase implements a starter agent that can solve a number of universe environments. It contains a basic implementation of the A3C algorithm , adapted for real time environments.
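For orientation, the objective that an A3C-style worker optimizes combines a policy gradient term, a value regression term, and an entropy bonus. The sketch below is illustrative NumPy only, not this repository's TensorFlow code, and the inputs are placeholder numbers.

```python
import numpy as np

def a3c_loss(log_probs, values, returns, entropy, value_coef=0.5, entropy_coef=0.01):
    # Policy term pushes up log pi(a|s) for actions with positive advantage,
    # value term regresses V(s) toward the n-step return,
    # entropy bonus keeps the policy from collapsing too early.
    advantages = returns - values
    policy_loss = -(log_probs * advantages).mean()
    value_loss = ((returns - values) ** 2).mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

log_probs = np.log(np.array([0.3, 0.6, 0.2]))   # log pi(a_t|s_t) over a short rollout
values = np.array([0.5, 0.7, 0.1])              # critic estimates V(s_t)
returns = np.array([1.0, 0.9, 0.0])             # n-step discounted returns
print(a3c_loss(log_probs, values, returns, entropy=1.0))
```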
Dependencies Python 2.7 or 3.5 six (for py2/3 compatibility) TensorFlow 0.11 tmux (the start script opens up a tmux session with multiple windows) htop (shown in one of the tmux windows) gym gym atari universe opencv python numpy scipy Getting Started conda create name universe starter agent python 3.5 source activate universe starter agent brew install tmux htop On Linux use sudo apt get install y tmux htop pip install gym atari pip install universe pip install six pip install tensorflow conda install y c opencv3 conda install y numpy conda install y scipy Add the following to your .bashrc so that you'll have the correct environment when the train.py script spawns new bash shells source activate universe starter agent Atari Pong python train.py num workers 2 env id PongDeterministic v3 log dir /tmp/pong The command above will train an agent on Atari Pong using ALE simulator. It will see two workers that will be learning in parallel ( num workers flag) and will output intermediate results into given directory. The code will launch the following processes: worker 0 a process that runs policy gradient worker 1 a process identical to process 1, that uses different random noise from the environment ps the parameter server, which synchronizes the parameters among the different workers tb a tensorboard process for convenient display of the statistics of learning Once you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing tmux a in the console. Once in the tmux session, you can see all your windows with ctrl b w . To switch to window number 0, type: ctrl b 0 . Look up tmux documentation for more commands. To access TensorBoard to see various monitoring metrics of the agent, open in a browser. Using 16 workers, the agent should be able to solve PongDeterministic v3 (not VNC) within 30 minutes (often less) on an m4.10xlarge instance. Using 32 workers, the agent is able to solve the same environment in 10 minutes on an m4.16xlarge instance. If you run this experiment on a high end MacBook Pro, the above job will take just under 2 hours to solve Pong. ! pong For best performance, it is recommended for the number of workers to not exceed available number of CPU cores. You can stop the experiment with tmux kill session command. Playing games over remote desktop The main difference with the previous experiment is that now we are going to play the game through VNC protocol. The VNC environments are hosted on the EC2 cloud and have an interface that's different from a conventional Atari Gym environment; luckily, with the help of several wrappers (which are used within envs.py file) the experience should be similar to the agent as if it was played locally. The problem itself is more difficult because the observations and actions are delayed due to the latency induced by the network. More interestingly, you can also peek at what the agent is doing with a VNCViewer. Note that the default behavior of train.py is to start the remotes on a local machine. Take a look at for documentation on managing your remotes. Pass additional r flag to point to pre existing instances. VNC Pong python train.py num workers 2 env id gym core.PongDeterministic v3 log dir /tmp/vncpong _Peeking into the agent's environment with TurboVNC_ You can use your system viewer as open vnc://localhost:5900 (or open vnc://${docker_ip}:5900 ) or connect TurboVNC to that ip/port. VNC password is openai . ! 
pong Important caveats One of the novel challenges in using Universe environments is that they operate in real time , and in addition, it takes time for the environment to transmit the observation to the agent. This time creates a lag: where the greater the lag, the harder it is to solve environment with today's RL algorithms. Thus, to get the best possible results it is necessary to reduce the lag, which can be achieved by having both the environments and the agent live on the same high speed computer network. So for example, if you have a fast local network, you could host the environments on one set of machines, and the agent on another machine that can speak to the environments with low latency. Alternatively, you can run the environments and the agent on the same EC2/Azure region. Other configurations tend to have greater lag. To keep track of your lag, look for the phrase reaction_time in stderr. If you run both the agent and the environment on nearby machines on the cloud, your reaction_time should be as low as 40ms. The reaction_time statistic is printed to stderr because we wrap our environment with the Logger wrapper, as done in here ( ). Generally speaking, environments that are most affected by lag are games that place a lot of emphasis on reaction time. For example, this agent is able to solve VNC Pong ( gym core.PongDeterministic v3 ) in under 2 hours when both the agent and the environment are co located on the cloud, but this agent had difficulty solving VNC Pong when the environment was on the cloud while the agent was not. This issue affects environments that place great emphasis on reaction time. A note on tuning This implementation has been tuned to do well on VNC Pong, and we do not guarantee its performance on other tasks. It is meant as a starting point. Playing flash games You may run the following command to launch the agent on the game Neon Race: python train.py num workers 2 env id flashgames.NeonRace v0 log dir /tmp/neonrace _What agent sees when playing Neon Race_ (you can connect to this view via note ( vnc pong) above) ! neon Getting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score takes about 12 hours. Also, flash games are run at 5fps by default, so it should be possible to productively use 16 workers on a machine with 8 (and possibly even 4) cores. Next steps Now that you have seen an example agent, develop agents of your own. We hope that you will find doing so to be an exciting and an enjoyable task.",Atari Games,Playing Games 2720,Playing Games,Playing Games,Other,"StreetLearn Overview This repository contains an implementation of the StreetLearn environment for training navigation agents as well as code for implementing the agents used in the NeurIPS 2018 paper on Learning to Navigate in Cities Without a Map . The StreetLearn environment relies on panorama images from Google Street View and provides an interface for moving a first person view agent inside the Street View graph. This is not an officially supported Google product. For a detailed description of the architecture please read our paper. Please cite the paper if you use the code from this repository in your work. Our paper also provides a detailed description of how to train and implement navigation agents in the StreetLearn environment by using a TensorFlow implementation of Importance Weighted Actor Learner Architectures , published in Espeholt, Soyer, Munos et al. 
(2018) IMPALA: Scalable Distributed Deep RL with Importance Weighted Actor Learner Architectures . The generic agent and trainer code have been published by Lasse Espeholt under an Apache license at: Bibtex @article{mirowski2018learning, title {Learning to Navigate in Cities Without a Map}, author {Mirowski, Piotr and Grimes, Matthew Koichi and Malinowski, Mateusz and Hermann, Karl Moritz and Anderson, Keith and Teplyashin, Denis and Simonyan, Karen and Kavukcuoglu, Koray and Zisserman, Andrew and Hadsell, Raia}, journal {arXiv preprint arXiv:1804.00168}, year {2018} } Code structure This environment code contains: streetlearn/engine Our C++ StreetLearn engine for loading, caching and serving Google Street View panoramas by projecting them from a equirectangular representation to first person projected view at a given yaw, pitch and field of view, and for handling navigation (moving from one panorama to another) depending on the city street graph and the current orientation. streetlearn/proto The message protocol buffer used to store panoramas and street graph. streetlearn/python/environment A Python based interface for calling the StreetLearn environment with custom action spaces. streetlearn/python/human_agent A simple human agent, implemented in Python using pygame, that instantiates the StreetLearn environment on the requested map and enables a user to play the courier game. The directory also contains an oracle agent, similar to the human agent, which automatically navigates towards the goal and reports oracle performance on the courier game. Compilation from source Bazel is the official build system for StreetLearn. The build has only been tested running on Ubuntu 18.04. Install build prerequisites shell sudo apt get install autoconf automake libtool curl make g++ unzip virtualenv python virtualenv cmake subversion pkg config libpython dev libcairo2 dev libboost all dev python pip libssl dev pip install setuptools pip install pyparsing Install Protocol Buffers For detailed information see: shell git clone cd protobuf git submodule update init recursive ./autogen.sh ./configure make j7 sudo make install sudo ldconfig cd python python setup.py build sudo python setup.py install cd ../.. Install CLIF shell git clone cd clif ./INSTALL.sh cd .. Install OpenCV 2.4.13 shell wget unzip 2.4.13.6.zip cd opencv 2.4.13.6 mkdir build cd build cmake D CMAKE_BUILD_TYPE Release D CMAKE_INSTALL_PREFIX /usr/local .. make j7 sudo make install sudo ldconfig cd ../.. Install Python dependencies shell pip install six pip install absl py pip install inflection pip install wrapt pip install numpy pip install dm sonnet pip install tensorflow gpu pip install tensorflow probability gpu pip install pygame Install Bazel This page describes how to install the Bazel build and test tool on your machine. 
Building StreetLearn Clone this repository: shell git clone cd streetlearn To build the StreetLearn engine only: shell export CLIF_PATH $HOME/opt bazel build streetlearn:streetlearn_engine_py To build the human agent and the oracle agent in the StreetLearn environment, with all the dependencies: shell export CLIF_PATH $HOME/opt bazel build streetlearn/python/human_agent:all Running the StreetLearn human agent To run the human agent using one of the StreetLearn datasets downloaded and stored at dataset_path : shell bazel run streetlearn/python/human_agent dataset_path For help with the options of the human_agent: shell bazel run streetlearn/python/human_agent help Similarly, to run the oracle agent on the courier game: shell bazel run streetlearn/python/human_agent:oracle_agent dataset_path The human agent and the oracle agent show a view_image (on top) and a graph_image (on bottom). Actions available to an agent: Rotate left or right in the panorama, by a specified angle (change the yaw of the agent). In the human_agent, press a or d . Rotate up or down in the panorama, by a specified angle (change the pitch of the agent). In the human_agent, press w or s . Move from current panorama A forward to another panorama B if the current bearing of the agent from A to B is within a tolerance angle of 30 degrees. In the human_agent, press space . Zoom in and out in the panorama. In the human_agent, press i or o . Additional keys for the human_agent are escape and p (to print the current view as a bitmap image). For training RL agents, action spaces are discretized using integers. For instance, in our paper, we used 5 actions: (move forward, turn left by 22.5 deg, turn left by 67.5 deg, turn right by 22.5 deg, turn right by 67.5 deg). Navigation Bar Along the bottom of the view_image is the navigation bar which displays a small circle in any direction in which travel is possible: When within the centre range, they will turn green meaning the user can move in this direction. When they are out of this range, they will turn red meaning this is inaccessible. When more than one dots are within the centre range, all except the most central will turn orange, meaning that there are multiple (forward) directions available. Stop signs The graph is constructed by breadth first search to the depth specified by the graph depth flags. At the maximum depth the graph will suddenly stop, generally in the middle of a street. Because we are trying to train agents to recognize streets as navigable, and in order not to confuse the agents, red stop signs are shown from two panoramas away from any terminal node in the graph. Obtaining the StreetLearn dataset You can request the StreetLearn dataset on the StreetLearn project website . Using the StreetLearn environment code The Python StreetLearn environment follows the specifications from OpenAI Gym . The call to function step(action) returns: observation (tuple of observations requested at construction), reward (a float with the current reward of the agent), done (boolean indicating whether the episode has ended) and info (a dictionary of environment state variables). After creating the environment, it is initialised by calling function reset() . If the flag auto_reset is set to True at construction, reset() will be called automatically every time that an episode ends. Environment Settings Default environment settings are stored in streetlearn/python/default_config.py. width : Width of rendered window. seed : Random seed. width : Width of the streetview image. 
height : Height of the streetview image. graph_width : Width of the map graph image. graph_height : Height of the map graph image. status_height : Status bar height in pixels. field_of_view : Horizontal field of view, in degrees. min_graph_depth : Min bound on BFS depth for panos. max_graph_depth : Max bound on BFS depth for panos. max_cache_size : Pano cache size. frame_cap : Episode frame cap. full_graph : Boolean indicating whether to build the entire graph upon episode start. sample_graph_depth : Boolean indicating whether to sample graph depth between min_graph_depth and max_graph_depth. start_pano : The pano ID string to start from. The graph will be build out from this point. graph_zoom : Initial graph zoom. Valid between 1 and 32. neighbor_resolution : Used to calculate a binary occupancy vector of neighbors to the current pano. color_for_observer : RGB color for the observer. color_for_coin : RGB color for the panos containing coins. color_for_goal : RGB color for the goal pano. observations : Array containing one or more names of the observations requested from the environment: 'view_image', 'graph_image', 'yaw', 'pitch', 'metadata', 'target_metadata', 'latlng', 'target_latlng', 'yaw_label', 'neighbors' reward_per_coin : Coin reward for coin game. proportion_of_panos_with_coins : The proportion of panos with coins. level_name : Level name, can be: 'coin_game', 'exploration_game'. action_spec : Either of 'streetlearn_default', 'streetlearn_fast_rotate', 'streetlearn_tilt' rotation_speed : Rotation speed in degrees. Used to create the action spec. Observations The following observations can be returned by the agent: view_image : RGB image for the first person view image returned from the environment and seen by the agent, graph_image : RGB image for the top down street graph image, usually not seen by the agent, yaw : Scalar value of the yaw angle of the agent, in degrees (zero corresponds to North), pitch : Scalar value of the pitch angle of the agent, in degrees (zero corresponds to horizontal), metadata : Message protocol buffer of type Pano with the metadata of the current panorama, target_metadata : Message protocol buffer of type Pano with the metadata of the target/goal panorama, latlng : Tuple of lat/lng scalar values for the current position of the agent, target_latlng : Tuple of lat/lng scalar values for the target/goal position, yaw_label : Integer discretized value of the agent yaw using 16 bins, neighbors : Vector of immediate neighbor egocentric traversability grid around the agent, with 16 bins for the directions around the agent and bin 0 corresponding to the traversability straight ahead of the agent. Games The following games are available in the StreetLearn environment: coin_game : invisible coins scattered throughout the map, yielding a reward of 1 for each. Once picked up, these rewards do not reappear until the end of the episode. courier_game : the agent is given a goal destination, specified as lat/long pairs. Once the goal is reached (with 100m tolerance), a new goal is sampled, until the end of the episode. Rewards at a goal are proportional to the number of panoramas on the shortest path from the agent's position when it gets the new goal assignment to that goal position. Additional reward shaping consists in early rewards when the agent gets within a range of 200m of the goal. Additional coins can also be scattered throughout the environment. The proportion of coins, the goal radius and the early reward radius are parametrizable. 
curriculum_courier_game : same as the courier game, but with a curriculum on the difficulty of the task (maximum straight line distance from the agent's position to the goal when it is assigned). License The Abseil C++ library is licensed under the terms of the Apache license. See LICENSE (LICENSE) for more information. Disclaimer This is not an official Google product.",Atari Games,Playing Games 2726,Playing Games,Playing Games,Other,"Pytorch_DDQN_Unity_Navigation Deep Reinforcement Learning. ! (agent2.gif) Shown above: First person view of a Reinforcement Learning agent collecting yellow bananas while avoiding blue bananas. Uses the Unity ML Banana Navigation environment: Written using Python 3 and Pytorch. Deep Reinforcement Learning Uses Double Q Learning written in Pytorch. For further info on Double Q Networks (DDQN): The Environment State Space Uses a state with 37 numeric features derived from ray tracing (rather than pixel inputs). Action Space 4 possible actions (0, 1, 2, 3) corresponding to moving forward, moving backward, rotating left, and rotating right. Scoring +1 for moving into a yellow banana, -1 for moving into a blue banana, 0 elsewhere Custom Scoring +1 for moving into a yellow banana, -1 for moving into a blue banana, 0.03 elsewhere Termination The game terminates once the agent has performed 300 actions. Dependencies copy numpy random sys torch unityagents Solve criteria The agent has solved the environment if it achieves a consecutive 100 game average score of 13 or higher within 1800 games. Usage Extract the Banana_Windows_x86_64 folder. All code is contained in the ipynb notebook. To train from scratch: DDQN_run().train() ! (agent.gif) If the agent solves the environment, weights are saved (included) as checkpoint.pth To load saved weights and watch a game: DDQN_run().run_saved_model() Note: Must have weights saved as checkpoint.pth. Further details View report.ipynb to view an explanation of the implementation.",Atari Games,Playing Games 2735,Playing Games,Playing Games,Other,"Deep Reinforcement Learning for Atari Breakout Game Replicating Deep RL papers by DeepMind for the Atari Breakout game. Uses the OpenAI gym environment and Keras for Deep Learning models. ! game (./sample.gif) Models Implemented Deep Q Network (DQN) Double Deep Q Network (DDQN) Dueling Deep Q Network (Dueling DDQN) Asynchronous Advantage Actor Critic (A3C) Training To train a Q Learning model, python DQN.py Specify within the code double True for Double DQN or Dueling True for Dueling DQN. The exact hyperparameters are according to the paper but are all commented within the code. To train the A3C model, python A3C.py Specify whether lstm True for a final lstm layer. Training summary will be outputted to Tensorboard. To visualize, tensorboard logdir /summary Evaluation To evaluate a trained Q Learning model, python DQNEvaluator.py Specify the number of games (default games 1 ) and whether to render (default True ). To evaluate a trained A3C model, python A3CEvaluator.py Specify the number of games (default games 1 ) and whether to render (default True ).
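As a quick reference for the Double and Dueling variants listed above, the two key modifications they make to plain DQN can be sketched as follows (illustrative NumPy only, not the Keras code in this repository).

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    # Double DQN: the online network picks the next action,
    # the target network scores it.
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(reward)), best_actions]
    return reward + gamma * (1 - done) * evaluated

def dueling_aggregate(value, advantages):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)

reward = np.array([1.0, 0.0])
done = np.array([0, 1])
q_online_next = np.random.randn(2, 4)
q_target_next = np.random.randn(2, 4)
print(double_dqn_target(reward, q_online_next, q_target_next, done))
print(dueling_aggregate(np.array([0.5, -0.2]), np.random.randn(2, 4)))
```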
Replicated Papers Playing Atari with Deep Reinforcement Learning: Human level control through deep reinforcement learning: Deep Reinforcement Learning with Double Q learning: Dueling Network Architectures for Deep Reinforcement Learning: Asynchronous Methods for Deep Reinforcement Learning: Other References Helpful Introductory Blogposts Discounted Reward Calculation for A3C",Atari Games,Playing Games 2738,Playing Games,Playing Games,Other,"TransferReinforcementLearning Setup The current code works on python 3.6. Install the required python packages gym gym\ atari\ tensorflow (see instructions from openai , tensorflow). Clone the repository: git clone cd TransferReinforcementLearning (optional) copy the gym and atari_py folders to your python package directory, which allows you to use the rotated atari pong game. eg. cp r gym path_to_python_lib/python3.6/site packages/gym (obsolete) eg. cp r atari_py path_to_python_lib/python3.6/site packages/atari_py (optional) get tensorpack which Get a solution for the pong game: python pongSimpleSol.py Get a solution for the breakout game with transfer solutions from the pong game: python breakoutTransferSol.py Files src: code pongSimpleFunc.py compareMultiThread.py results: results from this project gym: my custom revision of gym from openai . atari\_py: my custom revision of atari\_py from openai atari_py Reference Tricks ./tensorpack/examples/DeepQNetwork/DQN.py env /anaconda3/lib/python3.6/site packages/atari_py/atari_roms/breakout.bin task play load DoubleDQN Breakout.npz ./tensorpack/examples/DeepQNetwork/DQN.py env ./breakout.bin task play load DoubleDQN Breakout.npz ./tensorpack/examples/DeepQNetwork/DQN_new.py env ./pong.bin task play load DoubleDQN Breakout.npz pip install opencv python pip install ale_python_interface pretrained models Tricks ffmpeg ffmpeg i input.mkv codec copy output.mp4 mkdir frames ffmpeg i input.mp4 vf scale 320: 1:flags lanczos,fps 10 frames/ffout%05d.png ffmpeg i input.mp4 vf fps 10 frames/ffout%05d.png convert delay 5 loop 0 frames/ffout .png output.gif Tricks pix2pix python tools/process.py input_dir ../train/origin/input_dir/ operation resize output_dir ../train/resize/input_dir/; python tools/process.py input_dir ../train/origin/b_dir/ operation resize output_dir ../train/resize/b_dir/; for f in ../train/resize/b_dir/ ; do mv $f ${f/out/in}; done python tools/process.py input_dir ../train/resize/input_dir/ b_dir ../train/resize/b_dir/ operation combine output_dir ../train/combined python tools/split.py dir ../train/combined python pix2pix.py mode train output_dir ../train/f_train max_epochs 200 input_dir ../train/combined/train which_direction BtoA References A3C: ./tensorpack/examples/A3C Gym/train atari.py env Pong v0 task play load Breakout v0.npz task dump_video output . episode 1",Atari Games,Playing Games 2743,Playing Games,Playing Games,Other,"Deep Q learning applied to quantum error correction An implementation of a quantum error correction algorithm for bit flip errors on the topological toric code using deep reinforcement learning. Toric_class: this class takes care of the environment and keeps track of all the parameters related to the toric code. The syndrome is stored in a 3d matrix (2 x system_size x system_size); this is the input to the neural network. Moreover it tracks all the operations on the toric code such that a logical qubit flip (nontrivial loop) can be detected. RL: This class contains all the relevant functions and hyperparameters to train the agent and to interact with the toric code environment.
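To illustrate the shape of that network input, here is a tiny PyTorch sketch: a (2 x system_size x system_size) syndrome tensor fed through a toy convolutional network. The layer sizes and the number of output actions are invented for illustration and are not taken from this repository.

```python
import torch
import torch.nn as nn

system_size = 5
# One syndrome: 2 channels, each system_size x system_size, as described above.
syndrome = torch.zeros(1, 2, system_size, system_size)
syndrome[0, 0, 1, 3] = 1.0   # mark one example defect

toy_net = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * system_size * system_size, 3),  # e.g. one Q value per candidate action
)
print(toy_net(syndrome).shape)   # torch.Size([1, 3])
```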
For the reinforcement learning part I was strongly inspired by the paper 'Playing Atari with Deep Reinforcement Learning'. All the different hyperparameters can be tuned in the run.py file and several different networks can be applied. The neural network is implemented in pytorch.",Atari Games,Playing Games 2748,Playing Games,Playing Games,Other,"Reinforcement Learning Toy repository for implementing different RL algorithms. DQN Roderick, M., MacGlashan, J., & Tellex, S. (2017). Implementing the Deep Q Network. CoRR, abs/1711.07478. Double DQN Hasselt, H.V., Guez, A., & Silver, D. (2016). Deep Reinforcement Learning with Double Q learning. AAAI. Policy Gradients Actor Critic Automatic Hyperparameter Logging Work in progress",Atari Games,Playing Games 2757,Playing Games,Playing Games,Other,"Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research). Our design principles are: _Easy experimentation_: Make it easy for new users to run benchmark experiments. _Flexible development_: Make it easy for new users to try out research ideas. _Compact and reliable_: Provide implementations for a few, battle tested algorithms. _Reproducible_: Facilitate reproducibility in results. In particular, our setup follows the recommendations given by Machado et al. (2018) machado . In the spirit of these principles, this first version focuses on supporting the state of the art, single GPU Rainbow agent ( Hessel et al., 2018 rainbow ) applied to Atari 2600 game playing ( Bellemare et al., 2013 ale ). Specifically, our Rainbow agent implements the three components identified as most important by Hessel et al. rainbow : n step Bellman updates (see e.g. Mnih et al., 2016 a3c ) Prioritized experience replay ( Schaul et al., 2015 prioritized_replay ) Distributional reinforcement learning ( C51; Bellemare et al., 2017 c51 ) For completeness, we also provide an implementation of DQN ( Mnih et al., 2015 dqn ). For additional details, please see our documentation . This is not an official Google product. What's new 30/01/2019: Dopamine 2.0 now supports general discrete domain gym environments. 01/11/2018: Download links for each individual checkpoint, to avoid having to download all of the checkpoints. 29/10/2018: Graph definitions now show up in Tensorboard. 16/10/2018: Fixed a subtle bug in the IQN implementation and updated the colab tools, the JSON files, and all the downloadable data. 18/09/2018: Added support for double DQN style updates for the ImplicitQuantileAgent . Can be enabled via the double_dqn constructor parameter. 18/09/2018: Added support for reporting in iteration losses directly from the agent to Tensorboard. Set the run_experiment.create_agent.debug_mode True via the configuration file or using the gin_bindings flag to enable it. Control frequency of writes with the summary_writing_frequency agent constructor parameter (defaults to 500 ). 27/08/2018: Dopamine launched! Instructions Install via source Installing from source allows you to modify the agents and experiments as you please, and is likely to be the pathway of choice for long term use. These instructions assume that you've already set up your favourite package manager (e.g. apt on Ubuntu, homebrew on Mac OS X), and that a C++ compiler is available from the command line (almost certainly the case if your favourite package manager works).
The instructions below assume that you will be running Dopamine in a virtual environment . A virtual environment lets you control which dependencies are installed for which program; however, this step is optional and you may choose to ignore it. Dopamine is a Tensorflow based framework, and we recommend you also consult the Tensorflow documentation for additional details. Finally, these instructions are for Python 2.7. While Dopamine is Python 3 compatible, there may be some additional steps needed during installation. Ubuntu First set up the virtual environment: sudo apt get update && sudo apt get install virtualenv virtualenv python python2.7 dopamine env source dopamine env/bin/activate This will create a directory called dopamine env in which your virtual environment lives. The last command activates the environment. Then, install the dependencies to Dopamine. If you don't have access to a GPU, then replace tensorflow gpu with tensorflow in the line below (see Tensorflow instructions for details). sudo apt get update && sudo apt get install cmake zlib1g dev pip install absl py atari py gin config gym opencv python tensorflow gpu During installation, you may safely ignore the following error message: tensorflow 1.10.1 has requirement numpy 1.13.3, but you'll have numpy 1.15.1 which is incompatible . Finally, download the Dopamine source, e.g. git clone Mac OS X First set up the virtual environment: pip install virtualenv virtualenv python python2.7 dopamine env source dopamine env/bin/activate This will create a directory called dopamine env in which your virtual environment lives. The last command activates the environment. Then, install the dependencies to Dopamine: brew install cmake zlib pip install absl py atari py gin config gym opencv python tensorflow During installation, you may safely ignore the following error message: tensorflow 1.10.1 has requirement numpy 1.13.3, but you'll have numpy 1.15.1 which is incompatible . Finally, download the Dopamine source, e.g. git clone Running tests You can test whether the installation was successful by running the following: export PYTHONPATH ${PYTHONPATH}:. python tests/dopamine/atari_init_test.py The entry point to the standard Atari 2600 experiment is dopamine/discrete_domains/train.py . To run the basic DQN agent, python um dopamine.discrete_domains.train \ base_dir /tmp/dopamine \ gin_files 'dopamine/agents/dqn/configs/dqn.gin' By default, this will kick off an experiment lasting 200 million frames. The command line interface will output statistics about the latest training episode: ... I0824 17:13:33.078342 140196395337472 tf_logging.py:115 gamma: 0.990000 I0824 17:13:33.795608 140196395337472 tf_logging.py:115 Beginning training... Steps executed: 5903 Episode length: 1203 Return: 19. To get finer grained information about the process, you can adjust the experiment parameters in dopamine/agents/dqn/configs/dqn.gin , in particular by reducing Runner.training_steps and Runner.evaluation_steps , which together determine the total number of steps needed to complete an iteration. This is useful if you want to inspect log files or checkpoints, which are generated at the end of each iteration. More generally, the whole of Dopamine is easily configured using the gin configuration framework . Non Atari discrete environments We provide sample configuration files for training an agent on Cartpole and Acrobot. 
For example, to train C51 on Cartpole with default settings, run the following command: python um dopamine.discrete_domains.train \ base_dir /tmp/dopamine \ gin_files 'dopamine/agents/rainbow/configs/c51_cartpole.gin' You can train Rainbow on Acrobot with the following command: python um dopamine.discrete_domains.train \ base_dir /tmp/dopamine \ gin_files 'dopamine/agents/rainbow/configs/rainbow_acrobot.gin' Install as a library An easy, alternative way to install Dopamine is as a Python library: Alternatively brew install, see Mac OS X instructions above. sudo apt get update && sudo apt get install cmake pip install dopamine rl pip install atari py Depending on your particular system configuration, you may also need to install zlib (see Install via source above). Running tests From the root directory, tests can be run with a command such as: python um tests.agents.rainbow.rainbow_agent_test References Bellemare et al., The Arcade Learning Environment: An evaluation platform for general agents . Journal of Artificial Intelligence Research, 2013. ale Machado et al., Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , Journal of Artificial Intelligence Research, 2018. machado Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning . Proceedings of the AAAI Conference on Artificial Intelligence, 2018. rainbow Mnih et al., Human level Control through Deep Reinforcement Learning . Nature, 2015. dqn Mnih et al., Asynchronous Methods for Deep Reinforcement Learning . Proceedings of the International Conference on Machine Learning, 2016. a3c Schaul et al., Prioritized Experience Replay . Proceedings of the International Conference on Learning Representations, 2016. prioritized_replay Giving credit If you use Dopamine in your work, we ask that you cite our white paper dopamine_paper . Here is an example BibTeX entry: @article{castro18dopamine, author {Pablo Samuel Castro and Subhodeep Moitra and Carles Gelada and Saurabh Kumar and Marc G. Bellemare}, title {Dopamine: {A} {R}esearch {F}ramework for {D}eep {R}einforcement {L}earning}, year {2018}, url { archivePrefix {arXiv} } machado : ale : dqn : a3c : prioritized_replay : c51 : rainbow : iqn : dopamine_paper :",Atari Games,Playing Games 2766,Playing Games,Playing Games,Other,"DQN_Unity_Keras DQN with Unity and Keras A simple example of how to use DQN Reinforcement Learning in Unity using Keras. Included are (1) example Python scripts that illustrate single and two agent DQN training and testing using Keras, and (2) a Unity package with two simple 2D unity games: 1. Wall Pong: A single agent game similar to pong. Agent moves a paddle to hit a ball against a wall. 2. Pong: A simple example of the classic two agent Atari game. The python agent connects to the unity game via a virtual (TCP) socket. To use the examples, you will need the following installed: 1. Python 2.7 2. Tensorflow 3. Keras 4. Unity NOTE 1: the python code has only been tested using Python 2.7, on a Mac Book Pro. I recommend installing Keras and Tensorflow and running the python agents in a Py2.7 virtual environment. To run the code: 1. Run a python training or testing script in terminal 2. Launch the corresponding game (either in the Unity editor or as a standalone) 3. Select AI type and click ‘connect’ in the game 4. Watch… and watch… and watch…and eventually a successful agent (training usually takes about 1 to 2 hours for WallPong and 2 to 4 for Multiagent Pong). 
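For readers new to DQN in Keras, a minimal sketch of the kind of Q network such an agent can use on low dimensional state input is shown below; the state and action sizes are hypothetical placeholders and this is not the exact network bundled with the example scripts:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

STATE_DIM, N_ACTIONS, GAMMA = 6, 3, 0.99           # hypothetical paddle/ball state and actions

model = Sequential()
model.add(Dense(24, input_dim=STATE_DIM, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(N_ACTIONS, activation='linear'))    # one Q-value per action
model.compile(loss='mse', optimizer=Adam(lr=0.001))

def dqn_update(s, a, r, s2, done):
    # Standard DQN target: r + gamma * max_a' Q(s2, a'), fitted with a mean squared error loss.
    target = model.predict(s[None, :])[0]
    target[a] = r if done else r + GAMMA * np.max(model.predict(s2[None, :])[0])
    model.fit(s[None, :], target[None, :], verbose=0)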
General Information: Recently, Unity released a great toolbox for DQN using Tensorflow: ). I highly recommend you check this toolbox out. Basically, this is much simpler version of the recently released Unity toolbox and illustrates how to do (more or less) the same thing using Keras. Why Keras? Well, Keras offers a front end to Tensorflow and is much simpler to use if you are new to neural networks and deep learning. If you are interested in learning more about Keras and want a practical guide to how to use it for deep learning more generally, I recommend the deep learning in python using keras series by Dr. Jason Brownlee at Machine Mastery: NOTE 2: Rather than using image data (as in the original DQN work), the current DQN agent(s) receive position and velocity data from the game, which for most Unity and VR applications is preferable to image data, due to the computational cost of deep convolutional network architectures (i.e., no GPU required for these examples). Keon Kim also has a great blog tutorial on DQN using Keras: Some of the DQN code provided here was adapted from Keon’s tutorial, as well as various other educational/tutorial resources. NOTE 3: I developed these examples in early 2017 for a number of students in my lab. I should have posted it on GitHub then…but there is never enough time in the day. Have fun! For more reading on DQN see: 1. Playing Atari with Deep Reinforcement Learning: 2. Human level Control Through Deep Reinforcement Learning: 3. Multiagent cooperation and competition with deep reinforcement learning:",Atari Games,Playing Games 2883,Playing Games,Playing Games,Other,"Linux Build Status Windows Build Status What A Go program with no human provided knowledge. Using MCTS (but without Monte Carlo playouts) and a deep residual convolutional neural network stack. This is a fairly faithful reimplementation of the system described in the Alpha Go Zero paper Mastering the Game of Go without Human Knowledge . For all intents and purposes, it is an open source AlphaGo Zero. Wait, what? If you are wondering what the catch is: you still need the network weights. No network weights are in this repository. If you manage to obtain the AlphaGo Zero weights, this program will be about as strong, provided you also obtain a few Tensor Processing Units. Lacking those TPUs, I'd recommend a top of the line GPU it's not exactly the same, but the result would still be an engine that is far stronger than the top humans. Gimme the weights Recomputing the AlphaGo Zero weights will take about 1700 years on commodity hardware . One reason for publishing this program is that we are running a public, distributed effort to repeat the work. Working together, and especially when starting on a smaller scale, it will take less than 1700 years to get a good network (which you can feed into this program, suddenly making it strong). I want to help Using your own hardware You need a PC with a GPU, i.e. a discrete graphics card made by NVIDIA or AMD, preferably not too old, and with the most recent drivers installed. It is possible to run the program without a GPU, but performance will be much lower. If your CPU is not very recent (Haswell or newer, Ryzen or newer), performance will be outright bad, and it's probably of no use trying to join the distributed effort. But you can still play, especially if you are patient. Windows Head to the Github releases page at download the latest release, unzip, and launch autogtp.exe. 
It will connect to the server automatically and do its work in the background, uploading results after each game. You can just close the autogtp window to stop it. macOS and Linux Follow the instructions below to compile the leelaz binary, then go into the autogtp subdirectory and follow the instructions there (autogtp/README.md) to build the autogtp binary. Copy the leelaz binary into the autogtp dir, and launch autogtp. Using a Cloud provider Many cloud companies offer free trials (or paid solutions, not discussed here) that are usable for helping the leela zero project. There are community maintained instructions available here: Running Leela Zero client on a Tesla V100 GPU for free (Google Cloud Free Trial, Microsoft Azure, Oracle cloud, etc) I just want to play right now Download the best known network weights file from: And head to the Usage ( usage) section of this README. If you prefer a more human style, a network trained from human games is available here: Compiling Requirements GCC, Clang or MSVC, any C++14 compiler Boost 1.58.x or later, headers and program_options, filesystem and system libraries (libboost dev, libboost program options dev and libboost filesystem dev on Debian/Ubuntu) zlib library (zlib1g & zlib1g dev on Debian/Ubuntu) Standard OpenCL C headers (opencl headers on Debian/Ubuntu, or at OpenCL ICD loader (ocl icd libopencl1 on Debian/Ubuntu, or reference implementation at An OpenCL capable device, preferably a very, very fast GPU, with recent drivers is strongly recommended (OpenCL 1.1 support is enough). If you do not have a GPU, add the define USE_CPU_ONLY , for example by adding DUSE_CPU_ONLY 1 to the cmake command line. Optional: BLAS Library: OpenBLAS (libopenblas dev) or Intel MKL The program has been tested on Windows, Linux and macOS. Example of compiling and running Ubuntu & similar Test for OpenCL support & compatibility sudo apt install clinfo && clinfo Clone github repo git clone cd leela zero git submodule update init recursive Install build depedencies sudo apt install libboost dev libboost program options dev libboost filesystem dev opencl headers ocl icd libopencl1 ocl icd opencl dev zlib1g dev Use stand alone directory to keep source dir clean mkdir build && cd build cmake .. cmake build . ./tests curl O ./leelaz weights best network Example of compiling and running macOS Clone github repo git clone cd leela zero git submodule update init recursive Install build depedencies brew install boost cmake Use stand alone directory to keep source dir clean mkdir build && cd build cmake .. cmake build . ./tests curl O ./leelaz weights best network Example of compiling and running Windows Clone github repo git clone cd leela zero git submodule update init recursive cd msvc Double click the leela zero2015.sln or leela zero2017.sln corresponding to the Visual Studio version you have. Build from Visual Studio 2015 or 2017 Download to msvc\x64\Release msvc\x64\Release\leelaz.exe weights best network Usage The engine supports the GTP protocol, version 2 . Leela Zero is not meant to be used directly. You need a graphical interface for it, which will interface with Leela Zero through the GTP protocol. Lizzie is a client specifically for Leela Zero which shows live search probilities, a win rate graph, and has an automatic game analysis mode. Has binaries for Windows, Mac, and Linux. Sabaki is a very nice looking GUI with GTP 2 capability. LeelaSabaki is modified to show variations and winning statistics in the game tree, as well as a heatmap on the game board. 
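Besides the GUIs above, the engine can also be driven directly over GTP from a short script; a minimal Python sketch, assuming you have compiled the leelaz binary and downloaded a network file as described earlier (paths are placeholders):
import subprocess

engine = subprocess.Popen(
    ['./leelaz', '--gtp', '-w', 'best-network'],        # GTP mode plus a weights file
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=True)

def gtp(command):
    # GTP replies start with '=' (or '?') and are terminated by a blank line.
    engine.stdin.write(command + '\n')
    engine.stdin.flush()
    reply = []
    for line in engine.stdout:
        if not line.strip():
            break
        reply.append(line.strip())
    return ' '.join(reply)

print(gtp('boardsize 19'))
print(gtp('genmove b'))                                  # e.g. '= Q16', may take a while
gtp('quit')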
A lot of go software can interface to an engine via GTP, so look around. Add the gtp commandline option on the engine command line to enable Leela Zero's GTP support. You will need a weights file, specify that with the w option. All required commands are supported, as well as the tournament subset, and loadsgf . The full set can be seen with list_commands . The time control can be specified over GTP via the time\_settings command. The kgs time\_settings extension is also supported. These have to be supplied by the GTP 2 interface, not via the command line! Weights format The weights file is a text file with each line containing a row of coefficients. The layout of the network is as in the AlphaGo Zero paper, but any number of residual blocks is allowed, and any number of outputs (filters) per layer, as long as the latter is the same for all layers. The program will autodetect the amounts on startup. The first line contains a version number. Convolutional layers have 2 weight rows: 1) convolution weights 2) channel biases Batchnorm layers have 2 weight rows: 1) batchnorm means 2) batchnorm variances Innerproduct (fully connected) layers have 2 weight rows: 1) layer weights 2) output biases The convolution weights are in output, input, filter\_size, filter\_size order, the fully connected layer weights are in output, input order. The residual tower is first, followed by the policy head, and then the value head. All convolution filters are 3x3 except for the ones at the start of the policy and value head, which are 1x1 (as in the paper). There are 18 inputs to the first layer, instead of 17 as in the paper. The original AlphaGo Zero design has a slight imbalance in that it is easier for the black player to see the board edge (due to how padding works in neural networks). This has been fixed in Leela Zero. The inputs are: 1) Side to move stones at time T 0 2) Side to move stones at time T 1 (0 if T 0) ... 8) Side to move stones at time T 7 (0 if T< 6) 9) Other side stones at time T 0 10) Other side stones at time T 1 (0 if T 0) ... 16) Other side stones at time T 7 (0 if T< 6) 17) All 1 if black is to move, 0 otherwise 18) All 1 if white is to move, 0 otherwise Each of these forms a 19 x 19 bit plane. In the training/caffe directory there is a zero.prototxt file which contains a description of the full 40 residual block design, in (NVIDIA) Caffe protobuff format. It can be used to set up nv caffe for training a suitable network. The zero\_mini.prototxt file describes a smaller 12 residual block case. The training/tf directory contains the network construction in TensorFlow format, in the tfprocess.py file. Expert note: the channel biases seem redundant in the network topology because they are followed by a batchnorm layer, which is supposed to normalize the mean. In reality, they encode beta parameters from a center/scale operation in the batchnorm layer, corrected for the effect of the batchnorm mean/variance adjustment. At inference time, Leela Zero will fuse the channel bias into the batchnorm mean, thereby offsetting it and performing the center operation. This roundabout construction exists solely for backwards compatibility. If this paragraph does not make any sense to you, ignore its existence and just add the channel bias layer as you normally would, output will be correct. 
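A small numpy sketch of the fusion described in the expert note (the numbers and eps are arbitrary, for illustration only): adding a per channel bias b before a batchnorm with mean mu is equivalent to a batchnorm whose mean has been shifted to mu - b, which is exactly the adjustment performed at load time:
import numpy as np

eps = 1e-5
x = np.random.randn(64)                     # activations of one channel
b = 0.3                                     # channel bias stored in the weights file
mu, var = 0.1, 1.7                          # batchnorm mean and variance for that channel

separate = (x + b - mu) / np.sqrt(var + eps)    # explicit bias layer followed by batchnorm
fused = (x - (mu - b)) / np.sqrt(var + eps)     # bias folded into the batchnorm mean
assert np.allclose(separate, fused)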
Training Getting the data At the end of the game, you can send Leela Zero a dump\_training command, followed by the winner of the game (either white or black ) and a filename, e.g: dump_training white train.txt This will save (append) the training data to disk, in the format described below, and compressed with gzip. Training data is reset on a new game. Supervised learning Leela can convert a database of concatenated SGF games into a datafile suitable for learning: dump_supervised sgffile.sgf train.txt This will cause a sequence of gzip compressed files to be generated, starting with the name train.txt and containing training data generated from the specified SGF, suitable for use in a Deep Learning framework. Training data format The training data consists of files with the following data, all in text format: 16 lines of hexadecimal strings, each 361 bits longs, corresponding to the first 16 input planes from the previous section 1 line with 1 number indicating who is to move, 0 black, 1 white, from which the last 2 input planes can be reconstructed 1 line with 362 (19x19 + 1) floating point numbers, indicating the search probabilities (visit counts) at the end of the search for the move in question. The last number is the probability of passing. 1 line with either 1 or 1, corresponding to the outcome of the game for the player to move Running the training For training a new network, you can use an existing framework (Caffe, TensorFlow, PyTorch, Theano), with a set of training data as described above. You still need to contruct a model description (2 examples are provided for Caffe), parse the input file format, and outputs weights in the proper format. There is a complete implementation for TensorFlow in the training/tf directory. Supervised learning with TensorFlow This requires a working installation of TensorFlow 1.4 or later: src/leelaz w weights.txt dump_supervised bigsgf.sgf train.out exit training/tf/parse.py train.out This will run and regularly dump Leela Zero weight files to disk, as well as snapshots of the learning state numbered by the batch number. If interrupted, training can be resumed with: training/tf/parse.py train.out leelaz model batchnumber Todo Further optimize Winograd transformations. Implement GPU batching. GTP extention to exclude moves from analysis. Root filtering for handicap play. More backends: MKL DNN based backend. CUDA specific version using cuDNN or cuBLAS. AMD specific version using MIOpen/ROCm. Related links Status page of the distributed effort: GUI and study tool for Leela Zero: Watch Leela Zero's training games live in a GUI: Original Alpha Go (Lee Sedol) paper: Alpha Go Zero paper: Alpha Zero (Go, Chess, Shogi) paper: AlphaGo Zero Explained In One Diagram: Stockfish chess engine ported to Leela Zero framework: Leela Chess Zero (chess optimized client) License The code is released under the GPLv3 or later, except for ThreadPool.h, cl2.hpp, half.hpp and the eigen and clblast_level3 subdirs, which have specific licenses (compatible with GPLv3) mentioned in those files.",Game of Go,Playing Games 2911,Playing Games,Playing Games,Other,Dueling DQN Line plot extraction using Reinforcement learning (Dueling DQN) Reference paper : Dueling Network Architectures for Deep Reinforcement Learning,Atari Games,Playing Games 1938,Natural Language Processing,Natural Language Processing,Natural Language Processing,"FlowQA This is our first attempt to make state of the art single turn QA models conversational. 
Feel free to build on top of our code to build an even stronger conversational QA model. For more details, please see: FlowQA: Grasping Flow in History for Conversational Machine Comprehension Step 1: perform the following: shell pip install r requirements.txt to install all dependent python packages. Step 2: download necessary files using: shell ./download.sh Step 3: preprocess the data files using: shell python preprocess_QuAC.py python preprocess_CoQA.py Step 4: run the training code using: shell python train_QuAC.py python train_CoQA.py For naming the output model, you can do shell python train_OOOO.py name XXX Remove any answer marking by: shell python train_OOOO.py explicit_dialog_ctx 0 OOOO is the name of the dataset (QuAC or CoQA). Step 5: Do prediction with answer thresholding using shell python predict_OOOO.py m models_XXX/best_model.pt show SS XXX is the name you used during train.py. SS is the number of dialog examples to be shown. OOOO is the name of the dataset (QuAC or CoQA).",Question Answering,Question Answering 1939,Natural Language Processing,Natural Language Processing,Natural Language Processing,"FusionNet for Natural Language Inference This is an example for applying FusionNet to natural language inference task. For more details on FusionNet, please refer to our paper: FusionNet: Fusing via Fully Aware Attention with Application to Machine Comprehension Requirements + Python (version 3.5.2) + PyTorch (0.2.0) + spaCy (1.x) + NumPy + JSON Lines + MessagePack Since package update sometimes break backward compatibility, it is recommended to use Docker, which can be downloaded from here . To enable GPU, nvidia docker may also needs to be installed. After setting up Docker, simply perform docker pull momohuang/fusionnet docker to pull the docker file. Note that this may take some time to download. Then we can run the docker image through docker run it momohuang/fusionnet docker (Only CPU) or nvidia docker run it momohuang/fusionnet docker (GPU enabled). Quick Start pip install r requirements.txt bash download.sh python prepro.py python train.py train.py supports an option full_att_type , where full_att_type 0 : standard attention full_att_type 1 : fully aware attention full_att_type 2 : fully aware multi level attention",Question Answering,Question Answering 1946,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SDNet This is the official code for the Microsoft's submission of SDNet model to CoQA leaderboard. It is implemented under PyTorch framework. The related paper to cite is: SDNet: Contextualized Attention based Deep Network for Conversational Question Answering , by Chenguang Zhu, Michael Zeng and Xuedong Huang, at For usage of this code, please follow Microsoft Open Source Code of Conduct . 
Directory structure: main.py: the starter code Models/ BaseTrainer.py: Base class for trainer SDNetTrainer.py: Trainer for SDNet, including training and predicting procedures SDNet.py: The SDNet network structure Layers.py: Related network layer functions Bert/ Bert.py: Customized class to compute BERT contextualized embedding modeling.py, optimization.py, tokenization.py: From Huggingface's PyTorch implementation of BERT Utils/ Arguments.py: Process argument configuration file Constants.py: Define constants used CoQAPreprocess.py: preprocess CoQA raw data into intermediate binary/json file, including tokenzation, history preprending CoQAUtils.py, General Utils.py: utility functions used in SDNet Timing.py: Logging time How to run Requirement: PyTorch 0.4.0, spaCy 2.0 1. Create a folder (e.g. coqa ) to contain data and running logs; 2. Create folder coqa/data to store CoQA raw data: coqa train v1.0.json and coqa dev v1.0.json ; 3. Copy the file conf from the repo into folder coqa ; 4. If you want to use BERT Large, download their model into coqa/bert large uncased ; if you want to use BERT base, download their model into coqa/bert base cased ; The models can be downloaded from Huggingface: 'bert base uncased': 'bert large uncased': bert large uncased vocab.txt can be downloaded from Google's BERT repository 5. Create a folder glove in the same directory of coqa and download GloVe embedding glove.840B.300d.txt into the folder. Your directory should look like this: coqa/ data/ coqa train v1.0.json coqa dev v1.0.json bert large uncased/ bert large uncased vocab.txt bert_config.json pytorch_model.bin conf glove/ glove.840B.300d.txt Then, execute python main.py train path_to_coqa/conf . If you run for the first time, CoQAPreprocess.py will automatically create folders conf/spacy_intermediate_features inside coqa to store intermediate tokenization results, which will take a few hours. Every time you run the code, a new running folder run_idx will be created inside coqa/conf , which contains running logs, prediction result on dev set, and best model. Contact If you have any questions, please contact Chenguang Zhu, chezhu@microsoft.com",Question Answering,Question Answering 1953,Natural Language Processing,Natural Language Processing,Natural Language Processing,"sunburst A simple Python implementation of ngram sunburst (nested pie chart) visualization showed in this paper: CoQA: A Conversational Question Answering Challenge, 2018 ( ) With its beautiful Figure 3: Distribution of trigram prefixes of questions in SQuAD and CoQA as follows: Fig 1:Original paper figure Here are some basic arguments to run the Analysis.py: python parser.add_argument(' read', help 'read from', default example.txt ) parser.add_argument(' ngram', help 'ngram', type int, default 3) parser.add_argument(' max_display_num', help 'max number of ngrams to display', type int, default 3) parser.add_argument(' min_count', help 'min word occurence below which the word will not be displayed', type int, default 1) parser.add_argument(' adjust_value', help 'adjust node value for better visulization', type int, default 1) parser.add_argument(' adjust_ratio', help 'the total ratio taken up by child nodes', type float, default 0.65) Simple run in a console python python Analysis.py read example.txt The ngrams are stored through Trie structure, which is later pruned based on max_display_num and min_count arguments. 
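As a toy illustration of that idea (a sketch only, not the Trie class used in the project), trigram prefixes can be accumulated in a nested dict trie and then pruned by count roughly like this:
from collections import defaultdict

def make_node():
    return {'count': 0, 'children': defaultdict(make_node)}

root = make_node()
questions = ['what is the capital', 'what is her name', 'where did he go']
for q in questions:
    node = root
    for word in q.split()[:3]:                 # keep only the trigram prefix
        node = node['children'][word]
        node['count'] += 1

def prune(node, min_count=1, max_display_num=3):
    # Drop rare children and keep at most max_display_num of the most frequent ones.
    kept = [(w, c) for w, c in node['children'].items() if c['count'] >= min_count]
    kept.sort(key=lambda wc: wc[1]['count'], reverse=True)
    node['children'] = dict(kept[:max_display_num])
    for child in node['children'].values():
        prune(child, min_count, max_display_num)

prune(root)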
The arguments are rather simple despite maybe the last two: adjust_value is a boolean value (1 or 0) indicating whether or not to adjust the ngram value for better visualization. It's recommanded when visualizing ngrams from a large corpus, for the distribution are always too sparse to visualize. When adjust_value 1, all sub words of an ngram will be adjusted to take up adjust_ratio of radian of the father word. See the below two figures (representing the same distribution) for a straight forward understanding. Fig 2:Adjusted 3 gram visualization (adjust_ratio 0.9 (left) and adjust_ratio 0.65 (right)) for example.txt",Question Answering,Question Answering 1966,Natural Language Processing,Natural Language Processing,Natural Language Processing,"co squac A repository for converting between CoQA, SQuAD 2.0, and QuAC and for visualizing the output of such models. The repository was created in support of the following paper (and please cite if you use this repo). @inproceedings{yatskar_cosquac_2018, title {A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC}, journal {ArXiv}, year {2018} } If you want the outputs and models generated in this work, they can be found here: outputs , models . Running these models depends on AllenNLP , so please install understand how to use that before you to try to run them. If you are unfamiliar with either of these three datasets, please read up on them first: SQuAD 2.0 QuAC , CoQA . This repo offers a shared format for representing all three datasets (essentially QuAC format with an extra field for an abstractive answer). It also contains tools for converting from QuAC output style to either SQuAD 2.0 or CoQA output format. These conversation tools allow you to use the official scorer of each respective dataset while maintaining one internal data format. There are also tools for visualization of the data and output (see the visualizations directory). Format The shared format corresponds to SQuAD 2.0 format, with additions made for QuAC and CoQA. The current format is compatible with the standard QuAC format, as read by the AllenNLP data reader. The easiest way to understand the format is by cloning the repo, untaring the data in the datasets directory, and running ipython. Here is an example from QuAC: > git clone > cd co squac/datasets > tar xvf converted.tar > tar xvf original.tar > ipython In 1 : import json In 2 : quac json.load(open( converted/quac_dev.json )) In 3 : quac data 0 Release The datasets directory already contains all three dataset train and development sets converted to this format (as well as the orginally formated data). If you want to regenerate these files, use the scripts in the convert directory. Example Usage In experiments in the comparison paper, the general methodology was to use the joint format for all the datasets and a single model (the AllenNLP dialog qa model). After getting output from this model by running the predict command (assuming you are using AllenNLP): > python m allennlp.run predict models/squad2.tar co squac/datasets/converted/quac_dev.json use dataset reader output file output/squad2.quac batch size 10 cuda device 0 silent It would then be converted to the source dataset using the appopriate output script in the convert directory. 
For example: > cd co squac > python convert/output_quac_to_squad.py input_file ../output/squad2.quac output_file ../output/squad2.squad And then could be evaluated using the official scirpts: > python evals/squad2_eval.py datasets/squad2_dev.json ../output/squad2.squad Visualization Code to produce visualizations like those found in the original QuAC paper, and those used to do the qualitative analysis, can be found in visualize. Examples of the output is in visualizations folder. You can also configure the script, providing system output for future qualitive analysis of errors, or small variation in formatting, such as number of references and interactions per figure. The script outputs LaTEX, so you need to compile the files you generate using pdflatex. Here is an example, assuming you have outputs from a model (in the shared format), to generate 50 random examples from the development set, 8 interactions per page, and no additional references beyond the first one. > python visualize/visualize_with_predictions.py input_file datasets/converted/squad2_dev.json output_directory visualizations/squad2/ predictions ../output/squad2.quac examples 50 interactions 8 references 0 > cd visualizations/squad2 > for i in ls ; do pdflatex interaction nonstopmode $i; done > rm .log .aux .out",Question Answering,Question Answering 1982,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SQLova SQLova is a neural semantic parser translating natural language utterance to SQL query. The name is originated from the name of our department: S earch & QLova ( Search & Clova ). Authors Wonseok Hwang (wonseok.hwang@navercorp.com), Jinyeong Yim (jinyeong.yim@navercorp.com), Seunghyun Park (seung.park@navercorp.com), and Minjoon Seo (minjoon.seo@navercorp.com). Affiliation: Clova AI Research, NAVER Corp., Seongnam, Korea. Technical report . Abstract We present the new state of the art semantic parsing model that translates a natural language (NL) utterance into a SQL query. The model is evaluated on WikiSQL , a semantic parsing dataset consisting of 80,654 (NL, SQL) pairs over 24,241 tables from Wikipedia. We achieve 83.6% logical form accuracy and 89.6% execution accuracy on WikiSQL test set. The model in a nutshell BERT based table and context aware word embedding. The sequence to SQL model leveraging recent works ( Seq2SQL , SQLNet ). Execution guided decoding is applied in SQLova EG. Results (Updated at Jan 12, 2019) Model Dev logical form accuracy Dev execution accuracy Test logical form accuracy Test execution accuracy SQLova 81.6 ( +5.5 )^ 87.2 ( +3.2 )^ 80.7 ( +5.3 )^ 86.2 ( +2.5 )^ SQLova EG 84.2 ( +8.2 ) 90.2 ( +3.0 ) 83.6( +8.2 ) 89.6 ( +2.5 ) ^: Compared to current SOTA models that do not use execution guided decoding. : Compared to current SOTA . The order of where conditions is ignored in measuring logical form accuracy in our model. Source code Requirements python3.6 or higher. PyTorch 0.4.0 or higher. CUDA 9.0 Python libraries: babel, matplotlib, defusedxml, tqdm Example Install minicoda conda install pytorch torchvision c pytorch conda install c conda forge records conda install babel conda install matplotlib conda install defusedxml conda install tqdm The code has been tested on Tesla M40 GPU running on Ubuntu 16.04.4 LTS. Running code Type python3 train.py seed 1 bS 16 accumulate_gradients 2 bert_type_abb uS fine_tune lr 0.001 lr_bert 0.00001 max_seq_leng 222 on terminal. seed 1 : Set the seed of random generator. 
The accuracy changes by a few percent depending on the seed. bS 16 : Set the batch size to 16. accumulate_gradients 2 : Make the effective batch size 16 x 2, i.e. 32. bert_type_abb uS : Uncased Base BERT model is used. Use uL to use Uncased Large BERT. fine_tune : Train BERT. Without this, only the sequence to SQL module is trained. lr 0.001 : Set the learning rate of the sequence to SQL module to 0.001. lr_bert 0.00001 : Set the learning rate of the BERT module to 0.00001. max_seq_leng 222 : Set the maximum input token length of BERT. The model should show 79% logical form accuracy (lx) on the dev set after 12 hrs (10 epochs). Higher accuracy can be obtained with longer training, by selecting a different seed, by using the Uncased Large BERT model, or by using execution guided decoding. Add the EG argument while running train.py to use execution guided decoding. Whenever a higher logical form accuracy is calculated on the dev set, the following three files are saved in the current folder: model_best.pt : the checkpoint of the sequence to SQL module. model_bert_best.pt : the checkpoint of the BERT module. results_dev.jsonl : json file for official evaluation. Shallow Layer and Decoder Layer models can be trained similarly ( train_shallow_layer.py , train_decoder_layer.py ). Evaluation on WikiSQL DEV set To calculate logical form and execution accuracies on the dev set using the official evaluation script, Download the original WikiSQL dataset . tar xvf data.tar.bz2 Move them under $HOME/data/WikiSQL 1.1/data Set the path in evaluation_ws.py . This is the file where the path information has been added to the original evaluation.py script. Or you can use the original evaluation.py by setting the path to the files yourself. Type python3 evaluation_ws.py on terminal. Evaluation on WikiSQL TEST set Uncomment lines 550 557 of train.py to load test_loader and test_table . In the test(...) function, use test_loader and test_table instead of dev_loader and dev_table . Save the output of test(...) with the save_for_evaluation(...) function. Evaluate with evaluation_ws.py as before. Load pre trained SQLova parameters. Pretrained SQLova model parameters are uploaded in release . To start from this, uncomment lines 562 565 and set the paths. Code base Pretrained BERT models were downloaded from the official repository . BERT code is from huggingface pytorch pretrained BERT . The sequence to SQL model started from the source code of SQLNet and was significantly re written while maintaining the basic column attention and sequence to set structure of SQLNet. Data The data is annotated by using annotate_ws.py which is based on annotate.py from the WikiSQL repository. The tokens of the natural language query, and the start and end indices of where conditions on natural language tokens, are annotated. Pre trained BERT parameters can be downloaded from the BERT official repository and can be converted to a pt file following the instructions from huggingface pytorch pretrained BERT . For convenience, the annotated WikiSQL data and the PyTorch converted pre trained BERT parameters are available here . License Copyright 2019 present NAVER Corp. Licensed under the Apache License, Version 2.0 (the License ); you may not use this file except in compliance with the License. You may obtain a copy of the License at Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.",Question Answering,Question Answering 2008,Natural Language Processing,Natural Language Processing,Natural Language Processing,"nlp research/bilm tf This repository supports both (1) training ELMo representations and (2) using pre trained ELMo representaions to your new model Installing packages for training pip install tensorflow gpu 1.2 h5py hgtk Insatlling packaages for using pre trained ELMo pip install allennlp hgtk Using pre trained ELMo representatinos to your new model See usr_dir/embed_with_elmo.py for detailed example. Make sure to set n_characters 262 example during prediction in the options.json . See here . python from allennlp.commands.elmo import ElmoEmbedder import hgtk import preprocess options_file 'path/to/options.json' Make sure to set n_characters 262 weight_file 'path/to/weights.hdf5' elmo ElmoEmbedder(options_file, weight_file) create your ELMo class based on weight and option file sentences '밥을 먹자', 'apple은 맛있다' normalize, split emj to jaso, add bio tag through preprocess.preprocess_and_tokenize() preprocessed_sentences for sentence in sentences: preprocessed_sentences.append(preprocess.preprocess_and_tokenize(sentence)) 'Bㅂㅏㅂ', 'Iㅇㅡㄹ', 'Bㅁㅓㄱ', 'Iㅈㅏ' , 'BA', 'Iㅇㅡㄴ', 'Bㅁㅏㅅ', 'Iㅇㅣㅆ', 'Iㄷㅏ' get ELMo vectors vectors elmo.embed_batch(preprocessed_sentences) return value 'vectors' is list of tensors. Each vector contains each layer of ELMo representations of sentences with shape (number of sentences, number of tokens(emjs), dimension). use elmo.embed_senteces(preprocessed_sentences) to return generator instead of list Training new ELMo model Launch docker container if the docker is not launched. (Only Once) bash cd /path/to/usr_dir/scripts ./run_docker.sh Install system packages and set datetime and timezone. Run this script inside the docker. (Only Once) bash docker attach elmo if you are not inside of docker cd /path/to/usr_dir/scripts ./install_packages.sh Inside the docker, set hyperparameters by editing code in train_elmo.py Edit train.sh to set model name (model directory), vocab file path, train file path. Before training, make sure to convert data files from raw format to train format. See build_data.sh Run train.sh inside the docker Print stream to nohoup file for logging (Recommanded). bash cd /path/to/usr_dir/scripts nohoup ./train.sh & Converting triained model to hdf5 file Either inside or outside of docker, edit and run dump.sh to convert trained model to hdf5 file bash cd /path/to/usr_dir/scripts ./dump.sh NOTE : Check your model path in /path/to/usr_dir/model/model_name/checkpoint if error occurs when running dump.sh to convert trained model to hdf5 file.",Question Answering,Question Answering 2029,Natural Language Processing,Natural Language Processing,Natural Language Processing,"AMANDA This repository contains the source code of the Neural Reading Comprehension based Question Answering system, AMANDA, which is published in AAAI 2018. Overview We experimented on three datasets: NewsQA, TriviaQA and SearchQA. Source code is available in the corresponding directories ( NEWSQA , TRIVIAQA , and SEARCHQA ) for training and testing on every dataset. Additionally, each of the directories contains a separate README file for further details. amanda/ contains the source code of the model architecture. 
Publication If you use the source code or models from this work, please cite our paper: @article{kundu2018amanda, author {Kundu, Souvik and Ng, Hwee Tou}, title {A Question Focused Multi Factor Attention Network for Question Answering}, booktitle {Proceedings of the Thirty Second {AAAI} Conference on Artificial Intelligence}, month {February}, year {2018}, } You can also find the paper in arxiv: License AMANDA is licensed under the GNU General Public License Version 3. Separate commercial licensing is also available. For more information contact: Souvik Kundu (souvik@comp.nus.edu.sg) Hwee Tou Ng (nght@comp.nus.edu.sg)",Question Answering,Question Answering 2060,Natural Language Processing,Natural Language Processing,Natural Language Processing,"gpt 2 Code from the paper Language Models are Unsupervised Multitask Learners . We have currently released small (117M parameter) and medium (345M parameter) versions of GPT 2. While we have not released the larger models, we have released a dataset for researchers to study their behaviors. See more details in our blog post . Usage This repository is meant to be a starting point for researchers and engineers to experiment with GPT 2. Some caveats GPT 2 models' robustness and worst case behaviors are not well understood. As with any machine learned model, carefully evaluate GPT 2 for your use case, especially if used without fine tuning or in safety critical applications where reliability is important. The dataset our GPT 2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT 2 models are likely to be biased and inaccurate as well. To avoid having samples mistaken as human written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice. Work with us Please let us know (mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT 2! We’re especially interested in hearing from and potentially working with those who are studying Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text) The extent of problematic content (e.g. bias) being baked into the models and effective mitigations Development See DEVELOPERS.md (./DEVELOPERS.md) Contributors See CONTRIBUTORS.md (./CONTRIBUTORS.md) Citation Please use the following bibtex entry: @article{radford2019language, title {Language Models are Unsupervised Multitask Learners}, author {Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya}, year {2019} } Future work We may release code for evaluating the models on various benchmarks. We are still considering release of the larger models. License MIT (./LICENSE)",Question Answering,Question Answering 2100,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BERT exploration Workspace to explore classification and other tasks based on the pytorch implementation of the original bert paper GLUE See this notebook for an implementation of the GLUE tasks. SWAG The Situations With Adversarial Generations (SWAG) dataset contains 113k sentence pair com pletion examples that evaluate grounded common sense inference (Zellers et al., 2018). Given a sentence from a video captioning dataset, the task is to decide among four choices the most plausible continuation. 
Running bash export SWAG_DIR /home/pfecht/thesis/swagaf python run_swag.py \ bert_model bert base uncased \ do_train \ do_eval \ data_dir $SWAG_DIR/data \ train_batch_size 16 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ max_seq_length 80 \ output_dir /home/pfecht/tmp/swag_output/ \ gradient_accumulation_steps 4 results in Accuracy : 78.58 (BERT paper 81.6 ) 12/14/2018 18:42:18 INFO __main__ eval_accuracy 0.7858642407277817 12/14/2018 18:42:18 INFO __main__ eval_loss 0.6655298910721517 12/14/2018 18:42:18 INFO __main__ global_step 13788 12/14/2018 18:42:18 INFO __main__ loss 0.07108418613090857 with fine tuning time on a single GPU (GeForce GTX TITAN X): around 4 hours . SQuAD > see Running bash $ python run_squad.py \ bert_model bert base uncased \ do_train \ do_predict \ train_file $SQUAD_DIR/train v1.1.json \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir $OUT_DIR \ optimize_on_cpu optimize_on_CPU is important to obtain enough space on the GPU for training. BertOptimiezer stores 2 moving averages of the weights of the model wich means We have to store 3 times the size of the model in the GPU if we don't move it to CPU. OOM errors are proportional to train_batch_size and max_seq_length . results in F1 score : 88.28 (BERT paper 88.5) EM (Exact match) : 81.05 (BERT paper 80.8) with fine tuning time on a single GPU (GeForce GTX TITAN X): around 8 hours . running evaluation based on json $ python evaluate v1.1.py /home/pfecht/res/SQUAD/dev v1.1.json predictions.json { f1 : 88.28409344840951, exact_match : 81.05014191106906}",Question Answering,Question Answering 2144,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Question Answering System using End to End Memory Networks This project is the implementation of End to End Memory Networks to build Question Answering System. It takes small story and query as an input and predicts a possibe answer to the query. Here we have used Facebook Babi Dataset to train the model. After training, user can give similar stories and query as input and it would give you accurate results. Dependencies tensorflow keras functools tarfile re Architecture of End to End Memory Networks ! alt text Results/Observations Layers Dropouts Batch size Epochs Results LSTM(32) (0.3) 32 100 94.6% LSTM(64) (0.3) 32 100 96.5% LSTM(32), LSTM(32) (0.5, 0.5) 32 100 92.4% LSTM(32), LSTM(32) (0.5, 0.5) 32 200 96.9% GRU(32) (0.3) 32 100 86.4% GRU(64) (0.3) 32 100 87.4% GRU(32), GRU(32) (0.5, 0.5) 64 100 52.6% The models with two or more layers required more training since there are more parameters that need to be set, but then have greater accuracies than the other models once trained completely. Overall, LSTM based models performed better than GRU based models for this task. References Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, Alexander M. Rush, Towards AI Complete Question Answering: A Set of Prerequisite Toy Tasks , Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, End To End Memory Networks ,",Question Answering,Question Answering 2201,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Query based Information Extraction System using Deep Learning This is an implementation of attention based End To End Memory Network architecture to build a Question Answering model. The reference paper is MemN2N . 
The model is GPU enabled and supports Adjacent Weight Tying, Position Encoding, Temporal Encoding and Linear Start. The data corpus used is the collection of internal policies belonging to ITC Infotech Limited Organization. The goal was to provide employees an easier information access to the organization's policies. The accuracy achieved was 93%. Here is a sample output: ! alt text (output.png) File Descriptions: preprocess.py: a Python script to index the data. train.lua: a Lua script to train and load the QA model. interact.lua: a Lua script to load the saved model and querying user requirements. model.dot: a grahical representation of the model configuration. output.png: a sample output screenshot.",Question Answering,Question Answering 2207,Natural Language Processing,Natural Language Processing,Natural Language Processing,Recurrent Relational Networks for Complex Relational Reasoning Contains the code to reproduce the experiments in the paper Recurrent Relational Networks for Complex Relational Reasoning Sudoku python tasks/sudoku/train.py will train a RRN with the hyper parameters from the paper and save the best model in the working directory python tasks/sudoku/test.py will load the best model from the working directory and evaluate it on the test set BaBi python tasks/babi/train.py will train a RRN with the hyper parameters from the paper and save the best model in the working directory python tasks/babi/test.py will load the best model from the working directory and evaluate it on the test set,Question Answering,Question Answering 2211,Natural Language Processing,Natural Language Processing,Natural Language Processing,MemN2Ns Implementation of End to End Memory Networks Presentation:,Question Answering,Question Answering 2238,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Attention Sum Reader Introduction This is a Theano/Blocks implementation of the Attention Sum Reader model as presented in Text Comprehension with the Attention Sum Reader Network available at We encourage you to familiarize yourself with the model by reading the above article prior to studying the particulars of this implementation. Quick start If you want to get started as fast as possible try this: ./prerequisites.sh cd asreader ./quick start cbt ne.sh If you do not have a GPU available, remove the device gpu flag from quick start generic.sh. However note that training the text comprehension tasks on a CPU is likely to take a prohibitively long time. This should install the prerequisites, download the CBT dataset, train two models on the named entity part of the data, form an ensemble and report the accuracies. License © Copyright IBM Corporation. 2016. This licensed material is licensed for academic, non commercial use only . The licensee may use, copy, and modify the licensed materials in any form without payment to IBM for the sole purposes of evaluating and extending the licensed materials for non commercial purposes. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the licensed materials.. Notwithstanding anything to the contrary, IBM PROVIDES THE LICENSED MATERIALS ON AN AS IS BASIS AND IBM DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY IMPLIED WARRANTIES OR CONDITIONS OF MERCHANTABILITY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND ANY WARRANTY OR CONDITION OF NON INFRINGEMENT. 
IBM SHALL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY OR ECONOMIC CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR OPERATION OF THE LICENSED MATERIALS. IBM SHALL NOT BE LIABLE FOR LOSS OF, OR DAMAGE TO, DATA, OR FOR LOST PROFITS, BUSINESS REVENUE, GOODWILL, OR ANTICIPATED SAVINGS. IBM HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS OR MODIFICATIONS TO THE LICENSED MATERIALS. Detailed usage Installation Quick installation Provided you have python with pip installed, running prerequisites.sh should install Blocks and dependencies for you. It also downloads the Children's Book Test dataset and the CNN and Daily Mail news datasets. We are aware that the news data download sometimes crashes. Rerunning the script prepare rec data.sh should be able to resume the download if that happens (alternatively you can download the datasets from However if you prefer to install the dependencies by yourself, some details are below: Dependencies: 1. HDF5 (required for installing Blocks) In the Debian/Ubuntu family of distributions, you should be able to install the library using sudo apt get install libhdf5 serial dev Otherwise installation instructions and source download can be found at 2. Blocks and its dependencies Installation instructions can be found at blocks.readthedocs.io/en/latest/setup.html. You should be able to install Blocks including Theano and other dependencies using pip by running pip install git+ r user It is important to use older version of Blocks since the latest version isn’t backward compatible. 3. NLTK + punkt corpus This tokenizer that we use for reading the bAbI datasets can be installed using pip install nltk user python m nltk.downloader punkt Getting data Children Book Test Children Book Test data should be already downloaded by the quick start script. If you skipped this script you can prepare the data by prepare cbt data.sh CNN and Daily Mail The best way how to get the CNN and DailyMail datasets is to download the questions and stories files from Place them into folder $CLONE_DIR/data and run a script $CLONE_DIR/data/prepare rc data downloaded.sh . Alternatively you can use a script $CLONE_DIR/data/prepare rc data.sh that downloads the data using the original scripts from However, the news data download sometimes crashes. Therefore it is often necessary to download missing articles by re running generate_questions.py script. Now when you have CNN and DailyMail datasets you can use them to train the models: cd asreader ./quick start cnn.sh ./quick start dm.sh Training models The model can be trained by running the text comprehension/as_reader.py script. The simplest usage is: python text_comprehension/as_reader.py dataset_root data/CBTest/data/ train train.txt valid valid.txt test test.txt where the .txt files are the appropriate datasets. Some of the recommended configurations can be copied from the quick start cbt ne.sh script You may need to prepend the following prefixes in front of the command and run it from the project root directory THEANO_FLAGS floatX float32,device gpu PYTHONPATH $PYTHONPATH:./ Some of the most useful command line arguments you may wish to use are the following dataset_type cbt cnn babi the type of dataset that is being used. Defaults to the Children's Book Test b 32 ... batch size larger values usually speed up training however increase the memory usage sed 256 ... source embedding dimension ehd 256 ... the number of hidden units in each half of the bidirectional GRU encoders lr 0.001 ... 
learning rate output_dir ... output directory for the validation and test prediction files patience_metric accuracy ... when this metric stops improving, training is eventually stopped p 1 ... the number of epochs for which training continues since achieving the best value of the patience metric own_eval ... runs a script that eb append_metaparams ... includes the metaparameters in the filename of the generated prediction files useful when generating multiple models weighted_att ... instead of attention sum, use the weighted attention model to which we compare the ASReader in the paper The full list of parameters with descriptions can be displayed by running the script with the h flag. Ensembling as_reader.py can generate the predictions for the test and validation datasets into the output directory. By default the predictions are generated every epoch. The text_comprehension/eval/copyBestPredictions directory can then be used to find the time at which model achieved the best validation accuracy and it copies the corresponding validation and test predictions to a separate folder. An example syntax is python text_comprehension/eval/copyBestPredictions.py vp cbtest_NE_valid_2000ex.txt. tp cbtest_NE_test_2500ex.txt. i out_dir o out_dir/best_predictions where vp and tp give the prefixes of the validation and test predictions respectively. These are usually the validation and test dataset filenames. Once the best_predictions directory contains only one test and one validation prediction for each model, we can fuse these using the text_comprehension/eval/fusion.py for instance using the following command: python text_comprehension/eval/fusion.py pr out_dir/best_predictions/ .y_hat_valid o $OUT_DIR/best_predictions/simple_fusion.y_hat t foo fusion_method AverageAll where pr gives an expression for the validation predictions to be used and o specifies the file to output. The script provides three methods of fusion toggled by the fusion_method parameter: AverageAll the ensemble prediction is a mean of all the supplied single model predictions pBest sorts the candidate models by validation accuracy and selects the best proportion p of models to form the ensemble AddImprover sorts the candidate models by validation accuracy and then tries adding them to the ensemble in that order keeping each model in the ensemble only if it improves its val. accuracy Contributors Rudolf Kadlec, Martin Schmid, Ondrej Bajgar, Tamir Klinger, Ladislav Kunc, Jan Kleindienst",Question Answering,Question Answering 2265,Natural Language Processing,Natural Language Processing,Natural Language Processing,"doc2vec workshop Q: How can we embed not just a word, but a whole sentence/paragraph/document? A1: Averaged word embedding A2: doc2vec / paragraph vectors Paper 1: Paper 2: These notebooks provide an excercise in gensim's doc2vec models and t SNE visualisation of the result.",Question Answering,Question Answering 2286,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Contextualized Word Vectors (CoVe) This repo provides the best, pretrained MT LSTM from the paper Learned in Translation: Contextualized Word Vectors (McCann et. al. 2017) . For a high level overview of why CoVe are great, check out the post . This repository uses a PyTorch implementation of the MTLSTM class in mtlstm.py to load a pretrained encoder, which takes in sequences of vectors pretrained with GloVe and outputs CoVe. Need CoVe in Tensorflow? 
A Keras/TensorFlow implementation of the MT LSTM/CoVe can be found at Unknown Words Out of vocabulary words for CoVe are also out of vocabulary for GloVe, which should be rare for most use cases. During training the CoVe encoder would have received a zero vector for any words that were not in GloVe, and it used zero vectors for unknown words in our classification and question answering experiments, so that is recommended. You could also try initializing unknown inputs to something close to GloVe vectors instead, but we have no experiments suggesting that this would work better than zero vectors. If you wanted to try this, GloVe vectors follow (very roughly) a Gaussian with mean 0 and standard deviation 0.4. You could initialize by randomly drawing from that distrubtion, but you would probably want to train those embeddings while keeping the CoVe encoder (MTLSTM) and GloVe fixed. Example Usage The following example can be found in test/example.py . It demonstrates a few different variations of how to use the pretrained MTLSTM class that generates contextualized word vectors (CoVe) programmatically. Running with Docker Install Docker . Install nvidia docker if you would like to use with with a GPU. bash docker pull bmccann/cove pull the docker image On CPU docker run it rm v pwd /.embeddings:/.embeddings/ v pwd /.data/:/.data/ bmccann/cove bash c python /test/example.py device 1 On GPU nvidia docker run it rm v pwd /.embeddings:/.embeddings/ v pwd /.data/:/.data/ bmccann/cove bash c python /test/example.py Running without Docker Install PyTorch . bash git clone use ssh: git@github.com:salesforce/cove.git cd cove pip install r requirements.txt python setup.py develop On CPU python test/example.py device 1 On GPU python test/example.py Re training CoVe There is also the third option if you are operating in an entirely different context retrain the bidirectional LSTM using trained embeddings. If you are mostly encoding a non English language, that might be the best option. Check out the paper for details; code for this is included in the directory OpenNMT py, which was forked from OpenNMT py a long while back and includes changes we made to the repo internally. References If using this code, please cite: B. McCann, J. Bradbury, C. Xiong, R. Socher, Learned in Translation: Contextualized Word Vectors @inproceedings{mccann2017learned, title {Learned in translation: Contextualized word vectors}, author {McCann, Bryan and Bradbury, James and Xiong, Caiming and Socher, Richard}, booktitle {Advances in Neural Information Processing Systems}, pages {6297 6308}, year {2017} } Contact: bmccann@salesforce.com (mailto:bmccann@salesforce.com)",Question Answering,Question Answering 2355,Natural Language Processing,Natural Language Processing,Natural Language Processing,"做法 bilm tf Tensorflow implementation of the pretrained biLM used to compute ELMo representations from Deep contextualized word representations . This repository supports both training biLMs and using pre trained models for prediction. We also have a pytorch implementation available in AllenNLP . You may also find it easier to use the version provided in Tensorflow Hub if you just like to make predictions. Citation: @inproceedings{Peters:2018, author {Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke}, title {Deep contextualized word representations}, booktitle {Proc. 
of NAACL}, year {2018} } Installing Install python version 3.5 or later, tensorflow version 1.2 and h5py: pip install tensorflow gpu 1.2 h5py python setup.py install Ensure the tests pass in your environment by running: python m unittest discover tests/ Installing with Docker To run the image, you must use nvidia docker, because this repository requires GPUs. sudo nvidia docker run t allennlp/bilm tf:training gpu Using pre trained models We have several different English language pre trained biLMs available for use. Each model is specified with two separate files, a JSON formatted options file with hyperparameters and a hdf5 formatted file with the model weights. Links to the pre trained models are available here . There are three ways to integrate ELMo representations into a downstream task, depending on your use case. 1. Compute representations on the fly from raw text using character input. This is the most general method and will handle any input text. It is also the most computationally expensive. 2. Precompute and cache the context independent token representations, then compute context dependent representations using the biLSTMs for input data. This method is less computationally expensive then 1, but is only applicable with a fixed, prescribed vocabulary. 3. Precompute the representations for your entire dataset and save to a file. We have used all of these methods in the past for various use cases. 1 is necessary for evaluating at test time on unseen data (e.g. public SQuAD leaderboard). 2 is a good compromise for large datasets where the size of the file in 3 is unfeasible (SNLI, SQuAD). 3 is a good choice for smaller datasets or in cases where you'd like to use ELMo in other frameworks. In all cases, the process roughly follows the same steps. First, create a Batcher (or TokenBatcher for 2) to translate tokenized strings to numpy arrays of character (or token) ids. Then, load the pretrained ELMo model (class BidirectionalLanguageModel ). Finally, for steps 1 and 2 use weight_layers to compute the final ELMo representations. For 3, use BidirectionalLanguageModel to write all the intermediate layers to a file. Shape conventions Each tokenized sentence is a list of str , with a batch of sentences a list of tokenized sentences ( List List str ). The Batcher packs these into a shape (n_sentences, max_sentence_length + 2, 50) numpy array of character ids, padding on the right with 0 ids for sentences less then the maximum length. The first and last tokens for each sentence are special begin and end of sentence ids added by the Batcher . The input character id placeholder can be dimensioned (None, None, 50) , with both the batch dimension (axis 0) and time dimension (axis 1) determined for each batch, up the the maximum batch size specified in the BidirectionalLanguageModel constructor. After running inference with the batch, the return biLM embeddings are a numpy array with shape (n_sentences, 3, max_sentence_length, 1024) , after removing the special begin/end tokens. Vocabulary file The Batcher takes a vocabulary file as input for efficency. This is a text file, with one token per line, separated by newlines ( \n ). Each token in the vocabulary is cached as the appropriate 50 character id sequence once. Since the model is completely character based, tokens not in the vocabulary file are handled appropriately at run time, with a slight decrease in run time. It is recommended to always include the special and tokens (case sensitive) in the vocabulary file. 
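Putting those steps together for case 1 (computing representations on the fly from character input), the code roughly follows the sketch below. This is only a sketch: the vocabulary/options/weights paths are placeholders for your downloaded files, and usage_character.py, referenced in the next section, remains the authoritative example.

```python
# Sketch of the character-input workflow described above (paths are placeholders).
import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel, weight_layers

vocab_file = '/path/to/vocab.txt'        # placeholder
options_file = '/path/to/options.json'   # placeholder
weight_file = '/path/to/weights.hdf5'    # placeholder

# 1. Batcher maps tokenized sentences to character ids of shape (n_sentences, max_len + 2, 50).
batcher = Batcher(vocab_file, 50)

# 2. Build the graph: character-id placeholder -> biLM -> weighted combination of the 3 layers.
character_ids = tf.placeholder('int32', shape=(None, None, 50))
bilm = BidirectionalLanguageModel(options_file, weight_file)
lm_embeddings = bilm(character_ids)
elmo_input = weight_layers('input', lm_embeddings, l2_coef=0.0)

# 3. Run inference for a batch of tokenized sentences.
sentences = [['Pretrained', 'biLMs', 'compute', 'representations', '.']]
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ids = batcher.batch_sentences(sentences)
    elmo_vectors = sess.run(elmo_input['weighted_op'],
                            feed_dict={character_ids: ids})
    # elmo_vectors has shape (n_sentences, max_sentence_length, 1024)
```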
ELMo with character input See usage_character.py for a detailed usage example. ELMo with pre computed and cached context independent token representations To speed up model inference with a fixed, specified vocabulary, it is possible to pre compute the context independent token representations, write them to a file, and re use them for inference. Note that we don't support falling back to character inputs for out of vocabulary words, so this should only be used when the biLM is used to compute embeddings for input with a fixed, defined vocabulary. To use this option: 1. First create a vocabulary file with all of the unique tokens in your dataset and add the special and tokens. 2. Run dump_token_embeddings with the full model to write the token embeddings to a hdf5 file. 3. Use TokenBatcher (instead of Batcher ) with your vocabulary file, and pass use_token_inputs False and the name of the output file from step 2 to the BidirectonalLanguageModel constructor. See usage_token.py for a detailed usage example. Dumping biLM embeddings for an entire dataset to a single file. To take this option, create a text file with your tokenized dataset. Each line is one tokenized sentence (whitespace separated). Then use dump_bilm_embeddings . The output file is hdf5 format. Each sentence in the input data is stored as a dataset with key str(sentence_id) where sentence_id is the line number in the dataset file (indexed from 0). The embeddings for each sentence are a shape (3, n_tokens, 1024) array. See usage_cached.py for a detailed example. Training a biLM on a new corpus Broadly speaking, the process to train and use a new biLM is: 1. Prepare input data and a vocabulary file. 2. Train the biLM. 3. Test (compute the perplexity of) the biLM on heldout data. 4. Write out the weights from the trained biLM to a hdf5 file. 5. See the instructions above for using the output from Step 4 in downstream models. 1. Prepare input data and a vocabulary file. To train and evaluate a biLM, you need to provide: a vocabulary file a set of training files a set of heldout files The vocabulary file is a a text file with one token per line. It must also include the special tokens , and (case sensitive) in the file. IMPORTANT : the vocabulary file should be sorted in descending order by token count in your training data. The first three lines should be the special tokens ( , and ), then the most common token in the training data, ending with the least common token. NOTE : the vocabulary file used in training may differ from the one use for prediction. The training data should be randomly split into many training files, each containing one slice of the data. Each file contains pre tokenized and white space separated text, one sentence per line. Don't include the or tokens in your training data. All tokenization/normalization is done before training a model, so both the vocabulary file and training files should include normalized tokens. As the default settings use a fully character based token representation, in general we do not recommend any normalization other then tokenization. Finally, reserve a small amount of the training data as heldout data for evaluating the trained biLM. 2. Train the biLM. The hyperparameters used to train the ELMo model can be found in bin/train_elmo.py . The ELMo model was trained on 3 GPUs. To train a new model with the same hyperparameters, first download the training data from the 1 Billion Word Benchmark . Then download the vocabulary file . 
Finally, run: export CUDA_VISIBLE_DEVICES 0,1,2 python bin/train_elmo.py \ train_prefix '/path/to/1 billion word language modeling benchmark r13output/training monolingual.tokenized.shuffled/ ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 3. Evaluate the trained model. Use bin/run_test.py to evaluate a trained model, e.g. export CUDA_VISIBLE_DEVICES 0 python bin/run_test.py \ test_prefix '/path/to/1 billion word language modeling benchmark r13output/heldout monolingual.tokenized.shuffled/news.en.heldout 000 ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 4. Convert the tensorflow checkpoint to hdf5 for prediction with bilm or allennlp . First, create an options.json file for the newly trained model. To do so, follow the template in an existing file (e.g. the original options.json and modify for your hyperpararameters. Important : always set n_characters to 262 after training (see below). Then Run: python bin/dump_weights.py \ save_dir /output_path/to/checkpoint outfile /output_path/to/weights.hdf5 Frequently asked questions and other warnings Can you provide the tensorflow checkpoint from training? The tensorflow checkpoint is available by downloading these files: vocabulary checkpoint options 1 2 3 How to do fine tune a model on additional unlabeled data? First download the checkpoint files above. Then prepare the dataset as described in the section Training a biLM on a new corpus , with the exception that we will use the existing vocabulary file instead of creating a new one. Finally, use the script bin/restart.py to restart training with the existing checkpoint on the new dataset. For small datasets (e.g. , and . You can find our vocabulary file here . At the model input, all text used the full character based representation, including tokens outside the vocab. For the softmax output we replaced OOV tokens with . The model was trained with a fixed size window of 20 tokens. The batches were constructed by padding sentences with and , then packing tokens from one or more sentences into each row to fill completely fill each batch. Partial sentences and the LSTM states were carried over from batch to batch so that the language model could use information across batches for context, but backpropogation was broken at each batch boundary. Why do I get slightly different embeddings if I run the same text through the pre trained model twice? As a result of the training method (see above), the LSTMs are stateful, and carry their state forward from batch to batch. Consequently, this introduces a small amount of non determinism, expecially for the first two batches. Why does training seem to take forever even with my small dataset? The number of gradient updates during training is determined by: the number of tokens in the training data ( n_train_tokens ) the batch size ( batch_size ) the number of epochs ( n_epochs ) Be sure to set these values for your particular dataset in bin/train_elmo.py . What's the deal with n_characters and padding? During training, we fill each batch to exactly 20 tokens by adding and to each sentence, then packing tokens from one or more sentences into each row to fill completely fill each batch. As a result, we do not allocate space for a special padding token. The UnicodeCharsVocabulary that converts token strings to lists of character ids always uses a fixed number of character embeddings of n_characters 261 , so always set n_characters 261 during training. 
However, for prediction, we ensure each sentence is fully contained in a single batch, and as a result pad sentences of different lengths with a special padding id. This occurs in the Batcher see here . As a result, set n_characters 262 during prediction in the options.json . How can I use ELMo to compute sentence representations? Simple methods like average and max pooling of the word level ELMo representations across sentences works well, often outperforming supervised methods on benchmark datasets. See Evaluation of sentence embeddings in downstream and linguistic probing tasks , Perone et al, 2018 arxiv link . I'm seeing a WARNING when serializing models, is it a problem? The below warning can be safely ignored: 2018 08 24 13:04:08,779 : WARNING : Error encountered when serializing lstm_output_embeddings. Type is unsupported, or the types of the items don't match field type in CollectionDef. 'list' object has no attribute 'name'",Question Answering,Question Answering 2362,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Memory Augmented GCN (MemGCN) Overview This repository contains TensorFlow code for implementing Memory Augmented Graph Convolutional Network multi modal data learning in healthcare. The rationale behind the model is that both patient health records and neuroimages are important for disease understanding because of their complementary aspects of diseases. In detail, the proposed method MemGCN is a matching network embeds multi hop memory augmented graph convolutions and can be trained in an end to end fashion with stochastic optimization. The brain connectivity graphs are transformed by graph convolutional networks into representations, while the external memory mechanism is in charge of iteratively (multiple hops) reading clinical sequences and choosing what to retrieve from memories so that the representations learned by graph convolution can be augmented. Memory Augmentation The key contribution of MemGCN is incorporating sequential records into the representation learning of brain connectivity in terms of memories. By pushing the clinical sequences into the memories, the continuous representations of this external information are processed with brain graphs together so that a more comprehensive diagnosis could be made. The above figure is an illustration of memory augmented graph convolution in a single hop (the 1 st hop). This repository contains the slides we presented in ICDM 2018. MemGCN provides a learning strategy for multi modality data with sequential and graph structure in general scenarios. The code is documented and should be easy to modify for your own applications. Requirements This package has the following requirements: An NVIDIA GPU. Python 3.x TensorFlow 1.4 Usage How to Run To run MemGCN on your data, you need to: change the function of loading data in utils.py; set hyperparameters for MemGCN in memgcn.sh; run the shell script memgcn.sh bash bash memgcn.sh Additional Material There is implementations used in: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, End To End Memory Networks , Neural Information Processing Systems (NIPS), 2015. Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , Neural Information Processing Systems (NIPS), 2016. 
Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert, Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks , Medical Image Computing and Computer Assisted Interventions (MICCAI), 2017. References If you happen to use our work, please consider citing our paper: @inproceedings{zhang2018integrative, title {Integrative Analysis of Patient Health Records and Neuroimages via Memory based Graph Convolutional Network}, author {Zhang, Xi and Chou, Jingyuan and Wang, Fei}, booktitle {2018 IEEE International Conference on Data Mining (ICDM)}, pages {767 776}, year {2018}, organization {IEEE} } This paper can be accessed on : Memory based GCN",Question Answering,Question Answering 2367,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Contextualized Word Representations for Reading Comprehension Shimi Salant and Jonathan Berant Requirements Theano , Matplotlib , Java Setup (1): Preparing SQuAD bash $ python setup.py prepare squad Downloads GloVe word embeddings and Stanford CoreNLP . Once downloaded, SQuAD's training and development sets will be pre processed and tokenized. Setup (2): Preparing pre trained LM bash $ python setup.py prepare lm Downloads the pre trained (TensorFlow) language model released along 1 . Setup (3): Encoding SQuAD via the LM Internal representations of the LM (when operated over SQuAD's questions and paragraphs) are calculated offline and saved to disk in shards. In order to manufacture and persist a shard, execute: bash $ python setup.py lm encode dataset DATASET sequences SEQUENCES layer LAYER num_shards NUM_SHARDS shard SHARD device DEVICE Where DATASET is either train or dev ; SEQUENCES is either contexts or questions ; and LAYER is L1 , L2 or EMB corresponding to _LM(L1)_, _LM(L2)_ and _LM(emb)_ in the paper, respectively. Since this is a lengthy process, it can be carried out in parallel if multiple GPUs are available: specify the number of shards to produce via NUM_SHARDS , the current shard to work on via SHARD , and the device to use via DEVICE ( cpu or an indexed GPU specifications e.g. gpu0 ). For example, in order to manufacture the first out of 4 shards via the first GPU when producing _LM(L1)_ encodings for the training dataset's paragraphs, execute: bash $ python setup.py lm encode dataset train sequences contexts layer L1 num_shards 4 shard 1 device gpu0 Training and Validation bash $ python main.py name NAME mode MODE lm_layer LM_LAYER device DEVICE Supply an arbitrary name as NAME (log file will be named as such), and set MODE to one of: TR , TR_MLP or LM which respectively correspond to _TR_, _TR(MLP)_ and to the LM based variants from the paper. If LM is chosen, specify the internal LM representation to utilize by setting LM_LAYER to one of: L1 , L2 , or EMB . Results Validation set: Model EM F1 : : : RaSoR (base model 2 ) 70.6 78.7 RaSoR + TR(MLP) 72.5 79.9 RaSoR + TR 75.0 82.5 RaSoR + TR + LM(emb) 75.8 83.0 RaSoR + TR + LM(L1) 77.0 84.0 RaSoR + TR + LM(L2) 76.1 83.3 Test set results available on SQuAD's leaderboard . Tested in the following environment: Ubuntu 14.04 Python 2.7.6 NVIDIA CUDA 8.0.44 and cuDNN 5.1.5 Theano 0.8.2 TensorFlow 0.11.0rc1 Matplotlib 1.3.1 Oracle JDK 8 1 Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the limits of language modeling. CoRR abs/1602.02410 2 Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur P. Parikh, Dipanjan Das, and Jonathan Berant. 
2016. Learning recurrent span representations for extractive question answering. CoRR abs/1611.01436.",Question Answering,Question Answering 2368,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Learning Recurrent Span Representations for Extractive Question Answering Requirements Theano , Matplotlib , Java Initial setup bash $ python setup.py This will download GloVe word embeddings and tokenize raw training / development data. (download will be skipped if the zipped GloVe file is manually placed in the data directory). Training bash $ python rasor.py device DEVICE train where DEVICE is cpu , or an indexed GPU specification e.g. gpu0 . When specifying a certain GPU, the theano device flag must be set to cpu , i.e. set device cpu in your .theanorc file. Making predictions bash $ python rasor.py device DEVICE test_json_path pred_json_path where test_json_path is the path of a JSON file containing articles, paragraphs and questions (see the SQuAD website for the specification of the JSON structure), and pred_json_path is the path to write predictions to. Tested in the following environment: Ubuntu 14.04 Python 2.7.6 NVIDIA CUDA 8.0.44 and cuDNN 5.1.5 Theano 0.8.2 Matplotlib 1.3.1 Oracle JDK 8",Question Answering,Question Answering 2377,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SeqMatchSeq Implementations of three models described in the three papers related to sequence matching: Learning Natural Language Inference with LSTM by Shuohang Wang, Jing Jiang Machine Comprehension Using Match LSTM and Answer Pointer by Shuohang Wang, Jing Jiang A Compare Aggregate Model for Matching Text Sequences by Shuohang Wang, Jing Jiang Learning Natural Language Inference with LSTM Requirements Torch7 nn nngraph optim Python 2.7 Datasets The Stanford Natural Language Inference (SNLI) Corpus GloVe: Global Vectors for Word Representation Usage sh preprocess.sh snli cd main th main.lua task snli model mLSTM dropoutP 0.3 num_classes 3 sh preprocess.sh snli will download the datasets and preprocess the SNLI corpus into the files (train.txt dev.txt test.txt) under the path data/snli/sequence with the format: >sequence1(premise) \t sequence2(hypothesis) \t label(from 1 to num_classes) \n main.lua will first initialize the preprocessed data and word embeddings into a Torch format and then run the algorithm. dropoutP is the main parameter we tuned. Docker You may try to use Docker for running the code. Docker Install Image : docker pull shuohang/seqmatchseq:1.0 After installation, just run the following commands (/PATH/SeqMatchSeq needs to be changed): docker run it v /PATH/SeqMatchSeq:/opt rm w /opt shuohang/seqmatchseq:1.0 /bin/bash c sh preprocess.sh snli docker run it v /PATH/SeqMatchSeq:/opt rm w /opt/main shuohang/seqmatchseq:1.0 /bin/bash c th main.lua Machine Comprehension Using Match LSTM and Answer Pointer Requirements Torch7 nn nngraph optim parallel Python 2.7 Python Packages: NLTK , collections, json, argparse NLTK Data : punkt Multiple cores CPU Datasets Stanford Question Answering Dataset (SQuAD) GloVe: Global Vectors for Word Representation Usage sh preprocess.sh squad cd main th mainDt.lua sh preprocess.sh squad will download the datasets and preprocess the SQuAD corpus into the files (train.txt dev.txt) under the path data/squad/sequence with the format: >sequence1(Document) \t sequence2(Question) \t sequence of the positions where the answer appears in the Document (e.g.
3 4 5 6) \n mainDt.lua will first initialize the preprossed data and word embeddings into a Torch format and then run the alogrithm. As this code is run through multiple CPU cores, the initial parameters are written in the file main/init.lua . opt.num_processes : 5. The number of threads used. opt.batch_size : 6. Batch size for each thread. (Then the mini_batch would be 5 6 .) opt.model : boundaryMPtr / sequenceMPtr Docker You may try to use Docker for running the code. Docker Install Image : docker pull shuohang/seqmatchseq:1.0 After installation, just run the following codes (/PATH/SeqMatchSeq need to change): docker run it v /PATH/SeqMatchSeq:/opt rm w /opt shuohang/seqmatchseq:1.0 /bin/bash c sh preprocess.sh squad docker run it v /PATH/SeqMatchSeq:/opt rm w /opt/main shuohang/seqmatchseq:1.0 /bin/bash c th mainDt.lua A Compare Aggregate Model for Matching Text Sequences Requirements Torch7 nn nngraph optim Python 2.7 Datasets The Stanford Natural Language Inference (SNLI) Corpus MovieQA: Story Understanding Benchmark InsuranceQA Corpus V1: Answer Selection Task WikiQA: A Challenge Dataset for Open Domain Question Answering GloVe: Global Vectors for Word Representation For now, this code only support SNLI and WikiQA data sets. Usage SNLI task (The preprocessed format follows the previous description): sh preprocess.sh snli cd main th main.lua task snli model compAggSNLI comp_type submul learning_rate 0.002 mem_dim 150 dropoutP 0.3 WikiQA task: sh preprocess.sh wikiqa (Please first dowload the file WikiQACorpus.zip to the path SeqMatchSeq/data/wikiqa/ through address: cd main th main.lua task wikiqa model compAggWikiqa comp_type mul learning_rate 0.004 dropoutP 0.04 batch_size 10 mem_dim 150 model (model name) : compAggSNLI / compAggWikiqa comp_type (8 different types of word comparison): submul / sub / mul / weightsub / weightmul / bilinear / concate / cos Docker You may try to use Docker for running the code. Docker Install Image : docker pull shuohang/seqmatchseq:1.0 After installation, just run the following codes (/PATH/SeqMatchSeq need to change): For SNLI: docker run it v /PATH/SeqMatchSeq:/opt rm w /opt shuohang/seqmatchseq:1.0 /bin/bash c sh preprocess.sh snli docker run it v /PATH/SeqMatchSeq:/opt rm w /opt/main shuohang/seqmatchseq:1.0 /bin/bash c th main.lua task snli model compAggSNLI comp_type submul learning_rate 0.002 mem_dim 150 dropoutP 0.3 For WikiQA docker run it v /PATH/SeqMatchSeq:/opt rm w /opt shuohang/seqmatchseq:1.0 /bin/bash c sh preprocess.sh wikiqa docker run it v /PATH/SeqMatchSeq:/opt rm w /opt/main shuohang/seqmatchseq:1.0 /bin/bash c th main.lua task wikiqa model compAggWikiqa comp_type mul learning_rate 0.004 dropoutP 0.04 batch_size 10 mem_dim 150 Copyright Copyright 2015 Singapore Management University (SMU). All Rights Reserved.",Question Answering,Question Answering 2379,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Reading Comprehension Experiments About This is the tensorflow version implementation/reproduce of some reading comprehension models in some reading comprehension datasets including the following: Models: Attention Sum Reader model as presented in Text Comprehension with the Attention Sum Reader Network (ACL2016) available at ! Attention over Attention Reader model as presented in Attention over Attention Neural Networks for Reading Comprehension (arXiv2016.7) available at ! Datasets: CBT, Children’s Book Test. 
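The readout that gives the Attention Sum Reader listed above its name can be illustrated with a small sketch (not code from this repository): each document position is scored against the query, the scores are softmaxed over positions, and a candidate's probability is the sum of attention over all of its occurrences in the document.

```python
# Illustrative numpy sketch of the attention-sum readout; names and shapes are assumptions.
import numpy as np

def attention_sum(doc_enc, q_enc, doc_ids, candidates):
    scores = doc_enc @ q_enc                 # dot-product score for each document position
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                       # softmax over document positions
    # Probability of each candidate = sum of attention over its occurrences in the document.
    probs = np.array([attn[doc_ids == c].sum() for c in candidates])
    return candidates[int(probs.argmax())]

rng = np.random.default_rng(0)
doc_enc = rng.normal(size=(20, 8))           # toy document token encodings
q_enc = rng.normal(size=8)                   # toy query encoding
doc_ids = rng.integers(0, 5, size=20)        # toy token ids of the document
print(attention_sum(doc_enc, q_enc, doc_ids, candidates=np.arange(5)))
```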
Start To Use 1.Clone the code shell git clone 2.Get needed data Download and extract the dataset used in this repo. shell cd data ./prepare all.sh 3.Environment Preparation Python 64bit > v3.5. Install require libraries using the following command. shell pip install r requirements.txt Install tensorflow > 1.1.0. shell pip install tensorflow gpu upgrade Install nltk punkt for tokenizer. shell python m nltk.downloader punkt 4.Set model, dataset and other command parameters What is the entrance of the program? The main.py file in root directory. How can I specify a model in command line? Type a command like above, the model_class is the class name of model, usually named in cambak style: shell python main.py model_class For example, if you want to use AttentionSumReader: shell python main.py AttentionSumReader How can I specify the dataset? Type a command like above, the dataset_class is the class name of dataset: shell python main.py model_class dataset dataset_class For example, if you want to use CBT: shell python main.py model_class dataset CBT You don't need to specify the data_root and train valid test file name in most cases, just specify the dataset. How can I know all the parameters? The program use argparse to deal with parameters, you can type the following command to get help: shell python main.py help or: shell python main.py h The command parameters is so long! The parameters will be stored into a file named args.json when executed, so next time you can type the following simplified command: shell python main.py model_class args_file args.json 5.Train and test the model First, modify the parameters in the args.json. You can now train and test the model by entering the following commands. The params in should be determined by the real situation. Train: shell python main.py model_class args_file args.json train 1 test 0 After train, the parameters are stored in weight_path/args.json and the model checkpoints are stored in weight_path . Test: shell python main.py model_class args_file args.json train 0 test 1 After test, the performance of model are stored in weight_path/result.json . 6.model performance All the trained results and corresponding config params are saved in sub directories of weight_path(by default the weight folder) named args.json and result.json . You should know that the implementation of some models are slightly different from the original, but the basic ideas are same, so the results are for reference only. The best results of implemented models are listed below: best result we achieve (with little hyper parameter tune in single model) best result listed in original paper(in the brackets) CBT NE CBT CN AS Reader 69.88(68.6) 65.0(63.4) AoA Reader 71.0(72.0) 68.12(69.4) 7.FAQ How do I use args_file argument in the shell? Once you enter a command in the shell(maybe a long one), the config will be stored in weight_path/args.json where weight_path is defined by another argument, after the command execute you can use args.json to simplify the following command: shell python main.py model_class args_file args.json And the priorities of arguments typed in the command line is higher than those stored in args.json, so you can change some arguments.",Question Answering,Question Answering 2381,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Recurrent Entity Networks Tensorflow/TFLearn Implementation of Tracking the World State with Recurrent Entity Networks by Henaff et. al. 
Punchline By building a set of disparate memory cells, each responsible for different concepts, entities, or other content, Recurrent Entity Networks (EntNets) are able to efficiently and robustly maintain a “world state” one that can be updated easily and effectively with the influx of new information. Furthermore, one can either let EntNet cell keys vary, or specifically seed them with specific embeddings, thereby forcing the model to track a given set of entities/objects/locations, allowing for the easy interpretation of the underlying decision making process. Results Implementation results are as follows (graphs of training/validation loss will be added later). Some of the tasks are fairly computationally intensive, so it might take a while to get benchmark results. Note that for efficiency, training stopped after validation accuracy passed a threshold of 95%. This is different than the method used in the paper, which runs tasks for 200 epochs, and reports the best model across 10 different runs. The number of runs, epochs to converge, and final train/validation/test accuracies (best on validation over different runs) for this repository relative to the paper results are as follows: Note that the italics above indicate examples of overfitting . Note that the notes rows consist of single runs of the model this is probably why multiple runs are necessary. If this continues to happen, I'll look into ways to better regularize the network (via dropout, for example). The bold above denotes failure to convergence. I'm not sure why this is happening, but I'll note that Jim Fleming reports the same sort of issue in his implementation . Additionally, plots of the training/validation loss and accuracies through training can be found in eval/qa_id, where id is the id of the task at hand. As an example, here is the plot for the graph of Task 1 Single Supporting Fact's training: ! alt text Components Entity Networks consist of three separate components: 1) An Input Encoder, that takes the input sequence at a given time step, and encodes it into a fixed size vector representation 2) The Dynamic Memory (the core of the model), that keeps a disparate set of memory cells, each with a different vector key (the location), and a hidden state memory (the content) 3) The Output Module, that takes the hidden states, and applies a series of transformations to generate the output . A breakdown of the components are as follows: Input Encoder : Takes the input from the environment (i.e. a sentence from a story), and maps it to a fixed size state vector . This repository (like the paper) utilizes a learned multiplicative mask, where each embedding of the sentence is multiplied element wise with a mask vector and then summed together. Alternatively, one could just as easily imagine an LSTM or CNN encoder to generate this initial input. Dynamic Memory : Core of the model, consists of a series of key vectors and memory (hidden state) vectors . The keys and state vectors function similarly to how the program keys and program embeddings function in the NPI/NTM the keys represent location, while the memories are content. Only the content (memories) get updated at inference time, with the influx of new information. Furthermore, one can seed and fix the key vectors such that they reflect certain words/entities > the paper does this by fixing key vectors to certain word embeddings, and using a simple BoW state encoding. This repository currently only supports random key vector seeds. 
The Dynamic Memory updates given an input are as follows this is very similar to the GRU update equations: + Gating function, determines how much memory j should be affected by the given input. + New state update U, V, W are model parameters that are shared across all memory cells . Model can be simplified by constraining U, V, W to be zero, or identity. + Gated update, elementwise product of g with $\tilde{h}$. Dictates how much the given memory should be updated. Output Module : Model interface, takes in the memories and a query vector q, and transforms them into the required output. Functions like a 1 hop Memory Network (Sukhbaatar, Weston), building a weighting mechanism over each input, then combines and feeds them through some intermediate layers. The actual updates are as follows: + Normalizes states based on cosine similarity. + Weighted sum of hidden states + R, H are trainable model parameters. As long as you can build some sort of loss using y, then the entirety of the model is trainable via Backpropagation Through Time (BPTT). Repository Structure Directory is structured in the following way: + model/ Model definition code, including the definition of the Dynamic Memory Cell. + preprocessor/ Preprocessing code to load and vectorize the bAbI Tasks. + tasks/ Raw bAbI Task files. + run.py Core script for training and evaluating the Recurrent Entity Network. References Big shout out to Jim Fleming for his initial Tensorflow Implementation his Dynamic Memory Cell Implementation specifically made things a lot easier. Reference: Jim Fleming's EntNet Memory Cell",Question Answering,Question Answering 2386,Natural Language Processing,Natural Language Processing,Natural Language Processing,"End to End Memory Network ! (/Notebooks/figs/E2EMN.png) paper: Getting Started Prerequisites pytorch 0.4.1 argparse 1.1 numpy 1.14.3 matplotlib 2.2.2 Notebooks Model Link : : Tutorial(not very nice yet) link Test and Visualization Memories for each hop link Test Result > Please Check for 4 models for 20 tasks in test_result.md > pe: position encoding > te: temporal encoding adjacent weight sharing method and encoding with position encoding helped a lot to improve tasks accuracy. Alse, temporal encoding for story helped. Blog Link : simonjisu.github.io Demo Not ready yet",Question Answering,Question Answering 2438,Natural Language Processing,Natural Language Processing,Natural Language Processing,"coqa baselines We provide several baselines: conversational models, extractive reading comprehension models and their combined models for the CoQA challenge . See more details in the paper . We also provide instructions (codalab.md) on how to run pretrained models on Codalab our platform for evaluation on the test set. As we use the OpenNMT py library for all our seq2seq experiments, please use the following command to clone our repository. bash git clone recurse submodules git@github.com:stanfordnlp/coqa baselines.git This code repository was mostly written by Danqi Chen , built on top of the DrQA and OpenNMT py projects, with some help from Shayne Longpre and Siva Reddy . If you have any questions about this repository, please use Github Issues. 
Requirements torch> 0.4.0 torchtext 0.2.1 gensim pycorenlp Download Download the dataset: bash mkdir data wget P data wget P data Download pre trained word vectors: bash mkdir wordvecs wget P wordvecs unzip d wordvecs wordvecs/glove.42B.300d.zip wget P wordvecs unzip d wordvecs wordvecs/glove.840B.300d.zip Start a CoreNLP server bash mkdir lib wget P lib java mx4g cp lib/stanford corenlp 3.9.1.jar edu.stanford.nlp.pipeline.StanfordCoreNLPServer port 9000 timeout 15000 Conversational models Preprocessing Generate the input files for seq2seq models needs to start a CoreNLP server ( n_history can be changed to {0, 1, 2, ..} or 1): bash python scripts/gen_seq2seq_data.py data_file data/coqa train v1.0.json n_history 2 lower output_file data/seq2seq train h2 python scripts/gen_seq2seq_data.py data_file data/coqa dev v1.0.json n_history 2 lower output_file data/seq2seq dev h2 Preprocess the data and embeddings: bash python seq2seq/preprocess.py train_src data/seq2seq train h2 src.txt train_tgt data/seq2seq train h2 tgt.txt valid_src data/seq2seq dev h2 src.txt valid_tgt data/seq2seq dev h2 tgt.txt save_data data/seq2seq h2 lower dynamic_dict src_seq_length 10000 PYTHONPATH seq2seq python seq2seq/tools/embeddings_to_torch.py emb_file_enc wordvecs/glove.42B.300d.txt emb_file_dec wordvecs/glove.42B.300d.txt dict_file data/seq2seq h2.vocab.pt output_file data/seq2seq h2.embed Training Run a seq2seq (with attention) model: bash python seq2seq/train.py data data/seq2seq h2 save_model seq2seq_models/seq2seq word_vec_size 300 pre_word_vecs_enc data/seq2seq h2.embed.enc.pt pre_word_vecs_dec data/seq2seq h2.embed.dec.pt epochs 50 gpuid 0 seed 123 Run a seq2seq+copy model: bash python seq2seq/train.py data data/seq2seq h2 save_model seq2seq_models/seq2seq_copy copy_attn reuse_copy_attn word_vec_size 300 pre_word_vecs_enc data/seq2seq.embed.enc.pt pre_word_vecs_dec data/seq2seq.embed.dec.pt epochs 50 gpuid 0 seed 123 Testing bash python seq2seq/translate.py model seq2seq_models/seq2seq_copy_acc_65.49_ppl_4.71_e15.pt src data/seq2seq dev h2 src.txt output seq2seq_models/pred.txt replace_unk verbose gpu 0 python scripts/gen_seq2seq_output.py data_file data/coqa dev v1.0.json pred_file seq2seq_models/pred.txt output_file seq2seq_models/seq2seq_copy.prediction.json Reading comprehension models Preprocessing Generate the input files for the reading comprehension (extractive question answering) model needs to start a CoreNLP server: bash python scripts/gen_drqa_data.py data_file data/coqa train v1.0.json output_file coqa.train.json python scripts/gen_drqa_data.py data_file data/coqa dev v1.0.json output_file coqa.dev.json Training n_history can be changed to {0, 1, 2, ..} or 1. 
bash python rc/main.py trainset data/coqa.train.json devset data/coqa.dev.json n_history 2 dir rc_models embed_file wordvecs/glove.840B.300d.txt Testing bash python rc/main.py testset data/coqa.dev.json n_history 2 pretrained rc_models The pipeline model Preprocessing bash python scripts/gen_pipeline_data.py data_file data/coqa train v1.0.json output_file1 data/coqa.train.pipeline.json output_file2 data/seq2seq train pipeline python scripts/gen_pipeline_data.py data_file data/coqa dev v1.0.json output_file1 data/coqa.dev.pipeline.json output_file2 data/seq2seq dev pipeline python seq2seq/preprocess.py train_src data/seq2seq train pipeline src.txt train_tgt data/seq2seq train pipeline tgt.txt valid_src data/seq2seq dev pipeline src.txt valid_tgt data/seq2seq dev pipeline tgt.txt save_data data/seq2seq pipeline lower dynamic_dict src_seq_length 10000 PYTHONPATH seq2seq python seq2seq/tools/embeddings_to_torch.py emb_file_enc wordvecs/glove.42B.300d.txt emb_file_dec wordvecs/glove.42B.300d.txt dict_file data/seq2seq pipeline.vocab.pt output_file data/seq2seq pipeline.embed Training n_history can be changed to {0, 1, 2, ..} or 1. bash python rc/main.py trainset data/coqa.train.pipeline.json devset data/coqa.dev.pipeline.json n_history 2 dir pipeline_models embed_file wordvecs/glove.840B.300d.txt predict_raw_text n python seq2seq/train.py data data/seq2seq pipeline save_model pipeline_models/seq2seq_copy copy_attn reuse_copy_attn word_vec_size 300 pre_word_vecs_enc data/seq2seq pipeline.embed.enc.pt pre_word_vecs_dec data/seq2seq pipeline.embed.dec.pt epochs 50 gpuid 0 seed 123 Testing bash python rc/main.py testset data/coqa.dev.pipeline.json n_history 2 pretrained pipeline_models python scripts/gen_pipeline_for_seq2seq.py data_file data/coqa.dev.pipeline.json output_file pipeline_models/pipeline seq2seq src.txt pred_file pipeline_models/predictions.json python seq2seq/translate.py model pipeline_models/seq2seq_copy_acc_85.00_ppl_2.18_e16.pt src pipeline_models/pipeline seq2seq src.txt output pipeline_models/pred.txt replace_unk verbose gpu 0 python scripts/gen_seq2seq_output.py data_file data/coqa dev v1.0.json pred_file pipeline_models/pred.txt output_file pipeline_models/pipeline.prediction.json Results All the results are based on n_history 2 : Model Dev F1 Dev EM seq2seq 20.9 17.7 seq2seq_copy 45.2 38.0 DrQA 55.6 46.2 pipeline 65.0 54.9 Citation @article{reddy2018coqa, title {CoQA: A Conversational Question Answering Challenge}, author {Reddy, Siva and Chen, Danqi and Manning, Christopher D}, journal {arXiv preprint arXiv:1808.07042}, year {2018} } License MIT",Question Answering,Question Answering 2445,Natural Language Processing,Natural Language Processing,Natural Language Processing,Language_Modeling Language modeling with different models. Environment Nvidia K80 Nvidia docker Python 3 TensorFlow Keras Dataset Penn Tree bank (PTB) Train data size: 929589 > aer banknote berlitz calloway centrust cluett fromstein gitano guterman hydro quebec ipo kia memotec mlx nahb punts rake regatta rubens sim snack food ssangyong swapo ....... Valid data size: 73760 > consumers may want to move their telephones a little closer to the tv set watching abc 's monday night football can now ....... Test data size: 82430 > no it was n't black monday but while the new york stock exchange did ....... RNN baseline model ! RNN baseline Character aware model Character Aware Neural Language Models arxiv 1508.06615 AAAI 2016 ! 
character aware model Gated CNN model Language Modeling with Gated Convolutional Networks arxiv 1612.08083 Facebook AI Research ! Gated CNN model End to end memory networks model End To End Memory Networks arxiv 1503.08895 NIPS 2015 ! E2E MM model,Question Answering,Question Answering 2478,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Clever Commenter: Let's Try More Apps This repo contrains of the Clever Commenter: Let's Try More Apps project in Google AI ML Winter Camp. What Problem We solve Comments are one of the most important ways for App downloaders to understand this App. However, many newly released (online) Apps have few comments, which seriously affects the user's interest and enthusiasm of those apps. Therefore, in order to help App downloaders better understand the newly released Apps , we designed an automatic comment generator called Clever Commenter: Let's Try More Apps . What is Clever Commenter: Let's Try More Apps Clever Commenter: Let's Try More Apps is an interesting and powerful automatic comment generator. It consists of the following modules: Key words Extraction : This module uses the structure data of the app (such as Category , Age group , Price ) to find the most relevant apps based on Social Network theory instead of basic low order similarity. Then extracts the key words of the related apps as an alternative of the newly released App. Key words Based Review Generator : This module generates a review based on given key words. Key words are extracted by the first module or input from the App designers. Review Sentiment Transfer : This module transfer a negative review into a positive review, and vice versa. In this way, Clever Commenter: Let's Try More Apps can control the emotion of the generated reviews. Module1: Key words Extraction The model aims to find APP's most similar APPs based on Social Network theory instead of basic low level similarity, then extract these APP's keywords. 1. Dataset In our example, we use Google Play Store Apps Dataset as our source data. 2. keywords extraction model By run the follwing files, go to the keywords extraction folder, and you can get each APP's most similar APP's keywords. get_ppmi_matrix.py can calculate each existing APP's high level similarity , by the Soical Network theory Random Walk. loworder_similarity_to_highorder_similarity_model.py can train and predict APP's high level similarity with other existing APPs. change_ppmi_matrix_to_similar_app.py can get each APP's most similar APP's name. convert_orl_data_to_keyword_by_Category.py can get each category APPs' top non emotional keywords and emotional keywords. convert_orl_data_to_keyword_of_each_app_by_similar_app.py can get each APP's most similar APPs' top non emotional keywords and emotional keywords. Module2: Key words Based Review Generator The model aims to generate fluent and reasonable reviews based on the input keywords describing the product. 1. Data Preprocess Before running the review generator/preprocess.py, your should provide the following files in the data/source_data/ folder: XX.src1 is the file of the input keywords. XX.src2 is the file of the concepts extracted from ConceptNet . XX.tgt is the file of the output reviews. Run preprocess.py as following, and the preprocessed files are stored in the data/save_data/ folder. bash python3 preprocess.py load_data data/source_data/ save_data data/save_data/ 2. 
Train To train a model, go to the review generator folder and run the following command: bash python3 train.py gpus gpu_id config config.yaml log log_name 3. Test To test the well trained model, go to the review generator folder and run the following command: bash python3 predict.py gpus gpu_id config config.yaml restore checkpoint_path log log_name Module3: Review Sentiment Transfer The model learns to transfer a negative sentiment review into a positive one without any parallel data. 1. Data Preprocess After running the sentiment transfer/format_data.py, it can generate three files in the sentiment_transfer folder: train.0 , dev.0 , test.0 denotes the negative train/dev/test files train.1 , dev.1 , test.1 denotes the positive train/dev/test files 2. Train To train a model, go to the sentiment transfer folder and run the following command: bash python style_transfer.py train ../data/sentiment_transfer/train dev ../data/sentiment_transfer/dev output ../tmp/sentiment.dev vocab ../tmp/google.vocab model ../tmp/model 3. Test Test file has sentiment labels If the test file has sentiment labels, just run the following command: bash python style_transfer.py test ../data/sentiment_transfer/test output ../tmp/sentiment_transfer.test vocab ../tmp/google.vocab model ../tmp/model load_model true Test file doesn't have sentiment labels If the test file doesn't have sentiment labels, such as the generated reviews, just run the following model to train a binary sentiment classifier. And then load the trained model to detect which generated review is negative or positive. bash train python classifier.py train ../data/sentiment_transfer/train dev ../data/sentiment_transfer/dev vocab ../tmp/google.vocab model ../tmp/classifer model test python classifier.py test TEST_FILE_PATH output OUTPUT_FILE_PATH vocab ../tmp/google.vocab model ../tmp/model load_model true And then, run the follow code to get the transferred review: bash python style_transfer.py test OUTPUT_FILE_PATH output ../tmp/sentiment_transfer.test vocab ../tmp/google.vocab model ../tmp/model load_model true Web to show the demo 1. Run the server to allow the js. script. bash python3 m web/run_server.sh & 2. Visit web/demo.htm to watch the demo. Cite This code is based on the following paper: deep neural network for learning graph representations . Cao, Shaosheng . Thirtieth Aaai Conference on Artificial Intelligence AAAI Press, 2016. aaai.org Style Transfer from Non Parallel Text by Cross Alignment . Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. NIPS 2017. arXiv End To End Memory Networks . Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus. NIPS 2015. arXiv Dynamic Memory Networks for Visual and Textual Question Answering . Caiming Xiong, Stephen Merity, Richard Socher. 2017. arXiv Author Yue Sun (孙悦), WeiTu(涂威), FuliLuo (罗福莉)",Question Answering,Question Answering 2526,Natural Language Processing,Natural Language Processing,Natural Language Processing,"PyTorch implementation for Bi directional Attention Flow model Question answering system based on the paper This is a slightly modified version where its answer selection module use bilinear function, giving slight improvement in accuracy over the original model. Part of the code are from In Order to run, 1. Download SQuAD dataset and GloVe embeddings (850 MB, this will download files to $HOME/data ): chmod +x download.sh; ./download.sh 2. 
Preprocess SQuAD data python m squad.prepro Then place the processed data and unzipped GloVe embeddings into the data directory (by default it is ./data/squad) 3. Training To train, run the following command. python main.py To test, python main.py test 1 resume",Question Answering,Question Answering 2583,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Extending the BiDAF Model with No Answer This is the implementation used in Zero Shot Relation Extraction via Reading Comprehension paper1 (Levy et al., 2017). It is an extension of the BiDAF model paper2 by Seo et al. This file describes some basic use cases in the relation extraction setting. The original implementation's readme file is BiDAF_README.md fullreadme . Requirements Python (developed on 3.5.2. Issues have been reported with Python 2!) tensorflow (deep learning library, verified on r0.11) nltk (NLP tools, verified on 3.2.1) tqdm (progress bar, verified on 4.7.4) Scripts run_prep.sh calls an internal script ( zeroshot2squad.py ) that changes our tab delimited format to SQuAD's JSON format. It then performs any necessary preprocessing for the BiDAF model. run_train.sh runs the training procedure. run_test.sh runs the testing procedure, and yields an answer file in out/basic/ /test .json python analyze.py reads the test set and the model's answers, and returns the F1 score broken down by different factors. paper1 : paper2 : fullreadme : BiDAF_README.md",Question Answering,Question Answering 2612,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Doc2Vec and Annotated Lyrics: Are they Genius? ! (images/genius_header2.png) Using Gensim's Doc2Vec to Evaluate Crowdsourced Lyric Annotations on Genius.com by Taite Sandefer Capstone II Project for Galvanize Data Science Immersive, Week 8 Last Updated: 4/7/19 Table of Contents Introduction ( introduction) Background ( background) Hypothesis and Assumptions ( hypothesis and assumptions) Methodology ( methodology) Data Overview ( data overview) Exploratory Data Analysis ( exploratory data analysis) Engineered Features ( engineered features) Challenges ( challenges) Model Selection ( model selection) Text Preprocessing ( text preprocessing) Model Architecture ( model architecture) Training Corpus ( training corpus) Hyperparameter Tuning ( hyperparameter tuning) Performance Metrics ( performance metrics) Chosen Model ( chosen model) Specifications ( specifications) Model Assessment ( model assessment) Results and Interpretation ( results and interpretation) Discussion ( discussion) Acknowledgements ( acknowledgements) Citations ( citations) Introduction Background What is Genius? Genius , formerly Rap Genius, is a website where users can view and add annotations to lyrics that help explain their meaning and context. The primary goal of Genius is to explain lyrics and help make them more accessible to listeners. Generally, these are explanations regarding the semantic and/or cultural meanings behind lyrics, which can often cryptic and filled with linguistic subtleties that we wouldn't normally expect a computer to be able to pick up on. Problem Today, the Genius system still relies heavily on crowdsourced human work. When an annotation gets posted, it must be read and accepted by a higher ranking user in the community for it to stick on the public lyrics page. 
Costs Associated with Human Evaluation Time Human error Accepting bad annotations (FP) Rejecting good annotations (FN) If the moderators are busy, or uninterested, good annotations can go unreviewed and unposted. Additionally, a grumpy moderator might let poor annotations slip through, or choose to trash good annotations. If moderators do take the time to read through annotations, it's likely to take up a lot of their time. If it were possible to reliably automate this process, it could both save time and increase the accuracy of evaluation. So, what makes a good Genius annotation? According to Genius, For example, this is a good explanation of the double meaning behind this line from Frank Ocean's Pilot Jones : However, annotations can be anything that helps add to the experience of the music, which isn't limited to this sort of explanation. For example, verified artists can annotate their own lyrics, and often discuss how they were feeling the day they wrote the lines, rather than explaining the meaning behind them. Eminem does this a lot, actually. Here's an example of this from Eminem's Rap God : A Potential Solution: Learning a Word from its Context Doc2Vec is a neural network model that strives to learn how to best encode words, and documents, into vectors that represent their contextual orientation, based on the data it was exposed to in training. > Tell me who your friends are, and I'll tell you who you are The idea is that as you read lines of text, a latent context window traverses through the text and captures the aggregate meaning of the words within, while continues to shift and evolve as it moves along the text. Doc2Vec is designed to pick up on these subtle linguistic patterns, which is why it's likely better suited to this lyric annotation problem than other text encoding methods, like BoW/Tf idf. Instead of using a frequentistic measure of occurrence or co occurence, Doc2Vec attempts to measure association between words with a combination of both through pointwise mutual information. > “The meaning of a word can be inferred by the company it keeps Thanks to this exciting innovation in NLP, it might be possible to create an evaluation system that automatically accepts/rejects user submitted annotations based on the similarity of their Doc2Vec representations. Properties of Word/Doc2Vec Interestingly, Word2Vec vectors have been found to have arithmetic properties that resemble the functionality of word analogies when properly trained. For example, if you subtract the word vector for queen from the vector for king, you get approximately the same result as when you subtract the vector for woman from the vector for man, which roughly gives us the vector representation of male female meaning in context. Will this work similarly for Doc2Vec vectors when the semantic or contextual meanings between two documents are similar? Hypothesis and Assumptions Hypothesis The idea behind this project is that the DocVec representations of lyric segments and their corresponding annotations will be inherently more similar if they are good annotations, whereas those of bad annotations will not be similar. If this is true, then we can systematically use this metric to infer the DocVec representations of unseen annotations to determine whether they should be accepted or not. If the assumptions hold true, an appropriately trained Doc2Vec model will be able to infer vector representations of unseen lyrics and annotations that are more similar for good annotations than for bad annotations. 
Assumptions Distributional Hypothesis: Words that frequently occur near each other will have similar semantic meanings Good annotations are contextually similar to the lyric they describe. They attempt to explain the semantic meaning behind their respective lyrics, acting approximately as a prose translation of the lyrics. Compared to the lyrics they describe, annotations are more verbose and use natural language and are more explicitly clear. The vocabulary used by annotations will generally not be the same as the vocabulary used in the annotation itself, except with specific rare/slang words that are the object of discussion. When they are similar, this isn't as much about the quality of the annotation most annotations can, and do, repeat some of the exact verbage of the lyrics. What matters is that the words utilized that aren't identical in literal vocabulary ARE similar in their semantic/contextual meaning. AKA, they tend to have similar neighbors as each other. Methodology Obtain data from API Preprocess text Train Doc2Vec model on preprocessed text data Obtain inferred Doc2Vec representations of lyric annotation pairs Calculate similarity metric of pairs Convert select DocVec representation to 3 dimensions using t SNE Represent t SNE representations graphically Back to Top ( Table of Contents) Data Overview This data came from the Genius API and has been stored in both MongoDB and .csv files using requests and johnwmillr's LyricsGenius . This information came from scraping all annotations from the top 50 songs from the 20 most active artists on Genius. I pulled the text and other characteristic features for annotations and their corresponding lyric segments. Although I had originally planned to get about 17,000 observations, I ended up working with 3,573 lyric annotation pairs. Exploratory Data Analysis Top 12 Artists on Genius Artist Songs in Corpus Total Annotations 1 Drake 33 314 2 Eminem 43 426 3 Kendrick Lamar 35 350 4 Kanye West 35 341 5 The Weeknd 34 286 6 J. Cole 46 438 7 XXXTENTACION 38 254 8 Lil Wayne 14 139 9 Original Broadway Cast of Hamilton 46 457 10 JAY Z 18 180 11 Ariana Grande 41 273 12 Beyoncé 12 115 Engineered Features Votes per 100k viewers Character count for text Word count for text Cosine Similarity of annotation lyric pairs Challenges Unfortunately, the Genius API does not provide access to data on rejected annotations. Thus, we need to use other features to help us distinguish between good and bad annotations. It looks like Votes might not be a great metric for determinng whether an annotation is good or not! However, there are many studies in the past that have had promising experiences using Doc2Vec to predict whether a piece of text is similar to another piece of text. Researchers have often created datasets to do this by mixing up pairings between sentences and paragraphs, which they do or don't belong to, and compared the similarity of vectors from true matches and false matches. For testing, I decided to randomly assign lyric annotation pairs that were tied to music from different artists. Then, I wanted to examine whether there was a statistically significant difference between the true pairs and the mismatched pairs of lyrics and annotations. If the DocVecs were able to pick up on the relevant context patterns, I'd expect good / true pairs to be more similar than their bad / mismatched partners. 
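A sketch of how such a mismatched control set could be assembled is shown below; the column names ('artist', 'lyric', 'annotation') are assumed for illustration and may not match the project's actual schema:

```python
# Sketch: re-pair each lyric with an annotation drawn from a *different* artist
# to build the "mismatched" comparison set.
import numpy as np
import pandas as pd

def make_mismatched_pairs(df, seed=42):
    """Assumes df has columns 'artist', 'lyric', 'annotation'."""
    rng = np.random.RandomState(seed)
    mismatched = []
    for _, row in df.iterrows():
        pool = df[df["artist"] != row["artist"]]          # annotations from other artists
        fake = pool.sample(n=1, random_state=rng).iloc[0]
        mismatched.append({"lyric": row["lyric"], "annotation": fake["annotation"]})
    return pd.DataFrame(mismatched)
```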
Back to Top ( Table of Contents) Model Selection Text Preprocessing While other NLP techniques call for various transformations in the text data Pipeline, Doc2Vec typically performs better when the variation in the text is preserved. Therefore, text was lowercased and verse tags were removed from the data, but otherwise most of the text was kept as is. Apostrophes were removed from contraction words so that they would be treated as one word, and all other punctuation was treated as its own token as well. Model Architecture As an extension of Word2Vec, which was originally published in 2013, Doc2Vec has two architectural flavors. Distributed Bag of Words Pr(word surrounding words) Distributed Memory Pr(surrounding words word) Generally has been found to perform better, particularly with semantic tasks Training Corpus Transfer Learning Since there are only 3,500 data points, making the model vulnerable to overfitting, it'd be wise to consider using pretrained word vectors that have already been exposed to millions, or more, words. However, I was unable to find any pretrained models that were exposed to text comparable to these lyric and annotation pairs. The majority of pretrained word vectors have been trained on Wikipedia articles or similar text, which is fairly academic and less prone to poetic or naturalistic tendencies than these lyrics and annotations. I made the decision to use models trained only on this Genius annotation/lyric data because it would be misleading to use a model trained primarily on text that is so inherently different from the target data. Categories of Texts With two distinct categories of text that need to be encoded in comparable context window dimensions, which should my model be trained on? Trained 4 different training corpus variations for comparison: Lyrics only Annotations only Lyrics & Annotations Lyrics & Annotations (with distinguishing annotation/lyric tag) Hyperparameter Tuning Hyperparameter Description Default Value window number of words in context window 5 vector_size number of nodes in hidden layer 100 epochs number of iterations through the data 100 Performance Metrics Self Recognition Infer DocVectors for each training data point Find the most similar DocVector, based on training What percentage of the training dataset can the model accurately predict as its own best contextual match? Standard: 95% and above Each of my models were achieving self recognition for around 97 99.4% of the training data Except for the model that I trained on tagged annotations and lyrics, which achieved roughly 17% Comparison against best/worst pairs Back to Top ( Table of Contents) Chosen Model Specifications Gensim's Doc2Vec Model Distributed Memory architecture trained only on untagged lyric & annotations Corpus was lowercased, included punctuation, and not stemmed or lemmatized when tokenized vector_size 100 ( of neurons in hidden layer) window 5 100 epochs Model Assessment Hypothesis Test between True and Mismatched Pairs Using the cosine similarities calculated across annotation lyric pairs for true and mistmatched groups, hypothesis testing yielded interesting results! H0: The mean of the Cosine Similarity for false match pairs is equal to the mean of the Cosine Similarity for true match pairs Statistic Result t Stat 29.32 p val 1.1e 171 This p value is very close to zero, allowing us to reject the null hypothesis at the 99% confidence level, given the observed data and that other assumptions hold true. 
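For reference, this kind of comparison can be run with a standard two-sample t-test, e.g. via scipy; the sketch below uses random placeholder arrays in place of the real similarity scores, and whether the original analysis used the pooled or Welch variant is not stated here:

```python
# Sketch: two-sample t-test on cosine similarities of true vs. mismatched pairs.
# The arrays are random placeholders sized like the 3,573-pair dataset.
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
true_sims = rng.normal(loc=0.45, scale=0.15, size=3573)        # true lyric-annotation pairs
mismatched_sims = rng.normal(loc=0.30, scale=0.15, size=3573)  # artist-shuffled pairs

t_stat, p_val = stats.ttest_ind(true_sims, mismatched_sims, equal_var=False)
print("t = %.2f, p = %.3g" % (t_stat, p_val))
```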
Therefore, this evidence suggests that using a Doc2Vec model to infer vector representations of annotations and lyrics could be effective in determining whether or not annotations are relevant to the lyrics they're describing. It's important to note that this particular hypothesis test is done to determine whether the means of the cosine similarities for the true match and the false match pairs are statistically different from each other, when doc2vec DocVectors are obtained from a model trained on the preferred specifications (outlined above). This shows that it is possible obtain the results we hoped to find, using Doc2Vec, but it doesn't necessarily prove that they are reliable or that it is truly measuring the difference in meaning across these lyric annotation pairs. It merely shows that there is more work to be done! Results and Interpretation Visualization: Reducing Dimensionality with t SNE Back to Top ( Table of Contents) Discussion Although we did find a Doc2Vec model that produced interesting results in two tailed hypothesis testing, it's important to note that the difference between these distributions was not strong enough to distinguish true and false pairs on its own. This shows us that this metric may be helpful, but it's not enough on its own to determine whether we should accept an annotation. We might be able to increase the complexity of these representations, which could help make their representations more distinct from each other, by increasing the number of nodes in the hidden layer of the Doc2Vec network, but this would require a lot more data and leave the model more vulnerable to overfitting. Until then, it'd be necessary to include other metrics in our final model to help distinguish between good and bad annotation groups. While I did find a model that produced interesting results, more data is necessary to produce reliable results. Most research published recommends having a dataset in the millions, not few thousand, so that the model will not be subject to overfitting. It also seems like my assumption about the nature of Genius annotations does not hold as well as I expected. Many of the annotations I observed during this project were off topic , at least in my opinion, and tended to link lyrics to indirect outside cultural events, rather than focusing on explaining their meaning. Future Work We may have found evidence supporting the claim that Doc2Vec can help distinguish between good and bad annotation pairs, but more research is needed with more data. The investigation of this project has revealed that this dataset might not satisfy the necessary assumptions, but that we still have something to learn from lyric annotation Doc2Vecs that could be useful in creating an automatic evaluation system for annotations. Back to Top ( Table of Contents) Acknowledgements Genius.com DSI instructors: Frank Burkholder, Danny Lumian, Kayla Thomas Cohort Peers working with NLP: Matt Devor, Aidan Jared, Lei Shan johnwmillr's LyricsGenius Gensim's Doc2Vec model Robert Meyer's Presentation from PyData's 2017 Berlin Conference Andy Jones' blog on the mechanics of Word2Vec Citations Word2Vec Paper Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111 3119). Random Walks on Context Spaces Paper Arora, S., Li, Y., Liang, Y., Ma, T., & Risteski, A. (2015). Rand walk: A latent variable model approach to word embeddings. 
arXiv preprint arXiv:1502.03520. Doc2Vec Papers Le, Q., & Mikolov, T. (2014, January). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188 1196). Lau, J. H., & Baldwin, T. (2016). An empirical evaluation of doc2vec with practical insights into document embedding generation. arXiv preprint arXiv:1607.05368. Back to Top ( Table of Contents)",Question Answering,Question Answering 2627,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Jack the Reader Wercker build badge wercker_badge wercker codecov Gitter license A Machine Reading Comprehension framework. All work and no play makes Jack a great frame work ! All work and no play makes Jack a great frame work ! All work and no play makes Jack a great frame work ! wercker_badge : wercker : heres_johnny : Jack the Reader or jack , for short is a framework for building and using models on a variety of tasks that require reading comprehension . For more informations about the overall architecture, we refer to Jack the Reader – A Machine Reading Framework (ACL 2018). Installation To install Jack, install requirements and TensorFlow . In case you want to use PyTorch for writing models, please install PyTorch as well. Supported ML Backends We currently support TensorFlow and PyTorch . Readers can be implemented using both. Input and output modules (i.e., pre and post processing) are independent of the ML backend and can thus be reused for model modules that either backend. Though most models are implemented in TensorFlow by reusing the cumbersome pre and post processing it is easy to quickly build new readers in PyTorch as well. Pre trained Models Find pre trained models here . Code Structure jack.core core abstractions used jack.readers implementations of models jack.eval task evaluation code jack.util utility code that is used throughout the framework, including shared ML code jack.io IO related code, including loading and dataset conversion scripts Projects Integration of Knowledge into neural NLU systems (/projects/knowledge_integration) Quickstart Coding Tutorials Notebooks & CLI We provide ipython notebooks with tutorials on Jack. For the quickest start, you can begin here quickstart . If you're interested in training a model yourself from code, see this tutorial model_training (we recommend the command line, see below), and if you'd like to implement a new model yourself, this notebook implementation gives you a tutorial that explains this process in more detail. There is documentation on our command line interface cli for actually training and evaluating models . For a high level explanation of the ideas and vision, see Understanding Jack the Reader understanding . quickstart : notebooks/quick_start.ipynb model_training : notebooks/model_training.ipynb implementation : notebooks/model_implementation.ipynb install : docs/How_to_install_and_run.md api : notebooks : notebooks/ understanding : docs/Understanding_Jack_the_Reader.md cli : docs/CLI.md Command line Training and Usage of a QA System To illustrate how jack works, we will show how to train a question answering model using our command line interface cli which is analoguous for other tasks (browse conf/ (./conf/) for existing task dataset configurations). It is probably best to setup a virtual environment to avoid clashes with system wide python library versions. First, install the framework: bash $ python3 m pip install e . 
tf Then, download the SQuAD dataset, and the GloVe word embeddings: bash $ ./data/SQuAD/download.sh $ ./data/GloVe/download.sh Train a FastQA fastqa model: bash $ python3 bin/jack train.py with train 'data/SQuAD/train v1.1.json' dev 'data/SQuAD/dev v1.1.json' reader 'fastqa_reader' \ > repr_dim 300 dropout 0.5 batch_size 64 seed 1337 loader 'squad' save_dir './fastqa_reader' epochs 20 \ > with_char_embeddings True embedding_format 'memory_map_dir' embedding_file 'data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings True or shorter, using our prepared config: bash $ python3 bin/jack train.py with config './conf/qa/squad/fastqa.yaml' A copy of the model is written into the save_dir directory after each training epoch when performance improves. These can be loaded using the commands below or see e.g. quickstart . You want to train another model? No problem, we have a fairly modular QAModel implementation which allows you to stick together your own model. There are examples in conf/qa/squad/ (e.g., bidaf.yaml or our own creation jack_qa.yaml ). These models are defined solely in the configs, i.e., there is not implementation in code. This is possible through our ModularQAModel . If all of that is too cumbersome for you and you just want to play, why not downloading a pretrained model: bash $ we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed $ data/GloVe/download.sh $ wget O fastqa.zip $ unzip fastqa.zip && mv fastqa fastqa_reader python from jack import readers from jack.core import QASetting fastqa_reader readers.reader_from_file( ./fastqa_reader ) support It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary. answers fastqa_reader( QASetting( question To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France? , support support ) ) print(answers 0 0 .text) fastqa : tf_summaries : quick_start : notebooks/quick_start.ipynb cli : docs/CLI.md Support We are thankful for support from: Developer guidelines Comply with the PEP 8 Style Guide pep8 Make sure all your code runs from the top level directory, e.g.: shell $ pwd /home/pasquale/workspace/jack $ python3 bin/jack train.py .. pep8 : Citing @InProceedings{weissenborn2018jack, author {Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel}, title {{Jack the Reader – A Machine Reading Framework}}, booktitle {{Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL) System Demonstrations}}, Month {July}, year {2018}, url { }",Question Answering,Question Answering 2645,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Query Reduction Networks (QRN) ! Teaser figure for QRN (assets/teaser.png) QRN qrn is a purely sequential model like LSTM or GRU (but simpler than them) for story based question answering ( bAbI QA tasks babi ). QRN qrn is implemented using TensorFlow tensorflow . Here are some notable results (error rates in %) on bAbI QA dataset: Task LSTM lstm MemN2N memn2n Ours : : : : : : 1k avg 51.3 15.2 9.9 10k avg 36.4 4.2 0.3 See model details and more results in this paper qrn . 1. 
Quick Start We are assuming you are working in a Linux environment. Make sure that you have Python (verified on 3.5, issues have been reported with 2.x), and you installed these Python packages: tensorflow (> 0.8, 0.12) and progressbar2 . First, download bAbI QA dataset (note that this downloads the dataset to $HOME/data/babi ): bash chmod +x download.sh; ./download.sh Then preprocess the data for a particular task, say Task 2 (this stores the preprocessed data in data/babi/en/02/ ): bash python m prepro task 2 Finally, you train the model (test is automatically performed at the end): bash python m babi.main noload task 2 It took 3 minutes on my laptop using CPU. You can run it several times with new weight initialization (e.g. 10) and report the test result with the lowest dev loss: bash python m babi.main noload task 2 num_trials 10 This is critical to stably get the reported results; some weight initialization leads to a bad optima. 2. Visualizing Results After training and testing, the result is stored in evals/babi/en/02 None 00 01/test_0150.json . We can visualize the magnitudes of the update and reset gates using the result file. Note that you need jinja2 (Python package). Run the following command to host a web server for visualization and open it via browser: bash python m babi.visualize_result task 2 open True then click the file(s). It takes a a few seconds to load the heatmap coloring of the gate values. You will see something like this: ! visualization (assets/vis.png) By default visualize_result retrieves the first trial (1). If you want to retrieve a particular trial number, specify the trial number if trial_num option. 3. 10k and Other Options To train the model on 10k dataset, first preprocess the data with large flag: bash python m prepro task 2 large True Then train the model with large flag as well: bash python m babi.main noload task 2 large True batch_size 128 init_lr 0.1 wd 0.0005 hidden_size 200 Note that the batch size, init_lr, wd, and hidden_size changed. Finally, visualization requires the large flag: bash python m babi.visualize_result task 2 open True large True To control other parameters and see other options, type: bash python m babi.main h 4. Run bAbI dialog To train the model on bAbI dialog, preprocess the data with bAbI dialog dataset: bash python m prepro dialog task 2 Then train the model: bash python m dialog.main noload task 2 To use match, use_match flag is required: bash python m dialog.main noload task 2 use_match True To use RNN decoder, use_rnn flag is required: bash python m dialog.main noload task 2 use_rnn True qrn : babi : lstm : memn2n : tensorflow :",Question Answering,Question Answering 2655,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NIPS2018_DECAPROP Implementation of Densely Connected Attention Propagation for Reading Comprehension (NIPS 2018) Yi Tay, Luu Anh Tuan, Siu Cheung Hui, Jian Su. This model achieves quite competitive performance on four RC benchmarks (SearchQA, NewsQA, Quasar and NarrativeQA). Model Code The general idea here is that ./model/span_model.py contains the main span model and ./model/decaprop.py contains the DecaProp implementation. Bidirectional Attention Connectors (BAC) implementation is found at ./tylib/lib/att_op.py . python from tylib.lib.att_op import bidirectional_attention_connector c and q are sequences of bsz x seq_len x dim. seq_len may be different the output ff is the propagated feature. 
c, q, ff bidirectional_attention_connector( c, q, c_len, q_len, None, None, mask_a cmask, mask_b qmask, initializer self.init, factor 32, factor2 32, name 'bac') Prep Scripts You may find them at ./prep/ where datasets such as Squad, NewsQA, SearchQA and Quasar are found. Many of our pre processing scripts reference Open domain QA dataset preprocessing were obtained from (reinforced reader ranker codebase by Wang et al.) Please make a directory named ./corpus/ (for hosting raw datasets) and ./datasets/ for hosting prep ed files. The key idea is that we prep the dataset into an env.gz file for training/evaluation. Notes and Disclaimer Most of the relevant code have been uploaded to this repository. I currently do not have the GPU resources to re validate this repository. Assuming I didn't accidentally omit any code (while copying from my main repository and removing irrelevant/WIP code), this repository should run fine (the entry point is train_span.py , more running notes will be added when I have time). The arguments in the argparser do not represent the optimal hyperparameters (from the time of NIPS'18 experiments, many other experiments were conducted, which may have changed the default hyperparameters). However, just couple of weeks ago I managed to get similar scores for searchqa/quasar. Another useful note is that i use a language based compositional control for model architecture, using if statements and keyword to control which the graph construction. This is controlled by rnn_type in argparser. Also note that due to some tensorflow version upgrade issues, the cudnn CoVe LSTM is not working for the time being. References If you find our repository useful, please cite our paper: @article{DBLP:journals/corr/abs 1811 04210, author {Yi Tay and Luu Anh Tuan and Siu Cheung Hui and Jian Su}, title {Densely Connected Attention Propagation for Reading Comprehension}, journal {CoRR}, volume {abs/1811.04210}, year {2018}, url { archivePrefix {arXiv}, eprint {1811.04210}, timestamp {Fri, 23 Nov 2018 12:43:51 +0100}, biburl { bibsource {dblp computer science bibliography, } Acknowledgements Several useful code bases we used in our work: 1. (for cudnn RNNs and base R NET model) 2. (thanks for the evaluators and preprocessors which were useful!) 3. (For preprocessing of searchqa and quasar!)",Question Answering,Question Answering 2656,Natural Language Processing,Natural Language Processing,Natural Language Processing,"HyperQA Code for WSDM 2018 Paper Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering Yi Tay, Anh Tuan Luu, Siu Cheung Hui Proceedings of WSDM 2018. This repository contains a reference implementation of HyperQA (which is merely copied from the main experiment repository I have). It is helpful for some details that I had no space to report in the actual paper. I had to strip away some components from other models and also copy paste from my own library. Therefore, some dependencies may be missing for now. A running version with Preprocessing or training scripts will be uploaded when I have time. Coming Soon. Cleanin in Progress.. Reference Please cite the WSDM version when it is out in the proceedings. 
@article{DBLP:journals/corr/TayLH17a, author {Yi Tay and Anh Tuan Luu and Siu Cheung Hui}, title {Enabling Efficient Question Answer Retrieval via Hyperbolic Neural Networks}, journal {CoRR}, volume {abs/1707.07847}, year {2017}, url { archivePrefix {arXiv}, eprint {1707.07847}, timestamp {Sat, 05 Aug 2017 14:56:20 +0200}, biburl { bibsource {dblp computer science bibliography, }",Question Answering,Question Answering 2674,Natural Language Processing,Natural Language Processing,Natural Language Processing,"End To End Memory Networks for Question Answering This is an implementation of MemN2N model in Python for the bAbI question answering tasks as shown in the Section 4 of the paper End To End Memory Networks . It is based on Facebook's Matlab code . ! Web based Demo Requirements Python 2.7 Numpy, Flask (only for web based demo) can be installed via pip: $ sudo pip install r requirements.txt bAbI dataset should be downloaded to data/tasks_1 20_v1 2 : $ wget qO tar xvz C data Usage To run on a single task, use babi_runner.py with t followed by task's id. For example, python babi_runner.py t 1 The output will look like: Using data from data/tasks_1 20_v1 2/en Train and test for task 1 ... 1 train error: 0.876116 val error: 0.75 71% 0.5s To run on 20 tasks: python babi_runner.py a To train using all training data from 20 tasks, use the joint mode: python babi_runner.py j Question Answering Demo In order to run the Web based demo using the pretrained model memn2n_model.pklz in trained_model/ , run: python m demo.qa Alternatively, you can try the console based demo: python m demo.qa console The pretrained model memn2n_model.pklz can be created by running: python m demo.qa train To show all options, run python m demo.qa h Benchmarks See the results here . Author Vinh Khuc Future Plans Port to TensorFlow/Keras Support Python 3 References Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, End To End Memory Networks , arXiv:1503.08895 cs.NE .",Question Answering,Question Answering 2675,Natural Language Processing,Natural Language Processing,Natural Language Processing,"bert benchmarks Google's Bert algorithm study and real life applications for Deep Learning project In this project we will be studying, analyzing and exploring the recently launched Bidirectional Encoder Representations from Transformers (BERT). It was launched by Google AI in Oct 2018 in the following paper The original repository of Google Research can be found at : We will also inspire our work and research in the pytorch implementation of BERT made by huggingface : In a first moment we have devoted all our efforts to better understand how it works and why it is considered a breakthrough in the NLP reasearch field. Some remarkable material can be found in the internet including the following ones : Bert paper: In a second step we will be fine tuning, adapting and applying existing models to real life applications as Kaggles competitions and the SQuAD Dataset. 
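As a rough illustration of the kind of fine tuning these tasks involve (this is not one of the competition scripts; the texts, labels and hyperparameters below are placeholders), a minimal classification step with the pytorch pretrained bert API referenced above could look like:

```python
# Illustrative fine-tuning sketch with pytorch-pretrained-bert; data and
# hyperparameters are placeholders, not the competition settings.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForSequenceClassification, BertAdam

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def encode(text, max_len=128):
    tokens = ["[CLS]"] + tokenizer.tokenize(text)[: max_len - 2] + ["[SEP]"]
    ids = tokenizer.convert_tokens_to_ids(tokens)
    mask = [1] * len(ids)
    padding = [0] * (max_len - len(ids))
    return ids + padding, mask + padding

texts = ["Why would anyone ask such an insincere question?", "What is the capital of France?"]
labels = torch.tensor([1, 0])
ids, masks = zip(*(encode(t) for t in texts))
input_ids, input_mask = torch.tensor(list(ids)), torch.tensor(list(masks))

optimizer = BertAdam(model.parameters(), lr=2e-5, warmup=0.1, t_total=1000)

model.train()
loss = model(input_ids, token_type_ids=None, attention_mask=input_mask, labels=labels)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```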
Some of the competitions we have applied and/or are intending to apply are: Use Case Link Our results : : : : : : Kaggle: Quora Insincere Questions Classification F Score 0.70240 / Acc: 0.96 Kaggle: WSDM Fake News detection 0.86535 Kaggle: Toxic Comment Classification Challenge Glove Comparation Mean column wise Area under ROC Curve // Glove: 0.97718 Vs Bert: 0.97922 French transformation of word vectors Extracted Word vectors but did not implement a specific task Instructions to run the code: Quora Insincere Questions Classification To finetune BERT for this competition please download the dataset from the Kaggle competition and put it in a folder called input inside the directory containing the bert classification.ipynb script. You should have installed pandas, numpy, sklearn, tensorflow, zipfile, matplotlib and tqdm. It took 48 hours to train 3 epochs in an Azure VM: Standard NC24 (24 vcpus, 224 GB memory) Fake News Detection To finetune BERT for this competition please download the dataset from the Kaggle competition and put it in a folder called input inside the directory containing the train.py script. You should have installed pandas, numpy, sklearn, pytorch and pytorch pretrained bert. train.py script based on this Kernel. It took 7 hours to train 3 epochs in an Azure VM: Standard NC24 (24 vcpus, 224 GB memory) Toxic Comments To finetune BERT for this competition please download the dataset from the Kaggle competition . Create a input folder, download this dataset in that folder so at the end all the .csv data will be at 'input/jigsaw toxic comment classification challenge/' inside the directory containing the train_model_final.py script. You should also create a directory models inside toxicComments where you will download the model data here and extract it, creating the path model/uncased_L 12_H 768_A 12 . You should have installed keras, pandas, numpy, sklearn, pytorch and pytorch pretrained bert. train_model_final.py script comparing to GloVe performance at this Kernel. Run all the pipeline script with python train_model_final.py directly. It took 5 hours to train 2 epochs in an Azure VM: Standard NC24 (24 vcpus, 224 GB memory)",Question Answering,Question Answering 2700,Natural Language Processing,Natural Language Processing,Natural Language Processing,PyTorch Implementation of Dynamic Coattention Network Implementation of the paper Dynamic Coattention Network Improvement ideas Use layer normalization Parts of the architecture explained in brief Encoder Encodes both question and context Coattention Network Combines attention of question with context Highway Maxout Network Determines possible start and end points Dynamic Decoding Determines start and end points What do the py files do config.py contains all the configuration baseline.py contains a baseline architecture based on tfidf and cosine distance vanillaQA.py contains baseline neural network architecture that might possibly work squad.py contains data parser for Squad Dataset setup.py you need to run this after installing requirements to download data for nltk networks package has all of the networks in separate class for testing purpose,Question Answering,Question Answering 2712,Natural Language Processing,Natural Language Processing,Natural Language Processing,"seq2seq keyphrase bert The original code is from which is used to do keyphrase generation using seq2seq with attention model. Recently BERT is very popular for many NLP taks, so I add BERT to the encoder part of the seq2seq model. 
I add a new model Seq2SeqBERT , which uses BERT for encoder and uses GRU for decoder. Specifically, I change some code in preprocess.py so that it preprocesses data using the tokenizer from BERT, and I add new model in pykp/model.py, relatively I change the beam_search methods in beam_search.py, and there are also some changes in pykp/io.py, train.py, evaluate.py. But right now, the result is not good, I am still researching it. Here is the experiment report Welcome to give me some advice where I did wrong. You can use train.py to train seq2seq model. To use BERT, just set the encoder_type to 'bert', and it will initialize with Seq2SeqBERT . The encoder details are also in Seq2SeqBERT model in pykp/model.py.",Question Answering,Question Answering 2714,Natural Language Processing,Natural Language Processing,Natural Language Processing,"bilm tf Tensorflow implementation of the pretrained biLM used to compute ELMo representations from Deep contextualized word representations . This repository supports both training biLMs and using pre trained models for prediction. We also have a pytorch implementation available in AllenNLP . You may also find it easier to use the version provided in Tensorflow Hub if you just like to make predictions. Citation: @inproceedings{Peters:2018, author {Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke}, title {Deep contextualized word representations}, booktitle {Proc. of NAACL}, year {2018} } Installing Install python version 3.5 or later, tensorflow version 1.2 and h5py: pip install tensorflow gpu 1.2 h5py python setup.py install Ensure the tests pass in your environment by running: python m unittest discover tests/ Installing with Docker To run the image, you must use nvidia docker, because this repository requires GPUs. sudo nvidia docker run t allennlp/bilm tf:training gpu Using pre trained models We have several different English language pre trained biLMs available for use. Each model is specified with two separate files, a JSON formatted options file with hyperparameters and a hdf5 formatted file with the model weights. Links to the pre trained models are available here . There are three ways to integrate ELMo representations into a downstream task, depending on your use case. 1. Compute representations on the fly from raw text using character input. This is the most general method and will handle any input text. It is also the most computationally expensive. 2. Precompute and cache the context independent token representations, then compute context dependent representations using the biLSTMs for input data. This method is less computationally expensive then 1, but is only applicable with a fixed, prescribed vocabulary. 3. Precompute the representations for your entire dataset and save to a file. We have used all of these methods in the past for various use cases. 1 is necessary for evaluating at test time on unseen data (e.g. public SQuAD leaderboard). 2 is a good compromise for large datasets where the size of the file in 3 is unfeasible (SNLI, SQuAD). 3 is a good choice for smaller datasets or in cases where you'd like to use ELMo in other frameworks. In all cases, the process roughly follows the same steps. First, create a Batcher (or TokenBatcher for 2) to translate tokenized strings to numpy arrays of character (or token) ids. Then, load the pretrained ELMo model (class BidirectionalLanguageModel ). 
Finally, for steps 1 and 2 use weight_layers to compute the final ELMo representations. For 3, use BidirectionalLanguageModel to write all the intermediate layers to a file. Shape conventions Each tokenized sentence is a list of str , with a batch of sentences a list of tokenized sentences ( List List str ). The Batcher packs these into a shape (n_sentences, max_sentence_length + 2, 50) numpy array of character ids, padding on the right with 0 ids for sentences less then the maximum length. The first and last tokens for each sentence are special begin and end of sentence ids added by the Batcher . The input character id placeholder can be dimensioned (None, None, 50) , with both the batch dimension (axis 0) and time dimension (axis 1) determined for each batch, up the the maximum batch size specified in the BidirectionalLanguageModel constructor. After running inference with the batch, the return biLM embeddings are a numpy array with shape (n_sentences, 3, max_sentence_length, 1024) , after removing the special begin/end tokens. Vocabulary file The Batcher takes a vocabulary file as input for efficency. This is a text file, with one token per line, separated by newlines ( \n ). Each token in the vocabulary is cached as the appropriate 50 character id sequence once. Since the model is completely character based, tokens not in the vocabulary file are handled appropriately at run time, with a slight decrease in run time. It is recommended to always include the special and tokens (case sensitive) in the vocabulary file. ELMo with character input See usage_character.py for a detailed usage example. ELMo with pre computed and cached context independent token representations To speed up model inference with a fixed, specified vocabulary, it is possible to pre compute the context independent token representations, write them to a file, and re use them for inference. Note that we don't support falling back to character inputs for out of vocabulary words, so this should only be used when the biLM is used to compute embeddings for input with a fixed, defined vocabulary. To use this option: 1. First create a vocabulary file with all of the unique tokens in your dataset and add the special and tokens. 2. Run dump_token_embeddings with the full model to write the token embeddings to a hdf5 file. 3. Use TokenBatcher (instead of Batcher ) with your vocabulary file, and pass use_token_inputs False and the name of the output file from step 2 to the BidirectonalLanguageModel constructor. See usage_token.py for a detailed usage example. Dumping biLM embeddings for an entire dataset to a single file. To take this option, create a text file with your tokenized dataset. Each line is one tokenized sentence (whitespace separated). Then use dump_bilm_embeddings . The output file is hdf5 format. Each sentence in the input data is stored as a dataset with key str(sentence_id) where sentence_id is the line number in the dataset file (indexed from 0). The embeddings for each sentence are a shape (3, n_tokens, 1024) array. See usage_cached.py for a detailed example. Training a biLM on a new corpus Broadly speaking, the process to train and use a new biLM is: 1. Prepare input data and a vocabulary file. 2. Train the biLM. 3. Test (compute the perplexity of) the biLM on heldout data. 4. Write out the weights from the trained biLM to a hdf5 file. 5. See the instructions above for using the output from Step 4 in downstream models. 1. Prepare input data and a vocabulary file. 
To train and evaluate a biLM, you need to provide: a vocabulary file a set of training files a set of heldout files The vocabulary file is a a text file with one token per line. It must also include the special tokens , and (case sensitive) in the file. IMPORTANT : the vocabulary file should be sorted in descending order by token count in your training data. The first three lines should be the special tokens ( , and ), then the most common token in the training data, ending with the least common token. NOTE : the vocabulary file used in training may differ from the one use for prediction. The training data should be randomly split into many training files, each containing one slice of the data. Each file contains pre tokenized and white space separated text, one sentence per line. Don't include the or tokens in your training data. All tokenization/normalization is done before training a model, so both the vocabulary file and training files should include normalized tokens. As the default settings use a fully character based token representation, in general we do not recommend any normalization other then tokenization. Finally, reserve a small amount of the training data as heldout data for evaluating the trained biLM. 2. Train the biLM. The hyperparameters used to train the ELMo model can be found in bin/train_elmo.py . The ELMo model was trained on 3 GPUs. To train a new model with the same hyperparameters, first download the training data from the 1 Billion Word Benchmark . Then download the vocabulary file . Finally, run: export CUDA_VISIBLE_DEVICES 0,1,2 python bin/train_elmo.py \ train_prefix '/path/to/1 billion word language modeling benchmark r13output/training monolingual.tokenized.shuffled/ ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 3. Evaluate the trained model. Use bin/run_test.py to evaluate a trained model, e.g. export CUDA_VISIBLE_DEVICES 0 python bin/run_test.py \ test_prefix '/path/to/1 billion word language modeling benchmark r13output/heldout monolingual.tokenized.shuffled/news.en.heldout 000 ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 4. Convert the tensorflow checkpoint to hdf5 for prediction with bilm or allennlp . First, create an options.json file for the newly trained model. To do so, follow the template in an existing file (e.g. the original options.json and modify for your hyperpararameters. Important : always set n_characters to 262 after training (see below). Then Run: python bin/dump_weights.py \ save_dir /output_path/to/checkpoint outfile /output_path/to/weights.hdf5 Frequently asked questions and other warnings Can you provide the tensorflow checkpoint from training? The tensorflow checkpoint is available by downloading these files: vocabulary checkpoint options 1 2 3 How to do fine tune a model on additional unlabeled data? First download the checkpoint files above. Then prepare the dataset as described in the section Training a biLM on a new corpus , with the exception that we will use the existing vocabulary file instead of creating a new one. Finally, use the script bin/restart.py to restart training with the existing checkpoint on the new dataset. For small datasets (e.g. , and . You can find our vocabulary file here . At the model input, all text used the full character based representation, including tokens outside the vocab. For the softmax output we replaced OOV tokens with . The model was trained with a fixed size window of 20 tokens. 
The batches were constructed by padding sentences with and , then packing tokens from one or more sentences into each row to fill completely fill each batch. Partial sentences and the LSTM states were carried over from batch to batch so that the language model could use information across batches for context, but backpropogation was broken at each batch boundary. Why do I get slightly different embeddings if I run the same text through the pre trained model twice? As a result of the training method (see above), the LSTMs are stateful, and carry their state forward from batch to batch. Consequently, this introduces a small amount of non determinism, expecially for the first two batches. Why does training seem to take forever even with my small dataset? The number of gradient updates during training is determined by: the number of tokens in the training data ( n_train_tokens ) the batch size ( batch_size ) the number of epochs ( n_epochs ) Be sure to set these values for your particular dataset in bin/train_elmo.py . What's the deal with n_characters and padding? During training, we fill each batch to exactly 20 tokens by adding and to each sentence, then packing tokens from one or more sentences into each row to fill completely fill each batch. As a result, we do not allocate space for a special padding token. The UnicodeCharsVocabulary that converts token strings to lists of character ids always uses a fixed number of character embeddings of n_characters 261 , so always set n_characters 261 during training. However, for prediction, we ensure each sentence is fully contained in a single batch, and as a result pad sentences of different lengths with a special padding id. This occurs in the Batcher see here . As a result, set n_characters 262 during prediction in the options.json . How can I use ELMo to compute sentence representations? Simple methods like average and max pooling of the word level ELMo representations across sentences works well, often outperforming supervised methods on benchmark datasets. See Evaluation of sentence embeddings in downstream and linguistic probing tasks , Perone et al, 2018 arxiv link .",Question Answering,Question Answering 2773,Natural Language Processing,Natural Language Processing,Natural Language Processing,"PyTorch Pretrained Bert CircleCI This repository contains an op for op PyTorch reimplementation of Google's TensorFlow repository for the BERT model that was released together with the paper BERT: Pre training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming Wei Chang, Kenton Lee and Kristina Toutanova. This implementation is provided with Google's pre trained models , examples, notebooks and a command line interface to load any pre trained TensorFlow checkpoint for BERT is also provided. Content Section Description Installation ( installation) How to install the package Overview ( overview) Overview of the package Usage ( usage) Quickstart examples Doc ( doc) Detailed documentation Examples ( examples) Detailed examples on how to fine tune Bert Notebooks ( notebooks) Introduction on the provided Jupyter Notebooks TPU ( tpu) Notes on TPU support and pretraining scripts Command line interface ( Command line interface) Convert a TensorFlow checkpoint in a PyTorch dump Installation This repo was tested on Python 3.5+ and PyTorch 0.4.1/1.0.0 With pip PyTorch pretrained bert can be installed by pip as follows: bash pip install pytorch pretrained bert From source Clone the repository and run: bash pip install editable . 
A series of tests is included in the tests folder and can be run using pytest (install pytest if needed: pip install pytest ). You can run the tests with the command: bash python m pytest sv tests/ Overview This package comprises the following classes that can be imported in Python and are detailed in the Doc ( doc) section of this readme: Eight PyTorch models ( torch.nn.Module ) for Bert with pre trained weights (in the modeling.py (./pytorch_pretrained_bert/modeling.py) file): BertModel (./pytorch_pretrained_bert/modeling.py L537) raw BERT Transformer model ( fully pre trained ), BertForMaskedLM (./pytorch_pretrained_bert/modeling.py L691) BERT Transformer with the pre trained masked language modeling head on top ( fully pre trained ), BertForNextSentencePrediction (./pytorch_pretrained_bert/modeling.py L752) BERT Transformer with the pre trained next sentence prediction classifier on top ( fully pre trained ), BertForPreTraining (./pytorch_pretrained_bert/modeling.py L620) BERT Transformer with masked language modeling head and next sentence prediction classifier on top ( fully pre trained ), BertForSequenceClassification (./pytorch_pretrained_bert/modeling.py L814) BERT Transformer with a sequence classification head on top (BERT Transformer is pre trained , the sequence classification head is only initialized and has to be trained ), BertForMultipleChoice (./pytorch_pretrained_bert/modeling.py L880) BERT Transformer with a multiple choice head on top (used for task like Swag) (BERT Transformer is pre trained , the multiple choice classification head is only initialized and has to be trained ), BertForTokenClassification (./pytorch_pretrained_bert/modeling.py L949) BERT Transformer with a token classification head on top (BERT Transformer is pre trained , the token classification head is only initialized and has to be trained ), BertForQuestionAnswering (./pytorch_pretrained_bert/modeling.py L1015) BERT Transformer with a token classification head on top (BERT Transformer is pre trained , the token classification head is only initialized and has to be trained ). Three tokenizers (in the tokenization.py (./pytorch_pretrained_bert/tokenization.py) file): BasicTokenizer basic tokenization (punctuation splitting, lower casing, etc.), WordpieceTokenizer WordPiece tokenization, BertTokenizer perform end to end tokenization, i.e. basic tokenization followed by WordPiece tokenization. One optimizer (in the optimization.py (./pytorch_pretrained_bert/optimization.py) file): BertAdam Bert version of Adam algorithm with weight decay fix, warmup and linear decay of the learning rate. A configuration class (in the modeling.py (./pytorch_pretrained_bert/modeling.py) file): BertConfig Configuration class to store the configuration of a BertModel with utilities to read and write from JSON configuration files. The repository further comprises: Four examples on how to use Bert (in the examples folder (./examples)): extract_features.py (./examples/extract_features.py) Show how to extract hidden states from an instance of BertModel , run_classifier.py (./examples/run_classifier.py) Show how to fine tune an instance of BertForSequenceClassification on GLUE's MRPC task, run_squad.py (./examples/run_squad.py) Show how to fine tune an instance of BertForQuestionAnswering on SQuAD v1.0 task. run_swag.py (./examples/run_swag.py) Show how to fine tune an instance of BertForMultipleChoice on Swag task. These examples are detailed in the Examples ( examples) section of this readme. 
Three notebooks that were used to check that the TensorFlow and PyTorch models behave identically (in the notebooks folder (./notebooks)): Comparing TF and PT models.ipynb (./notebooks/Comparing TF and PT models.ipynb) Compare the hidden states predicted by BertModel , Comparing TF and PT models SQuAD.ipynb (./notebooks/Comparing TF and PT models SQuAD.ipynb) Compare the spans predicted by BertForQuestionAnswering instances, Comparing TF and PT models MLM NSP.ipynb (./notebooks/Comparing TF and PT models MLM NSP.ipynb) Compare the predictions of the BertForPretraining instances. These notebooks are detailed in the Notebooks ( notebooks) section of this readme. A command line interface to convert any TensorFlow checkpoint in a PyTorch dump: This CLI is detailed in the Command line interface ( Command line interface) section of this readme. Usage Here is a quick start example using BertTokenizer , BertModel and BertForMaskedLM class with Google AI's pre trained Bert base uncased model. See the doc section ( doc) below for all the details on these classes. First let's prepare a tokenized input with BertTokenizer python import torch from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM Load pre trained model tokenizer (vocabulary) tokenizer BertTokenizer.from_pretrained('bert base uncased') Tokenized input text Who was Jim Henson ? Jim Henson was a puppeteer tokenized_text tokenizer.tokenize(text) Mask a token that we will try to predict back with BertForMaskedLM masked_index 6 tokenized_text masked_index ' MASK ' assert tokenized_text 'who', 'was', 'jim', 'henson', '?', 'jim', ' MASK ', 'was', 'a', 'puppet', ' eer' Convert token to vocabulary indices indexed_tokens tokenizer.convert_tokens_to_ids(tokenized_text) Define sentence A and B indices associated to 1st and 2nd sentences (see paper) segments_ids 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 Convert inputs to PyTorch tensors tokens_tensor torch.tensor( indexed_tokens ) segments_tensors torch.tensor( segments_ids ) Let's see how to use BertModel to get hidden states python Load pre trained model (weights) model BertModel.from_pretrained('bert base uncased') model.eval() Predict hidden states features for each layer encoded_layers, _ model(tokens_tensor, segments_tensors) We have a hidden states for each of the 12 layers in model bert base uncased assert len(encoded_layers) 12 And how to use BertForMaskedLM python Load pre trained model (weights) model BertForMaskedLM.from_pretrained('bert base uncased') model.eval() Predict all tokens predictions model(tokens_tensor, segments_tensors) confirm we were able to predict 'henson' predicted_index torch.argmax(predictions 0, masked_index ).item() predicted_token tokenizer.convert_ids_to_tokens( predicted_index ) 0 assert predicted_token 'henson' Doc Here is a detailed documentation of the classes in the package and how to use them: Sub section Description Loading Google AI's pre trained weigths ( Loading Google AIs pre trained weigths and PyTorch dump) How to load Google AI's pre trained weight or a PyTorch saved instance PyTorch models ( PyTorch models) API of the eight PyTorch model classes: BertModel , BertForMaskedLM , BertForNextSentencePrediction , BertForPreTraining , BertForSequenceClassification , BertForMultipleChoice or BertForQuestionAnswering Tokenizer: BertTokenizer ( Tokenizer BertTokenizer) API of the BertTokenizer class Optimizer: BertAdam ( Optimizer BertAdam) API of the BertAdam class Loading Google AI's pre trained weigths and PyTorch dump To load one of Google 
AI's pre trained models or a PyTorch saved model (an instance of BertForPreTraining saved with torch.save() ), the PyTorch model classes and the tokenizer can be instantiated as python model BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir None) where BERT_CLASS is either the BertTokenizer class (to load the vocabulary) or one of the eight PyTorch model classes (to load the pre trained weights): BertModel , BertForMaskedLM , BertForNextSentencePrediction , BertForPreTraining , BertForSequenceClassification , BertForTokenClassification , BertForMultipleChoice or BertForQuestionAnswering , and PRE_TRAINED_MODEL_NAME_OR_PATH is either: the shortcut name of a Google AI's pre trained model selected in the list: bert base uncased : 12 layer, 768 hidden, 12 heads, 110M parameters bert large uncased : 24 layer, 1024 hidden, 16 heads, 340M parameters bert base cased : 12 layer, 768 hidden, 12 heads , 110M parameters bert large cased : 24 layer, 1024 hidden, 16 heads, 340M parameters bert base multilingual uncased : (Orig, not recommended) 102 languages, 12 layer, 768 hidden, 12 heads, 110M parameters bert base multilingual cased : (New, recommended) 104 languages, 12 layer, 768 hidden, 12 heads, 110M parameters bert base chinese : Chinese Simplified and Traditional, 12 layer, 768 hidden, 12 heads, 110M parameters a path or url to a pretrained model archive containing: bert_config.json a configuration file for the model, and pytorch_model.bin a PyTorch dump of a pre trained instance BertForPreTraining (saved with the usual torch.save() ) If PRE_TRAINED_MODEL_NAME_OR_PATH is a shortcut name, the pre trained weights will be downloaded from AWS S3 (see the links here (pytorch_pretrained_bert/modeling.py)) and stored in a cache folder to avoid future download (the cache folder can be found at /.pytorch_pretrained_bert/ ). cache_dir can be an optional path to a specific directory to download and cache the pre trained model weights. This option is useful in particular when you are using distributed training: to avoid concurrent access to the same weights you can set for example cache_dir './pretrained_model_{}'.format(args.local_rank) (see the section on distributed training for more information). Uncased means that the text has been lowercased before WordPiece tokenization, e.g., John Smith becomes john smith . The Uncased model also strips out any accent markers. Cased means that the true case and accent markers are preserved. Typically, the Uncased model is better unless you know that case information is important for your task (e.g., Named Entity Recognition or Part of Speech tagging). For information about the Multilingual and Chinese model, see the Multilingual README or the original TensorFlow repository. When using an uncased model , make sure to pass do_lower_case to the example training scripts (or pass do_lower_case True to FullTokenizer if you're using your own script and loading the tokenizer your self.). Example: python tokenizer BertTokenizer.from_pretrained('bert base uncased', do_lower_case True) model BertForSequenceClassification.from_pretrained('bert base uncased') PyTorch models 1. BertModel BertModel is the basic BERT Transformer model with a layer of summed token, position and sequence embeddings followed by a series of identical self attention blocks (12 for BERT base, 24 for BERT large). The inputs and output are identical to the TensorFlow model inputs and outputs . We detail them here. 
This model takes as inputs : modeling.py (./pytorch_pretrained_bert/modeling.py) input_ids : a torch.LongTensor of shape batch_size, sequence_length with the word token indices in the vocabulary (see the tokens preprocessing logic in the scripts extract_features.py (./examples/extract_features.py), run_classifier.py (./examples/run_classifier.py) and run_squad.py (./examples/run_squad.py)), and token_type_ids : an optional torch.LongTensor of shape batch_size, sequence_length with the token types indices selected in 0, 1 . Type 0 corresponds to a sentence A and type 1 corresponds to a sentence B token (see BERT paper for more details). attention_mask : an optional torch.LongTensor of shape batch_size, sequence_length with indices selected in 0, 1 . It's a mask to be used if some input sequence lengths are smaller than the max input sequence length of the current batch. It's the mask that we typically use for attention when a batch has varying length sentences. output_all_encoded_layers : boolean which controls the content of the encoded_layers output as described below. Default: True . This model outputs a tuple composed of: encoded_layers : controled by the value of the output_encoded_layers argument: output_all_encoded_layers True : outputs a list of the encoded hidden states at the end of each attention block (i.e. 12 full sequences for BERT base, 24 for BERT large), each encoded hidden state is a torch.FloatTensor of size batch_size, sequence_length, hidden_size , output_all_encoded_layers False : outputs only the encoded hidden states corresponding to the last attention block, i.e. a single torch.FloatTensor of size batch_size, sequence_length, hidden_size , pooled_output : a torch.FloatTensor of size batch_size, hidden_size which is the output of a classifier pretrained on top of the hidden state associated to the first character of the input ( CLF ) to train on the Next Sentence task (see BERT's paper). An example on how to use this class is given in the extract_features.py (./examples/extract_features.py) script which can be used to extract the hidden states of the model for a given input. 2. BertForPreTraining BertForPreTraining includes the BertModel Transformer followed by the two pre training heads: the masked language modeling head, and the next sentence classification head. Inputs comprises the inputs of the BertModel ( 1. BertModel ) class plus two optional labels: masked_lm_labels : masked language modeling labels: torch.LongTensor of shape batch_size, sequence_length with indices selected in 1, 0, ..., vocab_size . All labels set to 1 are ignored (masked), the loss is only computed for the labels set in 0, ..., vocab_size next_sentence_label : next sentence classification loss: torch.LongTensor of shape batch_size with indices selected in 0, 1 . 0 > next sentence is the continuation, 1 > next sentence is a random sentence. Outputs : if masked_lm_labels and next_sentence_label are not None : Outputs the total_loss which is the sum of the masked language modeling loss and the next sentence classification loss. if masked_lm_labels or next_sentence_label is None : Outputs a tuple comprising the masked language modeling logits, and the next sentence classification logits. 3. BertForMaskedLM BertForMaskedLM includes the BertModel Transformer followed by the (possibly) pre trained masked language modeling head. Inputs comprises the inputs of the BertModel ( 1. 
BertModel ) class plus optional label: masked_lm_labels : masked language modeling labels: torch.LongTensor of shape batch_size, sequence_length with indices selected in 1, 0, ..., vocab_size . All labels set to 1 are ignored (masked), the loss is only computed for the labels set in 0, ..., vocab_size Outputs : if masked_lm_labels is not None : Outputs the masked language modeling loss. if masked_lm_labels is None : Outputs the masked language modeling logits. 4. BertForNextSentencePrediction BertForNextSentencePrediction includes the BertModel Transformer followed by the next sentence classification head. Inputs comprises the inputs of the BertModel ( 1. BertModel ) class plus an optional label: next_sentence_label : next sentence classification loss: torch.LongTensor of shape batch_size with indices selected in 0, 1 . 0 > next sentence is the continuation, 1 > next sentence is a random sentence. Outputs : if next_sentence_label is not None : Outputs the next sentence classification loss. if next_sentence_label is None : Outputs the next sentence classification logits. 5. BertForSequenceClassification BertForSequenceClassification is a fine tuning model that includes BertModel and a sequence level (sequence or pair of sequences) classifier on top of the BertModel . The sequence level classifier is a linear layer that takes as input the last hidden state of the first character in the input sequence (see Figures 3a and 3b in the BERT paper). An example on how to use this class is given in the run_classifier.py (./examples/run_classifier.py) script which can be used to fine tune a single sequence (or pair of sequence) classifier using BERT, for example for the MRPC task. 6. BertForMultipleChoice BertForMultipleChoice is a fine tuning model that includes BertModel and a linear layer on top of the BertModel . The linear layer outputs a single value for each choice of a multiple choice problem, then all the outputs corresponding to an instance are passed through a softmax to get the model choice. This implementation is largely inspired by the work of OpenAI in Improving Language Understanding by Generative Pre Training and the answer of Jacob Devlin in the following issue . An example on how to use this class is given in the run_swag.py (./examples/run_swag.py) script which can be used to fine tune a multiple choice classifier using BERT, for example for the Swag task. 7. BertForTokenClassification BertForTokenClassification is a fine tuning model that includes BertModel and a token level classifier on top of the BertModel . The token level classifier is a linear layer that takes as input the last hidden state of the sequence. 8. BertForQuestionAnswering BertForQuestionAnswering is a fine tuning model that includes BertModel with a token level classifiers on top of the full sequence of last hidden states. The token level classifier takes as input the full sequence of the last hidden state and compute several (e.g. two) scores for each tokens that can for example respectively be the score that a given token is a start_span and a end_span token (see Figures 3c and 3d in the BERT paper). An example on how to use this class is given in the run_squad.py (./examples/run_squad.py) script which can be used to fine tune a token classifier using BERT, for example for the SQuAD task. Tokenizer: BertTokenizer BertTokenizer perform end to end tokenization, i.e. basic tokenization followed by WordPiece tokenization. This class has two arguments: vocab_file : path to a vocabulary file. 
do_lower_case : convert text to lower case while tokenizing. Default True . and three methods: tokenize(text) : convert a str in a list of str tokens by (1) performing basic tokenization and (2) WordPiece tokenization. convert_tokens_to_ids(tokens) : convert a list of str tokens in a list of int indices in the vocabulary. convert_ids_to_tokens(tokens) : convert a list of int indices in a list of str tokens in the vocabulary. Please refer to the doc strings and code in tokenization.py (./pytorch_pretrained_bert/tokenization.py) for the details of the BasicTokenizer and WordpieceTokenizer classes. In general it is recommended to use BertTokenizer unless you know what you are doing. Optimizer: BertAdam BertAdam is a torch.optimizer adapted to be closer to the optimizer used in the TensorFlow implementation of Bert. The differences with PyTorch Adam optimizer are the following: BertAdam implements weight decay fix, BertAdam doesn't compensate for bias as in the regular Adam optimizer. The optimizer accepts the following arguments: lr : learning rate warmup : portion of t_total for the warmup, 1 means no warmup. Default : 1 t_total : total number of training steps for the learning rate schedule, 1 means constant learning rate. Default : 1 schedule : schedule to use for the warmup (see above). Default : 'warmup_linear' b1 : Adams b1. Default : 0.9 b2 : Adams b2. Default : 0.999 e : Adams epsilon. Default : 1e 6 weight_decay: Weight decay. Default : 0.01 max_grad_norm : Maximum norm for the gradients ( 1 means no clipping). Default : 1.0 Examples Sub section Description Training large models: introduction, tools and examples ( Training large models introduction, tools and examples) How to use gradient accumulation, multi gpu training, distributed training, optimize on CPU and 16 bits training to train Bert models Fine tuning with BERT: running the examples ( Fine tuning with BERT running the examples) Running the examples in ./examples (./examples/): extract_classif.py , run_classifier.py and run_squad.py Fine tuning BERT large on GPUs ( Fine tuning BERT large on GPUs) How to fine tune BERT large Training large models: introduction, tools and examples BERT base and BERT large are respectively 110M and 340M parameters models and it can be difficult to fine tune them on a single GPU with the recommended batch size for good performance (in most case a batch size of 32). To help with fine tuning these models, we have included several techniques that you can activate in the fine tuning scripts run_classifier.py (./examples/run_classifier.py) and run_squad.py (./examples/run_squad.py): gradient accumulation, multi gpu training, distributed training and 16 bits training . For more details on how to use these techniques you can read the tips on training large batches in PyTorch that I published earlier this month. Here is how to use these techniques in our scripts: Gradient Accumulation : Gradient accumulation can be used by supplying a integer greater than 1 to the gradient_accumulation_steps argument. The batch at each step will be divided by this integer and gradient will be accumulated over gradient_accumulation_steps steps. Multi GPU : Multi GPU is automatically activated when several GPUs are detected and the batches are splitted over the GPUs. Distributed training : Distributed training can be activated by supplying an integer greater or equal to 0 to the local_rank argument (see below). 
16 bits training : 16 bits training, also called mixed precision training, can reduce the memory requirement of your model on the GPU by using half precision training, basically allowing to double the batch size. If you have a recent GPU (starting from NVIDIA Volta architecture) you should see no decrease in speed. A good introduction to Mixed precision training can be found here and a full documentation is here . In our scripts, this option can be activated by setting the fp16 flag and you can play with loss scaling using the loss_scale flag (see the previously linked documentation for details on loss scaling). The loss scale can be zero in which case the scale is dynamically adjusted or a positive power of two in which case the scaling is static. To use 16 bits training and distributed training, you need to install NVIDIA's apex extension as detailed here . You will find more information regarding the internals of apex and how to use apex in the doc and the associated repository . The results of the tests performed on pytorch BERT by the NVIDIA team (and my trials at reproducing them) can be consulted in the relevant PR of the present repository . Note: To use Distributed Training , you will need to run one training script on each of your machines. This can be done for example by running the following command on each server (see the above mentioned blog post ( ) for more details): bash python m torch.distributed.launch nproc_per_node 4 nnodes 2 node_rank $THIS_MACHINE_INDEX master_addr 192.168.1.1 master_port 1234 run_classifier.py ( arg1 arg2 arg3 and all other arguments of the run_classifier script) Where $THIS_MACHINE_INDEX is an sequential index assigned to each of your machine (0, 1, 2...) and the machine with rank 0 has an IP address 192.168.1.1 and an open port 1234 . Fine tuning with BERT: running the examples We showcase several fine tuning examples based on (and extended from) the original implementation : a sequence level classifier on the MRPC classification corpus, a token level classifier on the question answering dataset SQuAD, and a sequence level multiple choice classifier on the SWAG classification corpus. MRPC This example code fine tunes BERT on the Microsoft Research Paraphrase Corpus (MRPC) corpus and runs in less than 10 minutes on a single K 80 and in 27 seconds (!) on single tesla V100 16GB with apex installed. Before running this example you should download the GLUE data by running this script and unpack it to some directory $GLUE_DIR . shell export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train \ do_eval \ do_lower_case \ data_dir $GLUE_DIR/MRPC/ \ bert_model bert base uncased \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ Our test ran on a few seeds with the original implementation hyper parameters gave evaluation results between 84% and 88%. Fast run with apex and 16 bit precision: fine tuning on MRPC in 27 seconds! First install apex as indicated here . Then run shell export GLUE_DIR /path/to/glue python run_classifier.py \ task_name MRPC \ do_train \ do_eval \ do_lower_case \ data_dir $GLUE_DIR/MRPC/ \ bert_model bert base uncased \ max_seq_length 128 \ train_batch_size 32 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ output_dir /tmp/mrpc_output/ SQuAD This example code fine tunes BERT on the SQuAD dataset. It runs in 24 min (with BERT base) or 68 min (with BERT large) on a single tesla V100 16GB. 
The data for SQuAD can be downloaded with the following links and should be saved in a $SQUAD_DIR directory. train v1.1.json dev v1.1.json evaluate v1.1.py shell export SQUAD_DIR /path/to/SQUAD python run_squad.py \ bert_model bert base uncased \ do_train \ do_predict \ do_lower_case \ train_file $SQUAD_DIR/train v1.1.json \ predict_file $SQUAD_DIR/dev v1.1.json \ train_batch_size 12 \ learning_rate 3e 5 \ num_train_epochs 2.0 \ max_seq_length 384 \ doc_stride 128 \ output_dir /tmp/debug_squad/ Training with the previous hyper parameters gave us the following results: bash { f1 : 88.52381567990474, exact_match : 81.22043519394512} SWAG The data for SWAG can be downloaded by cloning the following repository shell export SWAG_DIR /path/to/SWAG python run_swag.py \ bert_model bert base uncased \ do_train \ do_lower_case \ do_eval \ data_dir $SWAG_DIR/data \ train_batch_size 16 \ learning_rate 2e 5 \ num_train_epochs 3.0 \ max_seq_length 80 \ output_dir /tmp/swag_output/ \ gradient_accumulation_steps 4 Training with the previous hyper parameters on a single GPU gave us the following results: eval_accuracy 0.8062081375587323 eval_loss 0.5966546792367169 global_step 13788 loss 0.06423990014260186 Fine tuning BERT large on GPUs The options we list above allow to fine tune BERT large rather easily on GPU(s) instead of the TPU used by the original implementation. For example, fine tuning BERT large on SQuAD can be done on a server with 4 k 80 (these are pretty old now) in 18 hours. Our results are similar to the TensorFlow implementation results (actually slightly higher): bash { exact_match : 84.56953642384106, f1 : 91.04028647786927} To get these results we used a combination of: multi GPU training (automatically activated on a multi GPU server), 2 steps of gradient accumulation and perform the optimization step on CPU to store Adam's averages in RAM. Here is the full list of hyper parameters for this run: bash python ./run_squad.py \ bert_model bert large uncased \ do_train \ do_predict \ do_lower_case \ train_file $SQUAD_TRAIN \ predict_file $SQUAD_EVAL \ learning_rate 3e 5 \ num_train_epochs 2 \ max_seq_length 384 \ doc_stride 128 \ output_dir $OUTPUT_DIR \ train_batch_size 24 \ gradient_accumulation_steps 2 If you have a recent GPU (starting from NVIDIA Volta series), you should try 16 bit fine tuning (FP16). Here is an example of hyper parameters for a FP16 run we tried: bash python ./run_squad.py \ bert_model bert large uncased \ do_train \ do_predict \ do_lower_case \ train_file $SQUAD_TRAIN \ predict_file $SQUAD_EVAL \ learning_rate 3e 5 \ num_train_epochs 2 \ max_seq_length 384 \ doc_stride 128 \ output_dir $OUTPUT_DIR \ train_batch_size 24 \ fp16 \ loss_scale 128 The results were similar to the above FP32 results (actually slightly higher): bash { exact_match : 84.65468306527909, f1 : 91.238669287002} Notebooks We include three Jupyter Notebooks that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model. The first NoteBook ( Comparing TF and PT models.ipynb (./notebooks/Comparing TF and PT models.ipynb)) extracts the hidden states of a full sequence on each layers of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e 7 to 9e 7 on the various hidden state of the models. 
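For reference, such a per-layer comparison amounts to a few lines of NumPy once the hidden states have been exported; the variable names below are placeholders rather than anything defined by the notebook: python
import numpy as np

# tf_layers and pt_layers are assumed to be lists of (sequence_length, hidden_size)
# arrays holding the hidden states exported from the two models for the same input.
def compare_hidden_states(tf_layers, pt_layers):
    for i, (tf_h, pt_h) in enumerate(zip(tf_layers, pt_layers)):
        diff = np.asarray(tf_h) - np.asarray(pt_h)
        print('layer %d: std of the difference = %.3e' % (i, diff.std()))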
The second notebook ( Comparing TF and PT models SQuAD.ipynb (./notebooks/Comparing TF and PT models SQuAD.ipynb)) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of BertForQuestionAnswering and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models. The third notebook ( Comparing TF and PT models MLM NSP.ipynb (./notebooks/Comparing TF and PT models MLM NSP.ipynb)) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model. Please follow the instructions given in the notebooks to run and modify them. Command line interface A command line interface is provided to convert a TensorFlow checkpoint into a PyTorch dump of the BertForPreTraining class (see above). You can convert any TensorFlow checkpoint for BERT (in particular the pre-trained models released by Google ) into a PyTorch save file by using the ./pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py (convert_tf_checkpoint_to_pytorch.py) script. This CLI takes as input a TensorFlow checkpoint (three files starting with bert_model.ckpt ) and the associated configuration file ( bert_config.json ), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint into the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using torch.load() (see examples in extract_features.py (./examples/extract_features.py), run_classifier.py (./examples/run_classifier.py) and run_squad.py (./examples/run_squad.py)). You only need to run this conversion script once to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with bert_model.ckpt ), but be sure to keep the configuration file ( bert_config.json ) and the vocabulary file ( vocab.txt ), as these are needed for the PyTorch model too. To run this specific conversion script you will need to have TensorFlow and PyTorch installed ( pip install tensorflow ); the rest of the repository only requires PyTorch. Here is an example of the conversion process for a pre-trained BERT-Base Uncased model: shell export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12 pytorch_pretrained_bert convert_tf_checkpoint_to_pytorch \ $BERT_BASE_DIR/bert_model.ckpt \ $BERT_BASE_DIR/bert_config.json \ $BERT_BASE_DIR/pytorch_model.bin You can download Google's pre-trained models for the conversion here . TPU TPU support and pretraining scripts TPUs are not supported by the current stable release of PyTorch (0.4.1). However, the next version of PyTorch (v1.0) should support training on TPUs and is expected to be released soon (see the recent official announcement ). We will add TPU support when this next release is published. The original TensorFlow code further comprises two scripts for pre-training BERT: create_pretraining_data.py and run_pretraining.py .
Since, pre training BERT is a particularly expensive operation that basically requires one or several TPUs to be completed in a reasonable amout of time (see details here ) we have decided to wait for the inclusion of TPU support in PyTorch to convert these pre training scripts.",Question Answering,Question Answering 2787,Natural Language Processing,Natural Language Processing,Natural Language Processing,"bilm tf Tensorflow implementation of the pretrained biLM used to compute ELMo representations from Deep contextualized word representations . This repository supports both training biLMs and using pre trained models for prediction. We also have a pytorch implementation available in AllenNLP . You may also find it easier to use the version provided in Tensorflow Hub if you just like to make predictions. Citation: @inproceedings{Peters:2018, author {Peters, Matthew E. and Neumann, Mark and Iyyer, Mohit and Gardner, Matt and Clark, Christopher and Lee, Kenton and Zettlemoyer, Luke}, title {Deep contextualized word representations}, booktitle {Proc. of NAACL}, year {2018} } Installing Install python version 3.5 or later, tensorflow version 1.2 and h5py: pip install tensorflow gpu 1.2 h5py python setup.py install Ensure the tests pass in your environment by running: python m unittest discover tests/ Installing with Docker To run the image, you must use nvidia docker, because this repository requires GPUs. sudo nvidia docker run t allennlp/bilm tf:training gpu Using pre trained models We have several different English language pre trained biLMs available for use. Each model is specified with two separate files, a JSON formatted options file with hyperparameters and a hdf5 formatted file with the model weights. Links to the pre trained models are available here . There are three ways to integrate ELMo representations into a downstream task, depending on your use case. 1. Compute representations on the fly from raw text using character input. This is the most general method and will handle any input text. It is also the most computationally expensive. 2. Precompute and cache the context independent token representations, then compute context dependent representations using the biLSTMs for input data. This method is less computationally expensive then 1, but is only applicable with a fixed, prescribed vocabulary. 3. Precompute the representations for your entire dataset and save to a file. We have used all of these methods in the past for various use cases. 1 is necessary for evaluating at test time on unseen data (e.g. public SQuAD leaderboard). 2 is a good compromise for large datasets where the size of the file in 3 is unfeasible (SNLI, SQuAD). 3 is a good choice for smaller datasets or in cases where you'd like to use ELMo in other frameworks. In all cases, the process roughly follows the same steps. First, create a Batcher (or TokenBatcher for 2) to translate tokenized strings to numpy arrays of character (or token) ids. Then, load the pretrained ELMo model (class BidirectionalLanguageModel ). Finally, for steps 1 and 2 use weight_layers to compute the final ELMo representations. For 3, use BidirectionalLanguageModel to write all the intermediate layers to a file. Shape conventions Each tokenized sentence is a list of str , with a batch of sentences a list of tokenized sentences ( List List str ). The Batcher packs these into a shape (n_sentences, max_sentence_length + 2, 50) numpy array of character ids, padding on the right with 0 ids for sentences less then the maximum length. 
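As a small illustration of this convention (the vocabulary path below is a placeholder), the Batcher can be driven directly: python
from bilm import Batcher

# The vocabulary path is a placeholder; 50 is the maximum number of characters per token.
batcher = Batcher('/path/to/vocab.txt', 50)

sentences = [['Pretrained', 'biLMs', 'compute', 'representations'],
             ['They', 'are', 'contextual']]
char_ids = batcher.batch_sentences(sentences)

# (n_sentences, max_sentence_length + 2, 50): (2, 6, 50) for this batch.
print(char_ids.shape)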
The first and last tokens for each sentence are special begin and end of sentence ids added by the Batcher . The input character id placeholder can be dimensioned (None, None, 50) , with both the batch dimension (axis 0) and time dimension (axis 1) determined for each batch, up the the maximum batch size specified in the BidirectionalLanguageModel constructor. After running inference with the batch, the return biLM embeddings are a numpy array with shape (n_sentences, 3, max_sentence_length, 1024) , after removing the special begin/end tokens. Vocabulary file The Batcher takes a vocabulary file as input for efficency. This is a text file, with one token per line, separated by newlines ( \n ). Each token in the vocabulary is cached as the appropriate 50 character id sequence once. Since the model is completely character based, tokens not in the vocabulary file are handled appropriately at run time, with a slight decrease in run time. It is recommended to always include the special and tokens (case sensitive) in the vocabulary file. ELMo with character input See usage_character.py for a detailed usage example. ELMo with pre computed and cached context independent token representations To speed up model inference with a fixed, specified vocabulary, it is possible to pre compute the context independent token representations, write them to a file, and re use them for inference. Note that we don't support falling back to character inputs for out of vocabulary words, so this should only be used when the biLM is used to compute embeddings for input with a fixed, defined vocabulary. To use this option: 1. First create a vocabulary file with all of the unique tokens in your dataset and add the special and tokens. 2. Run dump_token_embeddings with the full model to write the token embeddings to a hdf5 file. 3. Use TokenBatcher (instead of Batcher ) with your vocabulary file, and pass use_token_inputs False and the name of the output file from step 2 to the BidirectonalLanguageModel constructor. See usage_token.py for a detailed usage example. Dumping biLM embeddings for an entire dataset to a single file. To take this option, create a text file with your tokenized dataset. Each line is one tokenized sentence (whitespace separated). Then use dump_bilm_embeddings . The output file is hdf5 format. Each sentence in the input data is stored as a dataset with key str(sentence_id) where sentence_id is the line number in the dataset file (indexed from 0). The embeddings for each sentence are a shape (3, n_tokens, 1024) array. See usage_cached.py for a detailed example. Training a biLM on a new corpus Broadly speaking, the process to train and use a new biLM is: 1. Prepare input data and a vocabulary file. 2. Train the biLM. 3. Test (compute the perplexity of) the biLM on heldout data. 4. Write out the weights from the trained biLM to a hdf5 file. 5. See the instructions above for using the output from Step 4 in downstream models. 1. Prepare input data and a vocabulary file. To train and evaluate a biLM, you need to provide: a vocabulary file a set of training files a set of heldout files The vocabulary file is a a text file with one token per line. It must also include the special tokens , and (case sensitive) in the file. IMPORTANT : the vocabulary file should be sorted in descending order by token count in your training data. The first three lines should be the special tokens ( , and ), then the most common token in the training data, ending with the least common token. 
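A minimal sketch of producing such a file (paths are placeholders; the special tokens are written <S>, </S> and <UNK> in the original repository, but appear stripped in this text) could look like: python
from collections import Counter

# Training shards are already tokenized, whitespace-separated text; paths are placeholders.
counts = Counter()
for path in ['/path/to/train_shard_1.txt', '/path/to/train_shard_2.txt']:
    with open(path) as f:
        for line in f:
            counts.update(line.split())

# Special tokens first, then every token in descending order of frequency.
with open('/path/to/vocab.txt', 'w') as out:
    for token in ['<S>', '</S>', '<UNK>']:
        out.write(token + '\n')
    for token, _ in counts.most_common():
        out.write(token + '\n')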
NOTE : the vocabulary file used in training may differ from the one use for prediction. The training data should be randomly split into many training files, each containing one slice of the data. Each file contains pre tokenized and white space separated text, one sentence per line. Don't include the or tokens in your training data. All tokenization/normalization is done before training a model, so both the vocabulary file and training files should include normalized tokens. As the default settings use a fully character based token representation, in general we do not recommend any normalization other then tokenization. Finally, reserve a small amount of the training data as heldout data for evaluating the trained biLM. 2. Train the biLM. The hyperparameters used to train the ELMo model can be found in bin/train_elmo.py . The ELMo model was trained on 3 GPUs. To train a new model with the same hyperparameters, first download the training data from the 1 Billion Word Benchmark . Then download the vocabulary file . Finally, run: export CUDA_VISIBLE_DEVICES 0,1,2 python bin/train_elmo.py \ train_prefix '/path/to/1 billion word language modeling benchmark r13output/training monolingual.tokenized.shuffled/ ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 3. Evaluate the trained model. Use bin/run_test.py to evaluate a trained model, e.g. export CUDA_VISIBLE_DEVICES 0 python bin/run_test.py \ test_prefix '/path/to/1 billion word language modeling benchmark r13output/heldout monolingual.tokenized.shuffled/news.en.heldout 000 ' \ vocab_file /path/to/vocab 2016 09 10.txt \ save_dir /output_path/to/checkpoint 4. Convert the tensorflow checkpoint to hdf5 for prediction with bilm or allennlp . First, create an options.json file for the newly trained model. To do so, follow the template in an existing file (e.g. the original options.json and modify for your hyperpararameters. Important : always set n_characters to 262 after training (see below). Then Run: python bin/dump_weights.py \ save_dir /output_path/to/checkpoint outfile /output_path/to/weights.hdf5 Frequently asked questions and other warnings Can you provide the tensorflow checkpoint from training? The tensorflow checkpoint is available by downloading these files: vocabulary checkpoint options 1 2 3 How to do fine tune a model on additional unlabeled data? First download the checkpoint files above. Then prepare the dataset as described in the section Training a biLM on a new corpus , with the exception that we will use the existing vocabulary file instead of creating a new one. Finally, use the script bin/restart.py to restart training with the existing checkpoint on the new dataset. For small datasets (e.g. , and . You can find our vocabulary file here . At the model input, all text used the full character based representation, including tokens outside the vocab. For the softmax output we replaced OOV tokens with . The model was trained with a fixed size window of 20 tokens. The batches were constructed by padding sentences with and , then packing tokens from one or more sentences into each row to fill completely fill each batch. Partial sentences and the LSTM states were carried over from batch to batch so that the language model could use information across batches for context, but backpropogation was broken at each batch boundary. Why do I get slightly different embeddings if I run the same text through the pre trained model twice? 
As a result of the training method (see above), the LSTMs are stateful, and carry their state forward from batch to batch. Consequently, this introduces a small amount of non determinism, expecially for the first two batches. Why does training seem to take forever even with my small dataset? The number of gradient updates during training is determined by: the number of tokens in the training data ( n_train_tokens ) the batch size ( batch_size ) the number of epochs ( n_epochs ) Be sure to set these values for your particular dataset in bin/train_elmo.py . What's the deal with n_characters and padding? During training, we fill each batch to exactly 20 tokens by adding and to each sentence, then packing tokens from one or more sentences into each row to fill completely fill each batch. As a result, we do not allocate space for a special padding token. The UnicodeCharsVocabulary that converts token strings to lists of character ids always uses a fixed number of character embeddings of n_characters 261 , so always set n_characters 261 during training. However, for prediction, we ensure each sentence is fully contained in a single batch, and as a result pad sentences of different lengths with a special padding id. This occurs in the Batcher see here . As a result, set n_characters 262 during prediction in the options.json . How can I use ELMo to compute sentence representations? Simple methods like average and max pooling of the word level ELMo representations across sentences works well, often outperforming supervised methods on benchmark datasets. See Evaluation of sentence embeddings in downstream and linguistic probing tasks , Perone et al, 2018 arxiv link . I'm seeing a WARNING when serializing models, is it a problem? The below warning can be safely ignored: 2018 08 24 13:04:08,779 : WARNING : Error encountered when serializing lstm_output_embeddings. Type is unsupported, or the types of the items don't match field type in CollectionDef. 'list' object has no attribute 'name'",Question Answering,Question Answering 2792,Natural Language Processing,Natural Language Processing,Natural Language Processing,bert Using BERT in Chinese NER TASK. The BERT is decribed in BERT: Pre training of Deep Bidirectional Transformers for Language Understanding .,Question Answering,Question Answering 2815,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Commonsense for Generative Multi Hop Question Answering Tasks (EMNLP 2018) This repository contains the code and setup instructions for our EMNLP 2018 paper Commonsense for Generative Multi Hop Question Answering Tasks . See full paper here . Environment Setup We trained our models with python 2 and TensorFlow 1.3, a full list of python packages is listed in requirements.txt Downloading Data First, to setup the directory structure, please run setup.sh to create the appropriate directories. We download the raw data for NarrativeQA and WikiHop. For NarrativeQA, we download from github, starting at the root of the directory, run cd raw_data git clone For WikiHop, we download the QAngaroo dataset here , and extract the zip file into the raw_data directory. We use pre computed ELMo representations. Download our pre computed ELMo representation here , and extract into the folder lm_data . We also use a local version of ConceptNet's relations. Download the relations file from here and put it in the folder data . Build Processed Datasets We need to build processed datasets with extracted commonsense information. 
For NarrativeQA, we run: python src/config.py \ mode build_dataset \ data_dir raw_data/narrativeqa \ load_commonsense \ commonsense_file data/cn_relations_orig.txt \ processed_dataset_train data/narrative_qa_train.jsonl \ processed_dataset_valid data/narrative_qa_valid.jsonl \ processed_dataset_test data/narrative_qa_test.jsonl To build processed datasets with extracted commonsense for WikiHop, we run: python src/config.py \ mode build_wikihop_dataset \ data_dir raw_data/qangaroo_v1.1 \ load_commonsense \ commonsense_file data/cn_relations_orig.txt \ processed_dataset_train data/wikihop_train.jsonl \ processed_dataset_valid data/wikihop_valid.jsonl Training & Evaluation Training To train models for NarrativeQA, run: python src/config.py \ version {commonsense_nqa, baseline_nqa} \ model_name \ processed_dataset_train data/narrative_qa_train.jsonl \ processed_dataset_valid data/narrative_qa_valid.jsonl \ batch_size 24 \ max_target_iterations 15 \ dropout_rate 0.2 To train models for WikiHop, run: python src/config.py \ version {commonsense_wh, baseline_wh} \ model_name \ elmo_options_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_options.json \ elmo_weight_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5 \ elmo_token_embedding_file lm_data/wh/elmo_token_embeddings.hdf5 \ elmo_vocab_file lm_data/wh/wikihop_vocab.txt \ processed_dataset_train data/wikihop_train.jsonl \ processed_dataset_valid data/wikihop_valid.jsonl \ multiple_choice \ max_target_iterations 4 \ max_iterations 8 \ batch_size 16 \ max_target_iterations 4 \ max_iterations 8 \ max_context_iterations 1300 \ dropout_rate 0.2 Evaluation To evaluate NarrativeQA, we need to first generate official answers on the test set. To do so, run: python src/config.py \ mode generate_answers \ processed_dataset_valid data/narrative_qa_valid.jsonl \ processed_dataset_test data/narrative_qa_test.jsonl This will create the reference files val_ref0.txt , val_ref1.txt , test_ref0.txt and test_ref1.txt . To evaluate a model on NarrativeQA, run: python src/config.py \ mode test \ version {commonsense_nqa, baseline_nqa} \ model_name \ use_ckpt \ use_test \ only use this flag if you want to evaluate on test set processed_dataset_train data/narrative_qa_train.jsonl \ processed_dataset_valid data/narrative_qa_valid.jsonl \ processed_dataset_test data/narrative_qa_test.jsonl \ batch_size 24 \ max_target_iterations 15 \ dropout_rate 0.2 which generates the output (a new file named \_preds.txt). Then run python src/eval_generation.py where ref0 and ref1 are the generated reference files for the automatic metrics. To evaluate a model on WikiHop, run: python src/config.py \ mode test \ version {commonsense_wh, baseline_wh} \ model_name \ use_ckpt \ elmo_options_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_options.json \ elmo_weight_file lm_data/wh/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5 \ elmo_token_embedding_file lm_data/wh/elmo_token_embeddings.hdf5 \ elmo_vocab_file lm_data/wh/wikihop_vocab.txt \ processed_dataset_train data/wikihop_train.jsonl \ processed_dataset_valid data/wikihop_valid.jsonl \ multiple_choice \ max_target_iterations 4 \ max_iterations 8 \ batch_size 16 \ max_target_iterations 4 \ max_iterations 8 \ max_context_iterations 1300 \ dropout_rate 0.2 This outputs the test accuracy and generates an output file containing the model's predictions. Download and Run Pre Trained Models We release some pretrained models for both the NarrativeQA and WikiHop datasets. 
The results are listed below: NarrativeQA Model Dev (R L/B 1/B 4/M/C) Test (R L/B 1/B 4/M/C) Baseline 48.10/45.83/20.62/20.28/163.87 46.15/44.55/21.16/19.60/159.51 Commonsense 51.70/49.28/23.18/22.17/179.13 50.15/48.44/24.01/21.76/178.95 These NarrativeQA models resulted from further tuning after the paper's publication and have better performance than those presented in the paper. WikiHop Model Dev Acc (%) Test Acc (%) Baseline 56.2% 57.5% Commonsense 58.5% 57.9% These WikiHop results are after tuning on the official/full WikiHop validation set, these numbers will appear in an upcoming arxiv update available here . Download our pretrained models here: NarrativeQA Commonsense Model NarrativeQA Baseline Model WikiHop Commonsense Model WikiHop Baseline Model Download and extract them to the out repo, and see above for how to evaluate these models. Bibtex @inproceedings{bauerwang2019commonsense, title {Commonsense for Generative Multi Hop Question Answering Tasks}, author {Lisa Bauer , Yicheng Wang and Mohit Bansal}, booktitle {Proceedings of the Empirical Methods in Natural Language Processing}, year {2018} }",Question Answering,Question Answering 2825,Natural Language Processing,Natural Language Processing,Natural Language Processing,This source is to reproduce the mnemonic reader model with chainer. Paper The original paper is here: Reinforced Mnemonic reader for machine comprehension : 'Mnemonic Reader',Question Answering,Question Answering 2829,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Neural Variational Document Model (NVDM) tensorflow implementation This is the tensorflow implementation of NVDM for the paper: Neural Variational Inference for Text Processing 1 . Yishu Miao, Lei Yu, Phil Blunsom. ICML 2016. The original code is on torch. Since there are quite a few people asked me questions about the implementation, I have reimplemented the model by tensorflow. Please contact me if you find any problem with this implementation. It is able to achieve better results than the ones reported in the paper. RCV1 v2 dataset Please download and uncompress the dataset 2 to: data/rcv1 v2 Train the Model python nvdm.py data_dir data/20news/ 1 : 2 : 3 :",Question Answering,Question Answering 2831,Natural Language Processing,Natural Language Processing,Natural Language Processing,"ELMo pytorch to be completed This project is a trial for trainning ELMo with chinese corpus, trying to build a tool for both trainning and using. paper: Deep contextualized word representations( corpus: chinese wiki 300d",Question Answering,Question Answering 2841,Natural Language Processing,Natural Language Processing,Natural Language Processing,mnemonic reader PyTorch implementation of the Reinforced Mnemonic Reader + Answer Verifier model Adapted from HKUST KnowComp/MnemonicReader .,Question Answering,Question Answering 2846,Natural Language Processing,Natural Language Processing,Natural Language Processing,"PPDAI Magic Mirror Data Application Contest Introduction This is the repository for PPDAI contest, which is a natural language processing (NLP) model aims to detect duplicate questions in Chinese. Data Data was provided by PPDAI, which are pairs of questions labeled with 0 and 1 represents similar or not. The questions are represented by two sequences of integers which are the indices of corresponding embedding vectors (word and character). Model We proposed three models including a RNN based model, CNN based model and a RCNN based model. 
These models have the following characteristics: 1. Bi Directional GRU in RNN based models for semantic learning. 2. 1 D Convolution in CNN and RCNN based models for local feature extraction. 3. Co Attention was used to learn the semantic correlations between two sequences. 4. Self Attention was used to enhance the feature representation. 5. Word embedding and Character Embedding were used simultaneously. Performance: Our ensemble model achieved 0.203930 of loss in the semi final, at the top 15% in ranking. Reference QANet: Combining Local Convolution with Global Self Attention for Reading Comprehension ICLR 2018 Zhouhan Lin et al. “A Structured Self attentive Sentence Embedding”. In:CoRRabs/1703.03130 (2017).arXiv:1703.03130. Pranav Rajpurkar et al. “SQuAD: 100, 000+ Questions for Machine Comprehension of Text”. In:CoRRabs/1606.05250 (2016). arXiv:1606.05250. Wenhui Wang et al. “Gated Self Matching Networks for Reading Comprehension and Question Answering”",Question Answering,Question Answering 2871,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Jointly Learning Remote Relation with ELMo This is the research project for CS682 Neural Networks. Relation extraction is a key NLP task, but it is highly constrained by long distance dependency. In this work we are trying to incorporate the NAACL 2018 work ELMo Embedding with hierarchical attention architecture to learn remote references in bio medic documents. We also tried to jointly train the model on text generation and relation extraction tasks, which is related to both multi task learning and transfer learning.",Question Answering,Question Answering 2052,Computer Vision,Computer Vision,Computer Vision,"DeepLab v2 New release DeepLab v2 has been released recently (see this ), which attains 79.7% on the challenging PASCAL VOC 2012 image segmentation task. DeepLab v2 also incorportates some of the key layers from our DeepLab v1 (this repository). Note that there are still some minor differences between argmax and softmax_loss layers for DeepLabv1 and v2. If you want to reproduce our ICCV'15 results, please refer to the implementation of DeepLabv1. Please also see our project website for details. DeepLab v1 (this repository) Introduction DeepLab is a state of art deep learning system for semantic image segmentation built on top of Caffe . It combines densely computed deep convolutional neural network (CNN) responses with densely connected conditional random fields (CRF). This distribution provides a publicly available implementation for the key model ingredients first reported in an arXiv paper , accepted in revised form as conference publication to the ICLR 2015 conference. It also contains implementations for methods supporting model learning using only weakly labeled examples, described in a second follow up arXiv paper . 
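The dense CRF post-processing itself is driven by the run_densecrf.sh script described below. Purely as an illustration of the idea, and not as code shipped with this repository, the same fully connected CRF refinement can be sketched in Python with the third-party pydensecrf package, which wraps the Krähenbühl and Koltun inference; the kernel widths and weights here are illustrative placeholders, not the values used in the paper: python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

# image: (H, W, 3) uint8 array; probs: (n_labels, H, W) softmax scores from the DCNN.
def refine_with_densecrf(image, probs, n_iters=10):
    n_labels, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(probs))          # negative log probabilities
    d.addPairwiseGaussian(sxy=3, compat=3)               # smoothness kernel
    d.addPairwiseBilateral(sxy=60, srgb=10, rgbim=np.ascontiguousarray(image), compat=5)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)            # refined label map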
Please consult and consider citing the following papers: @inproceedings{chen14semantic, title {Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs}, author {Liang Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille}, booktitle {ICLR}, url { year {2015} } @article{papandreou15weak, title {Weakly and Semi Supervised Learning of a DCNN for Semantic Image Segmentation}, author {George Papandreou and Liang Chieh Chen and Kevin Murphy and Alan L Yuille}, journal {arxiv:1502.02734}, year {2015} } Note that if you use the densecrf implementation, please consult and cite the following paper: @inproceedings{KrahenbuhlK11, title {Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials}, author {Philipp Kr{\ {a}}henb{\ {u}}hl and Vladlen Koltun}, booktitle {NIPS}, year {2011} } Performance DeepLab currently achieves 73.9% on the challenging PASCAL VOC 2012 image segmentation task see the leaderboard . Pre trained models We have released several trained models and corresponding prototxt files at here . Please check it for more model details. The best model among the released ones yields 73.6% on PASCAL VOC 2012 test set. Experimental set up 1. The scripts we used for our experiments can be downloaded from this link : 1. run_pascal.sh: the script for training/testing on the PASCAL VOC 2012 dataset. __Note__ You also need to download sub.sed script. 2. run_densecrf.sh and run_densecrf_grid_search.sh: the scripts we used for post processing the DCNN computed results by DenseCRF. 2. The image list files used in our experiments can be downloaded from this link : The zip file stores the list files for the PASCAL VOC 2012 dataset. 3. To use the mat_read_layer and mat_write_layer, please download and install matio . FAQ Check FAQ if you have some problems while using the code. How to run DeepLab There are several variants of DeepLab. To begin with, we suggest DeepLab LargeFOV, which has good performance and faster training time. Suppose the codes are located at deeplab/code 1. mkdir deeplab/exper (Create a folder for experiments) 2. mkdir deeplab/exper/voc12 (Create a folder for your specific experiment. Let's take PASCAL VOC 2012 for example.) 3. Create folders for config files and so on. 1. mkdir deeplab/exper/voc12/config (where network config files are saved.) 2. mkdir deeplab/exper/voc12/features (where the computed features will be saved (when train on train)) 3. mkdir deeplab/exper/voc12/features2 (where the computed features will be saved (when train on trainval)) 4. mkdir deeplab/exper/voc12/list (where you save the train, val, and test file lists) 5. mkdir deeplab/exper/voc12/log (where the training/test logs will be saved) 6. mkdir deeplab/exper/voc12/model (where the trained models will be saved) 7. mkdir deeplab/exper/voc12/res (where the evaluation results will be saved) 4. mkdir deeplab/exper/voc12/config/deeplab_largeFOV (test your own network. Create a folder under config. For example, deeplab_largeFOV is the network you want to experiment with. Add your train.prototxt and test.prototxt in that folder (you can check some provided examples for reference).) 5. Set up your init.caffemodel at deeplab/exper/voc12/model/deeplab_largeFOV. You may want to soft link init.caffemodel to the modified VGG 16 net. For example, run ln s vgg16.caffemodel init.caffemodel at voc12/model/deeplab_largeFOV. 6. Modify the provided script, run_pascal.sh, for experiments. You should change the paths according to your setting. 
For example, you should specify where Caffe is located by changing CAFFE_DIR. Note: you may need to modify sub.sed if you want to replace some variables with your desired values in train.prototxt or test.prototxt. 7. The computed features are saved in the folders features or features2, and you can run the provided MATLAB scripts to evaluate the results (e.g., check the script at code/matlab/my_script/EvalSegResults). Python Seyed Ali Mousavi has implemented a Python version of run_pascal.sh (Thanks, Ali!). If you are more familiar with Python, you may want to take a look at this .",Semantic Segmentation,Semantic Segmentation 2053,Computer Vision,Computer Vision,Computer Vision,"DeepLab v2 Introduction DeepLab is a state-of-the-art deep learning system for semantic image segmentation built on top of Caffe . It combines (1) atrous convolution, to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks, (2) atrous spatial pyramid pooling, to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields of view, and (3) densely connected conditional random fields (CRF) as post-processing. This distribution provides a publicly available implementation for the key model ingredients reported in our latest arXiv paper . This version also supports the experiments (DeepLab v1) in our ICLR'15 paper; you only need to modify the old prototxt files. For example, our proposed atrous convolution is called dilated convolution in the Caffe framework, and you need to change the convolution parameter hole to dilation (the usage is exactly the same). For the experiments in ICCV'15, there are some differences between our argmax and softmax_loss layers and Caffe's; please refer to DeepLabv1 for details.
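Since only the parameter name changes, updating old prototxt files can be automated; the following is just a sketch (file names are placeholders, and it assumes the parameter appears as hole: inside convolution_param blocks), not a script shipped with this repository: python
import re

# Rewrite a DeepLab v1 prototxt so that the old 'hole' convolution parameter
# becomes the 'dilation' parameter understood by this Caffe version.
def convert_prototxt(src_path, dst_path):
    with open(src_path) as f:
        text = f.read()
    # Only the parameter name changes; its value keeps the same meaning.
    text = re.sub(r'\bhole\s*:', 'dilation:', text)
    with open(dst_path, 'w') as f:
        f.write(text)

convert_prototxt('train_v1.prototxt', 'train_v2.prototxt')  # placeholder file names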
Please consult and consider citing the following papers: @article{CP2016Deeplab, title {DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs}, author {Liang Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille}, journal {arXiv:1606.00915}, year {2016} } @inproceedings{CY2016Attention, title {Attention to Scale: Scale aware Semantic Image Segmentation}, author {Liang Chieh Chen and Yi Yang and Jiang Wang and Wei Xu and Alan L Yuille}, booktitle {CVPR}, year {2016} } @inproceedings{CB2016Semantic, title {Semantic Image Segmentation with Task Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform}, author {Liang Chieh Chen and Jonathan T Barron and George Papandreou and Kevin Murphy and Alan L Yuille}, booktitle {CVPR}, year {2016} } @inproceedings{PC2015Weak, title {Weakly and Semi Supervised Learning of a DCNN for Semantic Image Segmentation}, author {George Papandreou and Liang Chieh Chen and Kevin Murphy and Alan L Yuille}, booktitle {ICCV}, year {2015} } @inproceedings{CP2015Semantic, title {Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs}, author {Liang Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille}, booktitle {ICLR}, year {2015} } Note that if you use the densecrf implementation, please consult and cite the following paper: @inproceedings{KrahenbuhlK11, title {Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials}, author {Philipp Kr{\ {a}}henb{\ {u}}hl and Vladlen Koltun}, booktitle {NIPS}, year {2011} } Performance DeepLabv2 currently achieves 79.7% on the challenging PASCAL VOC 2012 semantic image segmentation task see the leaderboard . Please refer to our project website for details. Pre trained models We have released several trained models and corresponding prototxt files at here . Please check it for more model details. Experimental set up 1. The scripts we used for our experiments can be downloaded from this link : 1. run_pascal.sh: the script for training/testing on the PASCAL VOC 2012 dataset. __Note__ You also need to download sub.sed script. 2. run_densecrf.sh and run_densecrf_grid_search.sh: the scripts we used for post processing the DCNN computed results by DenseCRF. 2. The image list files used in our experiments can be downloaded from this link : The zip file stores the list files for the PASCAL VOC 2012 dataset. 3. To use the mat_read_layer and mat_write_layer, please download and install matio . FAQ Check FAQ if you have some problems while using the code. How to run DeepLab There are several variants of DeepLab. To begin with, we suggest DeepLab LargeFOV, which has good performance and faster training time. Suppose the codes are located at deeplab/code 1. mkdir deeplab/exper (Create a folder for experiments) 2. mkdir deeplab/exper/voc12 (Create a folder for your specific experiment. Let's take PASCAL VOC 2012 for example.) 3. Create folders for config files and so on. 1. mkdir deeplab/exper/voc12/config (where network config files are saved.) 2. mkdir deeplab/exper/voc12/features (where the computed features will be saved (when train on train)) 3. mkdir deeplab/exper/voc12/features2 (where the computed features will be saved (when train on trainval)) 4. mkdir deeplab/exper/voc12/list (where you save the train, val, and test file lists) 5. mkdir deeplab/exper/voc12/log (where the training/test logs will be saved) 6. 
mkdir deeplab/exper/voc12/model (where the trained models will be saved) 7. mkdir deeplab/exper/voc12/res (where the evaluation results will be saved) 4. mkdir deeplab/exper/voc12/config/deeplab_largeFOV (test your own network. Create a folder under config. For example, deeplab_largeFOV is the network you want to experiment with. Add your train.prototxt and test.prototxt in that folder (you can check some provided examples for reference).) 5. Set up your init.caffemodel at deeplab/exper/voc12/model/deeplab_largeFOV. You may want to soft link init.caffemodel to the modified VGG 16 net. For example, run ln s vgg16.caffemodel init.caffemodel at voc12/model/deeplab_largeFOV. 6. Modify the provided script, run_pascal.sh, for experiments. You should change the paths according to your setting. For example, you should specify where the caffe is by changing CAFFE_DIR. Note You may need to modify sub.sed, if you want to replace some variables with your desired values in train.prototxt or test.prototxt. 7. The computed features are saved at folders features or features2, and you can run provided MATLAB scripts to evaluate the results (e.g., check the script at code/matlab/my_script/EvalSegResults). Python Seyed Ali Mousavi has implemented a python version of run_pascal.sh (Thanks, Ali!). If you are more familiar with Python, you may want to take a look at this .",Semantic Segmentation,Semantic Segmentation 2067,Computer Vision,Computer Vision,Computer Vision,"LoST? Appearance Invariant Place Recognition for Opposite Viewpoints using Visual Semantics This is the source code for the paper titled LoST? Appearance Invariant Place Recognition for Opposite Viewpoints using Visual Semantics , pre print available here . An example output image showing Keypoint Correspondences: ! An example output image showing Keypoint Correspondences (lost_kc/bin/day night keypoint correspondence place recognition.jpg Keypoint Correspondences using LoST X ) Flowchart of the proposed approach: ! Flowchart of the proposed approach (lost_kc/bin/LoST Flowchart Visual_Place_Recognition.jpg Flowchart for the proposed approach LoST X ) If you find this work useful, please cite it as: Sourav Garg, Niko Sunderhauf, and Michael Milford. LoST? Appearance Invariant Place Recognition for Opposite Viewpoints using Visual Semantics. Proceedings of Robotics: Science and Systems XIV, 2018. bibtex: @article{garg2018lost, title {LoST? Appearance Invariant Place Recognition for Opposite Viewpoints using Visual Semantics}, author {Garg, Sourav and Suenderhauf, Niko and Milford, Michael}, journal {Proceedings of Robotics: Science and Systems XIV}, year {2018} } RefineNet's citation as mentioned on their Github page . Setup and Run Dependencies Ubuntu (Tested on 14.04 ) RefineNet Required primarily for visual semantic information. Convolutional feature maps based dense descriptors are also extracted from the same. A modified fork of RefineNet's code is used in this work to simultaneously store convolutional dense descriptors. Requires Matlab (Tested on 2017a ) Python (Tested on 2.7 ) numpy (Tested on 1.11.1 , 1.14.2 ) scipy (Tested on 0.13.3 , 0.17.1 ) skimage (Minimum Required 0.13.1 ) sklearn (Tested on 0.14.1 , 0.19.1 ) h5py (Tested on 2.7.1 ) Docker (optional, recommended, tested on 17.12.0 ce ) Official page for install instructions Download 1. 
In your workspace, clone the repositories: git clone cd lostX git clone NOTE: If you download this repository as a zip, the refineNet's fork will not get downloaded automatically, being a git submodule. 2. Download the Resnet 101 model pre trained on Cityscapes dataset from here or here . More details on RefineNet's Github page . Place the downloaded model's .mat file in the refinenet/model_trained/ directory. 3. If you are using docker, download the docker image: docker pull souravgarg/vpr lost kc:v1 Run 1. Generate and store semantic labels and dense convolutional descriptors from RefineNet's conv5 layer In the MATLAB workspace, from the refinenet/main/ directory, run: demo_predict_mscale_cityscapes The above will use the sample dataset from refinenet/datasets/ directory. You can set path to your data in demo_predict_mscale_cityscapes.m through variable datasetName and img_data_dir . You might have to run vl_compilenn before running the demo, please refer to the instructions for running refinenet in their official Readme.md 2. \ For Docker users\ If you have an environment with python and other dependencies installed, skip this step, otherwise run a docker container: docker run it v PATH_TO_YOUR_HOME_DIRECTORY/:/workspace/ souravgarg/vpr lost kc:v1 /bin/bash From within the docker container, navigate to lostX/lost_kc/ repository. v option mounts the PATH_TO_YOUR_HOME_DIRECTORY to /workspace directory within the docker container. 3. Reformat and pre process RefineNet's output from lostX/lost_kc/ directory: python reformat_data.py p $PATH_TO_REFINENET_OUTPUT $PATH_TO_REFINENET_OUTPUT is set to be the parent directory of predict_result_full , for example, ../refinenet/cache_data/test_examples_cityscapes/1 s_result_20180427152622_predict_custom_data/predict_result_1/ 4. Compute LoST descriptor: python LoST.py p $PATH_TO_REFINENET_OUTPUT 5. Repeat step 1, 3, and 4 to generate output for the other dataset by setting the variable datasetName to 2 s . 6. Perform place matching using LoST descriptors based difference matrix and Keypoint Correspondences: python match_lost_kc.py n 10 f 0 p1 $PATH_TO_REFINENET_OUTPUT_1 p2 $PATH_TO_REFINENET_OUTPUT_2 Note: Run python FILENAME h for any of the python source files in Step 3, 4, and 6 for description of arguments passed to those files. License The code is released under MIT License.",Semantic Segmentation,Semantic Segmentation 2070,Computer Vision,Computer Vision,Computer Vision,"SwiftNet Source code to reproduce results from In Defense of Pre trained ImageNet Architectures for Real time Semantic Segmentation of Road driving Images Marin Oršić , Ivan Krešo , Siniša Šegvić , Petra Bevandić ( denotes equal contribution) CVPR, 2019. Steps to reproduce Install requirements Python 3.7+ bash pip install r requirements.txt Download pre trained models bash wget P weights/ wget P weights/ Download Cityscapes From download: leftImg8bit_trainvaltest.zip (11GB) gtFine_trainvaltest.zip (241MB) Either download and extract to datasets/ or create a symbolic link datasets/Cityscapes Evaluate bash python eval.py configs/single_scale.py python eval.py configs/pyramid.py",Semantic Segmentation,Semantic Segmentation 2077,Computer Vision,Computer Vision,Computer Vision,"Semantic Segmentation on the Mapillary Vistas Dataset using the DeepLabv3+ 4 model by Google TensorFlow This is a repository for Stanford CS231N course project (spring 2018) Contact: Sheng Li ( parachutel_ ), available via lisheng@stanford.edu. 
The Mapillary Vistas Dataset is available for academic use at here (by request). To build the dataset, put images in /datasets/mvd/mvd_raw/JPEGImages/ , put ground truth labels in /datasets/mvd/mvd_raw/SegmentationClass/ , put dataset split filename lists (text files) in /datasets/mvd/mvd_raw/ImageSets/Segmentation/ . /datasets/mvd/mvd_raw/ImageSets/Segmentation/build_image_sets.py can help you build the dataset split list files. You will need to update _MVD_INFORMATION in /datasets/segmentation_dataset.py after building your dataset. To preprocess the dataset and generate tfrecord files for faster reading, please run /datasets/convert_mvd.sh . The initial model checkpoints are available in the TensorFlow DeepLab Model Zoo . Please put the ones you wish to use in /datasets/mvd/init_models/ . To run train, evaluate and visualize prediction using the model, use the following commands by running local_test_mvd.sh (you may comment out the parts you do not wish to run): Train: python ${WORK_DIR} /train.py \ logtostderr \ num_clones 4 \ train_split train \ model_variant xception_65 \ atrous_rates 6 \ atrous_rates 12 \ atrous_rates 18 \ output_stride 16 \ decoder_output_stride 4 \ train_crop_size 513 \ train_crop_size 513 \ train_batch_size 16 \ base_learning_rate 0.0025 \ learning_rate_decay_step 500 \ weight_decay 0.000015 \ training_number_of_steps ${NUM_ITERATIONS} \ log_steps 1 \ save_summaries_secs 60 \ fine_tune_batch_norm true \ tf_initial_checkpoint ${INIT_FOLDER}/deeplabv3_cityscapes_train/model.ckpt \ initialize_last_layer false \ train_logdir ${TRAIN_LOGDIR} \ dataset_dir ${MVD_DATASET} Default value of dataset is modified inside train.py directly. Batch size and train_crop_size depends on your device's available memory. Evaluation model: python ${WORK_DIR} /eval.py \ logtostderr \ eval_split val \ model_variant xception_65 \ atrous_rates 6 \ atrous_rates 12 \ atrous_rates 18 \ output_stride 16 \ decoder_output_stride 4 \ eval_crop_size \ eval_crop_size \ checkpoint_dir ${TRAIN_LOGDIR} \ eval_logdir ${EVAL_LOGDIR} \ dataset_dir ${MVD_DATASET} \ max_number_of_evaluations 1 Visaulize the prediction: python ${WORK_DIR} /vis.py \ logtostderr \ vis_split val \ model_variant xception_65 \ atrous_rates 6 \ atrous_rates 12 \ atrous_rates 18 \ output_stride 16 \ decoder_output_stride 4 \ vis_crop_size \ vis_crop_size \ checkpoint_dir ${TRAIN_LOGDIR} \ vis_logdir ${VIS_LOGDIR} \ dataset_dir ${MVD_DATASET} \ max_number_of_iterations 1 Note : and depends on the maximum resolution of your dataset. The following should be satisfied: output_stride k + 1. The default value, 513, is set for PASCAL images whose largest image dimension is 512. We pick k 32, resulting in eval_crop_size 16 32 + 1 513 > 512. Same for . Original Documentation by Google TensorFlow DeepLab Developers: DeepLab: Deep Labelling for Semantic Image Segmentation DeepLab is a state of art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image. Current implementation includes the following features: 1. DeepLabv1 1 : We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. 2. DeepLabv2 2 : We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields of views. 3. 
DeepLabv3 3 : We augment the ASPP module with image level feature 5, 6 to capture longer range information. We also include batch normalization 7 parameters to facilitate the training. In particular, we applying atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training BN at output stride 16 and attains a high performance at output stride 8 during evaluation. 4. DeepLabv3+ 4 : We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results especially along object boundaries. Furthermore, in this encoder decoder structure one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade off precision and runtime. If you find the code useful for your research, please consider citing our latest works: DeepLabv3+: @article{deeplabv3plus2018, title {Encoder Decoder with Atrous Separable Convolution for Semantic Image Segmentation}, author {Liang Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam}, journal {arXiv:1802.02611}, year {2018} } MobileNetv2: @inproceedings{mobilenetv22018, title {Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation}, author {Mark Sandler and Andrew Howard and Menglong Zhu and Andrey Zhmoginov and Liang Chieh Chen}, booktitle {CVPR}, year {2018} } In the current implementation, we support adopting the following network backbones: 1. MobileNetv2 8 : A fast network structure designed for mobile devices. 2. Xception 9, 10 : A powerful network structure intended for server side deployment. This directory contains our TensorFlow 11 implementation. We provide codes allowing users to train the model, evaluate results in terms of mIOU (mean intersection over union), and visualize segmentation results. We use PASCAL VOC 2012 12 and Cityscapes 13 semantic segmentation benchmarks as an example in the code. Some segmentation results on Flickr images: Contacts (Maintainers) Liang Chieh Chen, github: aquariusjay YuKun Zhu, github: yknzhu George Papandreou, github: gpapan Tables of Contents Demo: Colab notebook for off the shelf inference. Running: Installation. Running DeepLab on PASCAL VOC 2012 semantic segmentation dataset. Running DeepLab on Cityscapes semantic segmentation dataset. Running DeepLab on ADE20K semantic segmentation dataset. Models: Checkpoints and frozen inference graphs. Misc: Please check FAQ if you have some questions before reporting the issues. Getting Help To get help with issues you may encounter while using the DeepLab Tensorflow implementation, create a new question on StackOverflow with the tags tensorflow and deeplab . Please report bugs (i.e., broken code, not usage questions) to the tensorflow/models GitHub issue tracker , prefixing the issue name with deeplab . References 1. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs Liang Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution). link . In ICLR, 2015. 2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Liang Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille (+ equal contribution). link . TPAMI 2017. 3. Rethinking Atrous Convolution for Semantic Image Segmentation Liang Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. link . arXiv: 1706.05587, 2017. 4. 
Encoder Decoder with Atrous Separable Convolution for Semantic Image Segmentation Liang Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam. arXiv: 1802.02611. link . arXiv: 1802.02611, 2018. 5. ParseNet: Looking Wider to See Better Wei Liu, Andrew Rabinovich, Alexander C Berg link . arXiv:1506.04579, 2015. 6. Pyramid Scene Parsing Network Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia link . In CVPR, 2017. 7. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate shift Sergey Ioffe, Christian Szegedy link . In ICML, 2015. 8. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang Chieh Chen link . arXiv:1801.04381, 2018. 9. Xception: Deep Learning with Depthwise Separable Convolutions François Chollet link . In CVPR, 2017. 10. Deformable Convolutional Networks COCO Detection and Segmentation Challenge 2017 Entry Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai link . ICCV COCO Challenge Workshop, 2017. 11. Tensorflow: Large Scale Machine Learning on Heterogeneous Distributed Systems M. Abadi, A. Agarwal, et al. link . arXiv:1603.04467, 2016. 12. The Pascal Visual Object Classes Challenge – A Retrospective, Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserma. link . IJCV, 2014. 13. The Cityscapes Dataset for Semantic Urban Scene Understanding Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. link . In CVPR, 2016.",Semantic Segmentation,Semantic Segmentation 2083,Computer Vision,Computer Vision,Computer Vision,"CamVid 95 accuracy Notebooks to: train CamVid Tiramisu dataset to 95% accuracy (sota), convert CamVid dataset to Tiramisu paper version, and other training. I did this on Windows 10 with the fast.ai v1 library: CamVid Datset: The One Hundred Layers Tiramisu Fully Convolutional DenseNets for Semantic Segmentation paper:",Semantic Segmentation,Semantic Segmentation 2145,Computer Vision,Computer Vision,Computer Vision,"Segnet is deep fully convolutional neural network architecture for semantic pixel wise segmentation. This is implementation of (Except for the Upsampling layer where paper uses indices based upsampling which is not implemented in keras yet( I am working on it ), but that shouldnt make a lot of difference). You can directly download the code from This post is a explaination of what is happening in the code.",Semantic Segmentation,Semantic Segmentation 2156,Computer Vision,Computer Vision,Computer Vision,"deep photo styletransfer Code and data for paper Deep Photo Style Transfer Disclaimer This software is published for academic and non commercial use only. Setup This code is based on torch. It has been tested on Ubuntu 14.04 LTS. Dependencies: Torch (with matio ffi and loadcaffe ) Matlab or Octave CUDA backend: CUDA cudnn Download VGG 19: sh models/download_models.sh Compile cuda_utils.cu (Adjust PREFIX and NVCC_PREFIX in makefile for your machine): make clean && make Usage Quick start To generate all results (in examples/ ) using the provided scripts, simply run run('gen_laplacian/gen_laplacian.m') in Matlab or Octave and then python gen_all.py in Python. The final output will be in examples/final_results/ . Basic usage 1. 
Given input and style images with semantic segmentation masks, put them in examples/ respectively. They will have the following filename form: examples/input/in .png , examples/style/tar .png and examples/segmentation/in .png , examples/segmentation/tar .png ; 2. Compute the matting Laplacian matrix using gen_laplacian/gen_laplacian.m in Matlab. The output matrix will have the following filename form: gen_laplacian/Input_Laplacian_3x3_1e 7_CSR .mat ; Note: Please make sure that the content image resolution is consistent for Matting Laplacian computation in Matlab and style transfer in Torch, otherwise the result won't be correct. 3. Run the following script to generate segmented intermediate result: th neuralstyle_seg.lua content_image style_image content_seg style_seg index serial 4. Run the following script to generate final result: th deepmatting_seg.lua content_image style_image content_seg style_seg index init_image _t_1000.png> serial f_radius 15 f_edge 0.01 You can pass backend cudnn and cudnn_autotune to both Lua scripts (step 3. and 4.) to potentially improve speed and memory usage. libcudnn.so must be in your LD_LIBRARY_PATH . This requires cudnn.torch . Image segmentation Note: In the main paper we generate all comparison results using automatic scene segmentation algorithm modified from DilatedNet . Manual segmentation enables more diverse tasks hence we provide the masks in examples/segmentation/ . The mask colors we used (you could add more colors in ExtractMask function in two .lua files): Color variable RGB Value Hex Value blue 0 0 255 0000ff green 0 255 0 00ff00 black 0 0 0 000000 white 255 255 255 ffffff red 255 0 0 ff0000 yellow 255 255 0 ffff00 grey 128 128 128 808080 lightblue 0 255 255 00ffff purple 255 0 255 ff00ff Here are some automatic and manual tools for creating a segmentation mask for a photo image: Automatic: MIT Scene Parsing SuperParsing Nonparametric Scene Parsing Berkeley Contour Detection and Image Segmentation Resources CRF RNN for Semantic Image Segmentation Selective Search DeepLab TensorFlow Manual: Photoshop Quick Selection Tool GIMP Selection Tool GIMP G'MIC Interactive Foreground Extraction tool Examples Here are some results from our algorithm (from left to right are input, style and our output): Acknowledgement Our torch implementation is based on Justin Johnson's code ; We use Anat Levin's Matlab code to compute the matting Laplacian matrix. Citation If you find this work useful for your research, please cite: @article{luan2017deep, title {Deep Photo Style Transfer}, author {Luan, Fujun and Paris, Sylvain and Shechtman, Eli and Bala, Kavita}, journal {arXiv preprint arXiv:1703.07511}, year {2017} } Contact Feel free to contact me if there is any question (Fujun Luan fl356@cornell.edu).",Semantic Segmentation,Semantic Segmentation 2218,Computer Vision,Computer Vision,Computer Vision,https://arxiv.org/pdf/1612.01105.pdf,Semantic Segmentation,Semantic Segmentation 2242,Computer Vision,Computer Vision,Computer Vision,"Decision Rules in Neural Networks A semantic segmentation network with a softmax output layer can be seen as a statistical model that provides for each pixel of one image a probability distribution on pre defined semantic class labels, given some weights and the input data. The predicted class in one pixel is then usually obtained by the maximum a posteriori probability. In this way, the chance of an incorrect class estimation is minimized which is equivalent to the Bayes rule from decision theory. 
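As a concrete illustration of this decision step, here is a minimal NumPy sketch (not code from the repository; the array shapes and the global prior vector are illustrative assumptions). The Bayes/MAP prediction is a plain argmax over the softmax output, while the prior-normalized variant shown alongside it corresponds to the Maximum Likelihood rule discussed next.

import numpy as np

# probs: per-pixel softmax output of the segmentation net, shape (H, W, C).
# priors: class prior probabilities; a global vector of shape (C,) is used here,
# whereas the repository estimates priors pixel-wise (shape (H, W, C)).
def bayes_rule(probs):
    # MAP / Bayes rule: pick the class with the largest posterior probability.
    return probs.argmax(axis=-1)

def ml_rule(probs, priors, eps=1e-12):
    # Maximum Likelihood rule: divide out the priors before taking the argmax,
    # so rare classes are not suppressed by their low a-priori probability.
    return (probs / (priors + eps)).argmax(axis=-1)

H, W, C = 4, 4, 3
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(C), size=(H, W))   # dummy softmax output
priors = np.array([0.90, 0.08, 0.02])            # heavily unbalanced classes
print(bayes_rule(probs).shape, ml_rule(probs, priors).shape)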
On the contrary, another mathematically natural approach is applying the Maximum Likelihood (ML) rule which maps features to the class with the largest conditional likelihood. The latter rule aims at finding the class for which given patterns are most typical (according to observed features in the training set), independent of the a priori probability of the particular class. Consequently, more rare class objects can be detected by neural networks that might be biased due to training on unbalanced data. Result In our experiments, we train one FRRN network from scratch using a proprietary and highly unbalanced dataset containing 20,000 annotated frames of video sequences recorded from street scenes. We adopt a localized method by computing the priors pixel wise and compare the performance of applying the ML rule instead of the Bayes rule. The evaluation on our test set shows an increase of average recall with regard to instances of pedestrians and info signs by 25% and 23.4\%, respectively. In addition, we significantly reduce the non detection rate for instances of the same classes by 61% and 38%. Link to the corresponding paper: This repository ... contains python scripts that produce segmentations with the Bayes & ML rule and the analysis tools in order to study the impact of the different decision rules. Preparation We suggest that the user places all the required input data in the folder called _ data/ _. It should contain: directory of training ground truth images directory of test ground truth images directory of test raw input images directory of frozen graph model (.pb) In order to use own class labels modify _ labels.py _. Set global variables by editing _ globals.py _. Packages and their versions we used matplotlib 2.0.2 numpy 1.13.3 Pillow 4.3.0 scikit image 0.14.1 simplejson 3.8.2 sklearn 0.0 tabulate 0.8.2 tensorflow gpu 1.9.0 See also _ requirements.txt _. Run scripts We used Python 3.4.6. Execute: sh ./run.sh Author Robin Chan, University of Wuppertal",Semantic Segmentation,Semantic Segmentation 2250,Computer Vision,Computer Vision,Computer Vision,segNet Segmentation Net based on VGG19 The network contain 5 blocks of conv layers and pooling and the mirror of those layers see:,Semantic Segmentation,Semantic Segmentation 2273,Computer Vision,Computer Vision,Computer Vision,Pytorch Segnet Apply on iris dataset Classification of iris data set by SegNet architecture using Pytorch library. Iris is very popular dataset I try to classify iris dataset using SegNet architecture which are mainly used for image segmentation purpose.It is a simple version of Segnet. For reference :,Semantic Segmentation,Semantic Segmentation 2277,Computer Vision,Computer Vision,Computer Vision,"CRF RNN for Semantic Image Segmentation Keras/Tensorflow version ! sample (sample.png) Live demo: Caffe version: This repository contains Keras/Tensorflow code for the CRF RNN semantic image segmentation method, published in the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks . This paper was initially described in an arXiv tech report . The online demo of this project won the Best Demo Prize at ICCV 2015. Original Caffe based code of this project can be found here . Results produced with this Keras/Tensorflow code are almost identical to that with the Caffe based version. 
If you use this code/model for your research, please cite the following paper: @inproceedings{crfasrnn_ICCV2015, author {Shuai Zheng and Sadeep Jayasumana and Bernardino Romera Paredes and Vibhav Vineet and Zhizhong Su and Dalong Du and Chang Huang and Philip H. S. Torr}, title {Conditional Random Fields as Recurrent Neural Networks}, booktitle {International Conference on Computer Vision (ICCV)}, year {2015} } Installation Guide Step 1: Clone the repository $ git clone The root directory of the clone will be referred to as crfasrnn_keras hereafter. Step 2: Install dependencies Note : If you are using a Python virtualenv, make sure it is activated before running each command in this guide. Use the requirements.txt file (or requirements_gpu.txt , if you have a GPU device) in this repository to install all the dependencies via pip : $ cd crfasrnn_keras $ pip install r requirements.txt If you have a GPU device, use requirements_gpu.txt instead As you can notice from the contents of requirements.txt , we only depend on tensorflow , keras , and h5py . Additionally, Pillow is required for running the demo. After installing the dependencies, run the following commands to make sure they are properly installed: $ python >>> import tensorflow >>> import keras You should not see any errors while importing tensorflow and keras above. Step 3: Build CRF RNN custom op C++ code Run make inside the crfasrnn_keras/src/cpp directory: $ cd crfasrnn_keras/src/cpp $ make Note that the python command in the console should refer to the Python interpreter associated with your Tensorflow installation before running the make command above. You will get a new file named high_dim_filter.so from this build. If it fails, refer to the official Tensorflow guide for building a custom op for help. Note : This make script works on Linux and macOS, but not on Windows OS. If you are on Windows, please check this issue and the comments therein for build instructions. The official Tensorflow guide for building a custom op does not yet include build instructions for Windows. Step 4: Download the pre trained model weights Download the model weights from here or here and place it in the crfasrnn_keras directory with the file name crfrnn_keras_model.h5 . Step 5: Run the demo $ cd crfasrnn_keras $ python run_demo.py If all goes well, you will see the segmentation results in a file named labels.png . Notes 1. Current implementation of the CrfRnnLayer only supports batch_size 1 2. An experimental GPU version of the CrfRnnLayer that has been tested on CUDA 9 and Tensorflow 1.7 only, is available under the gpu_support branch. This code was contributed by thwjoy .",Semantic Segmentation,Semantic Segmentation 2280,Computer Vision,Computer Vision,Computer Vision,"Focusing attention of Fully convolutional neural networks on Region of interest (ROI) input map, using the valve filters method. This project contains code for a fully convolutional neural network (FCN) for semantic segmentation with a region of interest (ROI) map as an additional input (figure 1). The net receives image and ROI as a binary map with pixels corresponding to ROI marked 1, and produce pixel wise annotation of the ROI region of the image. This code was tested on for semantic segmentation task of materials in transparent vessels where the vessel area of the image was set as the ROI. The method is discussed in the paper: Setting an attention region for convolutional neural networks using region selective features, for recognition of materials within glass vessels ! 
(/Figure1.jpg) Figure 1) Convolutional neural nets (Convnet) with ROI map as input General approach for using ROI input in CNN (valve filter method) Convolutional neural networks have emerged as the leading methods in detection classification and segmentation of images. Many problems in image recognition require the recognition to be performed only on a specific predetermined region of interest (ROI) in the image. One example of such a case is the recognition of the contents of glass vessels such as bottles or jars, where the glassware region in the image is known and given as the ROI input (Figure 1). Directing the attention of a convolutional neural net (CNN) to a given ROI region without loss of background information is a major challenge in this case. This project uses a valve filter approach to focus the attention of a fully convolutional neural net (FCN) on a given ROI in the image. The ROI mask is inserted into the CNN, along with the image in the form of a binary map, with pixels belonging to the ROI set to one and the background set to zero. The processing of the ROI in the net is done using the valve filter approach presented in Figure 2. In general, for each filter that acts on the image, a corresponding valve filter exists that acts on (convolves) the ROI map (Figure 2). The output of the valve filter convolution is multiplied element wise with the output of the image filter convolution, to give a normalized feature map (Figure 2). This map is used as input for the next layers of the net. In this case, the net is a standard fully convolutional net (FCN) for semantic segmentation (pixel wise classification). Valve filters can be seen as a kind of valve that regularizes the activation of image filters in different regions of the image. ! (/Figure2.png) Figure 2) The valve filter approach for introduction of ROI map as input to ConvNets. The image and the ROI input are each passed through a separate convolution layer to give feature map and Relevance map, respectively. Each element in the features map is multiplied by the corresponding element in the feature map to give a normalized features map that passed (after RELU) as input for the next layer of the net. Requirements This network was run and trained with Python 3.6 Anaconda package and Tensorflow 1.1. The training was done using Nvidia GTX 1080, on Linux Ubuntu 16.04. Setup 1) Download the code from the repository. 2) Download a pre trained vgg16 net and put in the /Model_Zoo subfolder in the main code folder. A pre trained vgg16 net can be download from here or from here ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy Tutorial Training network: Run: Train.py Prediction using trained network (pixelwise classification and segmentation of images) Run: Inference.py Evaluating net performance using intersection over union (IOU): Run: Evaluate_Net_IOU.py Notes and issues See the top of each script for an explanation as for how to use it. Detail valve filters implementation. The detail implementation of the valve filters given in Figures 2 and described below: 1) The ROI map is inserted to the net along with the image. The ROI map is represented as a binary image with pixels corresponding to ROI marked 1 and the rest marked 0. 2) A set of image filters is convolved (with bias addition) with the image to give a feature map. 3) A set of valve filters convolved with the ROI map to give a relevance map with the same size and dimension as the feature map (again with bias addition). 
4) The feature map is multiplied element wise by the relevance map. Hence, each element in the relevance map is multiplied by the corresponding element in the feature map to give a normalized feature map. 5) The normalized feature map is then passed through a Rectified Linear Unit (ReLU), which zeroes out any negative map element. The output is used as input for the next layer of the net. The net, in this case, is a standard fully convolutional neural net for semantic segmentation. In this way, each valve filter acts as a kind of valve that regulates the activation of the corresponding image filter in different regions of the image. Hence, the valve filter will inhibit some filters in the background zone and others in the ROI zone. The valve filter weights are learned by the net in the same way the image filters are learned. Therefore, the net learns both the features and the region for which they are relevant. In the current implementation, the valve filters act only on the first layer of the convolutional neural net, and the rest of the net remains unchanged. Details input/output The inputs for the net (Figure 1) are an RGB image and a ROI map; the ROI map is a 2d binary image with pixels corresponding to the ROI marked 1 and the background marked 0. The net produces a pixel wise annotation as a matrix in the size of the image, with the value of each pixel being its class label (this is also the format of the labels supplied in training). Background information The net is based on the fully convolutional neural net described in the paper Fully Convolutional Networks for Semantic Segmentation . The code is based on the FCN implementation by Sarath Shekkizhar, with the encoder replaced by VGG16. The net is initialized with the pre trained VGG16 model by Marvin Teichmann. Trained Models Trained model Liquid and solid phases recognition in glass vessel Trained model Exact physical phase of materials in transparent vessel semantic segmentation Supporting datasets The net was tested on a dataset of annotated images of materials in glass vessels . The glass vessel region in the image was taken as the ROI map. This dataset can be downloaded from",Semantic Segmentation,Semantic Segmentation 2281,Computer Vision,Computer Vision,Computer Vision,"Fully convolutional neural network (FCN) for semantic segmentation with tensorflow. This is a simple implementation of a fully convolutional neural network (FCN). The net is based on the fully convolutional neural net described in the paper Fully Convolutional Networks for Semantic Segmentation . The code is based on the FCN implementation by Sarath Shekkizhar with MIT license, but replaces the VGG19 encoder with a VGG16 encoder. The net is initialized using the pre trained VGG16 model by Marvin Teichmann. An improved version of this net in pytorch is given here Details input/output The input for the net is an RGB image (Figure 1 right). The net produces pixel wise annotation as a matrix in the size of the image, with the value of each pixel corresponding to its class (Figure 1 left). ! (/Figure1.png) Figure 1) Semantic segmentation of an image of liquid in a glass vessel with FCN. Red Glass, Blue Liquid, White Background Requirements This network was run with the Python 3.6 Anaconda package and Tensorflow 1.1. The training was done using an Nvidia GTX 1080, on Linux Ubuntu 16.04. Setup 1) Download the code from the repository. 2) Download a pre trained vgg16 net and put it in the /Model_Zoo subfolder in the main code folder.
A pre trained vgg16 net can be download from here or from here ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy Tutorial Instructions for training (in TRAIN.py): In: TRAIN.py 1) Set folder of the training images in Train_Image_Dir 2) Set folder for the ground truth labels in Train_Label_DIR 3) The Label Maps should be saved as png image with the same name as the corresponding image and png ending 4) Download a pretrained vgg16 (ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy) model and put in model_path (should be done automatically if you have internet connection) 5) Set number of classes/labels in NUM_CLASSES 6) If you are interested in using validation set during training, set UseValidationSet True and the validation image folder to Valid_Image_Dir and set the folder with ground truth labels for the validation set in Valid_Label_Dir Instructions for predicting pixelwise annotation using trained net (in Inference.py) In: Inference.py 1) Make sure you have trained model in logs_dir (See Train.py for creating trained model) 2) Set the Image_Dir to the folder where the input images for prediction are located. 3) Set the number of classes in NUM_CLASSES 4) Set folder where you want the output annotated images to be saved to Pred_Dir 5) Run script Evaluating net performance using intersection over union (IOU): In: Evaluate_Net_IOU.py 1) Make sure you have trained model in logs_dir (See Train.py for creating trained model) 2) Set the Image_Dir to the folder where the input images for prediction are located 3) Set folder for ground truth labels in Label_DIR. The Label Maps should be saved as png image with the same name as the corresponding image and png ending 4) Set number of classes number in NUM_CLASSES 5) Run script Supporting data sets The net was tested on a dataset of annotated images of materials in glass vessels. This dataset can be downloaded from here MIT Scene Parsing Benchmark with over 20k pixel wise annotated images can also be used for training and can be download from here Trained model Glass and transparent vessel recognition trained model Liquid Solid chemical phases recognition in transparent glassware trained model",Semantic Segmentation,Semantic Segmentation 2350,Computer Vision,Computer Vision,Computer Vision,"DeepLab: Deep Labelling for Semantic Image Segmentation DeepLab is a state of art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image. Current implementation includes the following features: 1. DeepLabv1 1 : We use atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. 2. DeepLabv2 2 : We use atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields of views. 3. DeepLabv3 3 : We augment the ASPP module with image level feature 5, 6 to capture longer range information. We also include batch normalization 7 parameters to facilitate the training. In particular, we applying atrous convolution to extract output features at different output strides during training and evaluation, which efficiently enables training BN at output stride 16 and attains a high performance at output stride 8 during evaluation. 4. DeepLabv3+ 4 : We extend DeepLabv3 to include a simple yet effective decoder module to refine the segmentation results especially along object boundaries. 
Furthermore, in this encoder decoder structure one can arbitrarily control the resolution of extracted encoder features by atrous convolution to trade off precision and runtime. If you find the code useful for your research, please consider citing our latest work: @article{deeplabv3plus2018, title {Encoder Decoder with Atrous Separable Convolution for Semantic Image Segmentation}, author {Liang Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam}, journal {arXiv:1802.02611}, year {2018} } In the current implementation, we support adopting the following network backbones: 1. MobileNetv2 8 : A fast network structure designed for mobile devices. We will provide MobileNetv2 support in the next update. Please stay tuned. 2. Xception 9, 10 : A powerful network structure intended for server side deployment. This directory contains our TensorFlow 11 implementation. We provide codes allowing users to train the model, evaluate results in terms of mIOU (mean intersection over union), and visualize segmentation results. We use PASCAL VOC 2012 12 and Cityscapes 13 semantic segmentation benchmarks as an example in the code. Some segmentation results on Flickr images: Contacts (Maintainers) Liang Chieh Chen, github: aquariusjay YuKun Zhu, github: yknzhu George Papandreou, github: gpapan Tables of Contents Demo: Jupyter notebook for off the shelf inference. Running: Installation. Running DeepLab on PASCAL VOC 2012 semantic segmentation dataset. Running DeepLab on Cityscapes semantic segmentation dataset. Models: Checkpoints and frozen inference graphs. Misc: Please check FAQ if you have some questions before reporting the issues. Getting Help To get help with issues you may encounter while using the DeepLab Tensorflow implementation, create a new question on StackOverflow with the tags tensorflow and deeplab . Please report bugs (i.e., broken code, not usage questions) to the tensorflow/models GitHub issue tracker , prefixing the issue name with deeplab . References 1. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs Liang Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution). link . In ICLR, 2015. 2. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Liang Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille (+ equal contribution). link . TPAMI 2017. 3. Rethinking Atrous Convolution for Semantic Image Segmentation Liang Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam. link . arXiv: 1706.05587, 2017. 4. Encoder Decoder with Atrous Separable Convolution for Semantic Image Segmentation Liang Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam. arXiv: 1802.02611. link . arXiv: 1802.02611, 2018. 5. ParseNet: Looking Wider to See Better Wei Liu, Andrew Rabinovich, Alexander C Berg link . arXiv:1506.04579, 2015. 6. Pyramid Scene Parsing Network Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia link . In CVPR, 2017. 7. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate shift Sergey Ioffe, Christian Szegedy link . In ICML, 2015. 8. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang Chieh Chen link . arXiv:1801.04381, 2018. 9. 
Xception: Deep Learning with Depthwise Separable Convolutions François Chollet link . In CVPR, 2017. 10. Deformable Convolutional Networks COCO Detection and Segmentation Challenge 2017 Entry Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai link . ICCV COCO Challenge Workshop, 2017. 11. Tensorflow: Large Scale Machine Learning on Heterogeneous Distributed Systems M. Abadi, A. Agarwal, et al. link . arXiv:1603.04467, 2016. 12. The Pascal Visual Object Classes Challenge – A Retrospective, Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserma. link . IJCV, 2014. 13. The Cityscapes Dataset for Semantic Urban Scene Understanding Cordts, Marius, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. link . In CVPR, 2016.",Semantic Segmentation,Semantic Segmentation 2354,Computer Vision,Computer Vision,Computer Vision,"Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional networks (FCNs) in the PAMI FCN and CVPR FCN papers: Fully Convolutional Models for Semantic Segmentation Evan Shelhamer , Jonathan Long , Trevor Darrell PAMI 2016 arXiv:1605.06211 Fully Convolutional Models for Semantic Segmentation Jonathan Long , Evan Shelhamer , Trevor Darrell CVPR 2015 arXiv:1411.4038 Note that this is a work in progress and the final, reference version is coming soon. Please ask Caffe and FCN usage questions on the caffe users mailing list . Refer to these slides for a summary of the approach. These models are compatible with BVLC/caffe:master . Compatibility has held since master@8c66fa5 with the merge of PRs 3613 and 3570. The code and models here are available under the same license as Caffe (BSD 2) and the Caffe bundled models (that is, unrestricted use; see the BVLC model license ). PASCAL VOC models : trained online with high momentum for a 5 point boost in mean intersection over union over the original models. These models are trained using extra data from Hariharan et al. , but excluding SBD val. FCN 32s is fine tuned from the ILSVRC trained VGG 16 model , and the finer strides are then fine tuned in turn. The at once FCN 8s is fine tuned from VGG 16 all at once by scaling the skip connections to better condition optimization. FCN 32s PASCAL (voc fcn32s): single stream, 32 pixel prediction stride net, scoring 63.6 mIU on seg11valid FCN 16s PASCAL (voc fcn16s): two stream, 16 pixel prediction stride net, scoring 65.0 mIU on seg11valid FCN 8s PASCAL (voc fcn8s): three stream, 8 pixel prediction stride net, scoring 65.5 mIU on seg11valid and 67.2 mIU on seg12test FCN 8s PASCAL at once (voc fcn8s atonce): all at once, three stream, 8 pixel prediction stride net, scoring 65.4 mIU on seg11valid FCN AlexNet PASCAL (voc fcn alexnet): AlexNet (CaffeNet) architecture, single stream, 32 pixel prediction stride net, scoring 48.0 mIU on seg11valid. Unlike the FCN 32/16/8s models, this network is trained with gradient accumulation, normalized loss, and standard momentum. (Note: when both FCN 32s/FCN VGG16 and FCN AlexNet are trained in this same way FCN VGG16 is far better; see Table 1 of the paper.) To reproduce the validation scores, use the seg11valid split defined by the paper in footnote 7. Since SBD train and PASCAL VOC 2011 segval intersect, we only evaluate on the non intersecting set for validation purposes. 
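For reference, the mean intersection over union (mIU) figures quoted above can be computed from a confusion matrix as in the short sketch below (a generic implementation for illustration, not the repository's scoring script; label arrays are assumed to be flattened integer maps with negative values marking void pixels).

import numpy as np

def confusion_matrix(gt, pred, num_classes):
    # Accumulate a (num_classes x num_classes) confusion matrix from flattened
    # ground-truth and predicted label arrays, ignoring void/unlabeled pixels.
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iu(conf):
    # Mean intersection over union, averaged over the classes that appear at all.
    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    with np.errstate(divide="ignore", invalid="ignore"):
        iu = inter / union
    return np.nanmean(iu)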
NYUDv2 models : trained online with high momentum on color, depth, and HHA features (from Gupta et al. These models demonstrate FCNs for multi modal input. FCN 32s NYUDv2 Color (nyud fcn32s color): single stream, 32 pixel prediction stride net on color/BGR input FCN 32s NYUDv2 HHA (nyud fcn32s hha): single stream, 32 pixel prediction stride net on HHA input FCN 32s NYUDv2 Early Color Depth (nyud fcn32s color d): single stream, 32 pixel prediction stride net on early fusion of color and (log) depth for 4 channel input FCN 32s NYUDv2 Late Color HHA (nyud fcn32s color hha): single stream, 32 pixel prediction stride net by late fusion of FCN 32s NYUDv2 Color and FCN 32s NYUDv2 HHA SIFT Flow models : trained online with high momentum for joint semantic class and geometric class segmentation. These models demonstrate FCNs for multi task output. FCN 32s SIFT Flow (siftflow fcn32s): single stream stream, 32 pixel prediction stride net FCN 16s SIFT Flow (siftflow fcn16s): two stream, 16 pixel prediction stride net FCN 8s SIFT Flow (siftflow fcn8s): three stream, 8 pixel prediction stride net Note : in this release, the evaluation of the semantic classes is not quite right at the moment due to an issue with missing classes. This will be corrected soon. The evaluation of the geometric classes is fine. PASCAL Context models : trained online with high momentum on an object and scene labeling of PASCAL VOC. FCN 32s PASCAL Context (pascalcontext fcn32s): single stream, 32 pixel prediction stride net FCN 16s PASCAL Context (pascalcontext fcn16s): two stream, 16 pixel prediction stride net FCN 8s PASCAL Context (pascalcontext fcn8s): three stream, 8 pixel prediction stride net Frequently Asked Questions Is learning the interpolation necessary? In our original experiments the interpolation layers were initialized to bilinear kernels and then learned. In follow up experiments, and this reference implementation, the bilinear kernels are fixed. There is no significant difference in accuracy in our experiments, and fixing these parameters gives a slight speed up. Note that in our networks there is only one interpolation kernel per output class, and results may differ for higher dimensional and non linear interpolation, for which learning may help further. Why pad the input? : The 100 pixel input padding guarantees that the network output can be aligned to the input for any input size in the given datasets, for instance PASCAL VOC. The alignment is handled automatically by net specification and the crop layer. It is possible, though less convenient, to calculate the exact offsets necessary and do away with this amount of padding. Why are all the outputs/gradients/parameters zero? : This is almost universally due to not initializing the weights as needed. To reproduce our FCN training, or train your own FCNs, it is crucial to transplant the weights from the corresponding ILSVRC net such as VGG16. The included surgery.transplant() method can help with this. What about FCN GoogLeNet? : a reference FCN GoogLeNet for PASCAL VOC is coming soon.",Semantic Segmentation,Semantic Segmentation 2366,Computer Vision,Computer Vision,Computer Vision,"CRF RNN for Semantic Image Segmentation Keras/Tensorflow version This repository contains Keras/Tensorflow code for the CRF RNN semantic image segmentation method, published in the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks . This paper was initially described in an arXiv tech report . 
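Referring back to the interpolation FAQ in the FCN entry above: the fixed bilinear kernel mentioned there has the standard closed form sketched below (a generic NumPy version for illustration, equivalent in spirit to the repository's weight initialization; the function name is ours).

import numpy as np

def bilinear_kernel(size):
    # 2-D bilinear interpolation kernel of the given side length, used to
    # initialize (and, in that release, keep fixed) an upsampling filter;
    # in FCN there is one such kernel per output class.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

print(bilinear_kernel(4))   # e.g. a kernel for 2x upsampling with a stride-2 deconvolution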
The online demo of this project won the Best Demo Prize at ICCV 2015. Original Caffe based code of this project can be found here . Results produced with this Keras/Tensorflow code are almost identical to that with the Caffe based version. Installation Guide Step 1: Install dependencies Note : If you are using a Python virtualenv, make sure it is activated before running each command in this guide. Use the requirements.txt file (or requirements_gpu.txt , if you have a GPU device) in this repository to install all the dependencies via pip : $ cd crfasrnn_keras $ pip install r requirements.txt If you have a GPU device, use requirements_gpu.txt instead As you can notice from the contents of requirements.txt , we only depend on tensorflow , keras , and h5py . Additionally, Pillow is required for running the demo. After installing the dependencies, run the following commands to make sure they are properly installed: $ python >>> import tensorflow >>> import keras You should not see any errors while importing tensorflow and keras above. Step 2: Build CRF RNN custom op C++ code Run make inside the crfasrnn_keras/src/cpp directory: $ cd crfasrnn_keras/src/cpp $ make Note that the python command in the console should refer to the Python interpreter associated with your Tensorflow installation before running the make command above. You will get a new file named high_dim_filter.so from this build. If it fails, refer to the official Tensorflow guide for building a custom op for help. Note : This make script works on Linux and macOS, but not on Windows OS. If you are on Windows, please check this issue and the comments therein for build instructions. The official Tensorflow guide for building a custom op does not yet include build instructions for Windows. Step 3: Download the pre trained model weights Download the model weights from here or here and place it in the crfasrnn_keras directory with the file name crfrnn_keras_model.h5 . Run the Model End to End: Run the demo with images from big_figure and compose with backgrounds from bg python matting_main.py Run on custom images python matting_main.py image_dir Notes 1. Current implementation of the CrfRnnLayer only supports batch_size 1 2. An experimental GPU version of the CrfRnnLayer that has been tested on CUDA 9 and Tensorflow 1.7 only, is available under the gpu_support branch. This code was contributed by thwjoy .",Semantic Segmentation,Semantic Segmentation 2376,Computer Vision,Computer Vision,Computer Vision,"CRF RNN for Semantic Image Segmentation Keras/Tensorflow version ! sample (sample.png) Live demo: Caffe version: This repository contains Keras/Tensorflow code for the CRF RNN semantic image segmentation method, published in the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks . This paper was initially described in an arXiv tech report . The online demo of this project won the Best Demo Prize at ICCV 2015. Original Caffe based code of this project can be found here . Results produced with this Keras/Tensorflow code are almost identical to that with the Caffe based version. If you use this code/model for your research, please cite the following paper: @inproceedings{crfasrnn_ICCV2015, author {Shuai Zheng and Sadeep Jayasumana and Bernardino Romera Paredes and Vibhav Vineet and Zhizhong Su and Dalong Du and Chang Huang and Philip H. S. 
Torr}, title {Conditional Random Fields as Recurrent Neural Networks}, booktitle {International Conference on Computer Vision (ICCV)}, year {2015} } Installation Guide 1.1 Install dependencies Install Tensorflow and Keras , following the respective installation guides. You will need to install Keras with HDF5/h5py if you plan to use the provided trained model. After installing these two packages, run the following commands to make sure they are properly installed: First, activate the correct Python virtualenv if you used one during Tensorflow/Keras installation $ source /home/user/tensorflow_virtualenv/bin/activate $ python >>> import tensorflow >>> import keras You should not see any errors while importing tensorflow and keras above. 1.2 Build CRF RNN custom C++ code Checkout the code in this repository, activate the Tensorflow/Keras virtualenv (if you used one), and run the compile.sh script in the cpp directory. That is, run the following commands: $ git clone $ cd crfasrnn_keras/cpp $ source /home/user/tensorflow_virtualenv/bin/activate $ ./compile.sh If the build succeeds, you will see a new file named high_dim_filter.so . If it fails, please see the comments inside the compile.sh file for help. You could also refer to the official Tensorflow guide for building a custom op . Note : This script will not work on Windows OS. If you are on Windows, please check this issue and the comments therein. The official Tensorflow guide for building a custom op does not yet include build instructions for Windows. 1.3 Download the pre trained model weights Download the model weights from here and place it in the crfasrnn_keras directory with the file name crfrnn_keras_model.h5 . 1.4 Run the demo $ cd crfasrnn_keras $ python run_demo.py Make sure that the correct virtualenv is already activated If everything goes well, you will see the segmentation results in a file named labels.png Limitations of the Current Version 1. Currently, some operations in the CRF RNN layer can only run on the CPU. An all GPU version will be released soon. 2. The crfrnn_keras_model.h5 model was directly converted from the Caffe model . However, training new models entirely from Keras is possible too. 3. Current implementation of CrfRnnLayer only supports batch_size 1",Semantic Segmentation,Semantic Segmentation 2403,Computer Vision,Computer Vision,Computer Vision,Introduction Implementation of The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation from,Semantic Segmentation,Semantic Segmentation 2434,Computer Vision,Computer Vision,Computer Vision,Multi Scale Context Aggregation by Dilated Convolutions Attempt to implement the following paper To Do Add results,Semantic Segmentation,Semantic Segmentation 2476,Computer Vision,Computer Vision,Computer Vision,"Switchable Normalization for Semantic Segmentation This repository contains the code of using Swithable Normalization (SN) in semantic image segmentation, proposed by the paper Differentiable Learning to Normalize via Switchable Normalization . This is the implementations of the experiments presented in the above paper by using open source semantic segmentation framework Scene Parsing on MIT ADE20K . Update 2018/9/26: The code and trained models of semantic segmentation on ADE20K by using SN are released ! More results and models will be released soon. Citation You are encouraged to cite the following paper if you use SN in research or wish to refer to the baseline results. 
@article{SwitchableNorm, title {Differentiable Learning to Normalize via Switchable Normalization}, author {Ping Luo and Jiamin Ren and Zhanglin Peng}, journal {arXiv:1806.10779}, year {2018} } Getting Started Use git to clone this repository: git clone Environment The code is tested under the following configurations. Hardware: 1 8 GPUs (with at least 12G GPU memories) Software: CUDA 9.0, Python 3.6, PyTorch 0.4.0, tensorboardX Installation & Data Preparation Please check the Environment , Training and Evaluation subsection in the repo Scene Parsing on MIT ADE20K for a quick start. Pre trained Models Download SN based ImageNet pretrained model and put them into the {repo_root}/pretrained_sn . ImageNet pre trained models The backbone models with SN pretrained on ImageNet are available in the format used by above Segmentation Framework and this repo. ResNet50v1+SN(8,2) pretrained_SN(8,2) For more pretrained models with SN, please refer to the repo of switchablenorms/Switchable Normalization . The following script converts the model trained from Switchable Normalization into a valid format used by the semantic segmentation codebase : ./pretrained_sn/convert_sn.py usage: python u convert_sn.py NOTE: The paramater keys in pretrained model checkpoint must match the keys in backbone model EXACTLY . You should load the correct pretrained model according to your segmentation architechure. Training The training strategies of baseline models and sn based models on ADE20K are same as Scene Parsing on MIT ADE20K . The training script with ResNet 50 sn backbone can be found here: ./scripts/train.sh NOTE: The default architecture of this repo is Encoder: resnet50_dilated8 ( resnetXX_dilatedYY: customized resnetXX with dilated convolutions, output feature map is 1/YY of input size, see DeepLab for more details ) and Decoder: c1_bilinear_deepsup ( 1 conv + bilinear upsample + deep supervision, see PSPNet for more details ). Optional arguments (see full input arguments via ./train.py ): arch_encoder architecture of encode network arch_decoder architecture of decode network weights_encoder weights to finetune endoce network weights_decoder weights to finetune decode network list_train the list to load the training data root_dataset the path of the dataset batch_size_per_gpu input batch size start_epoch epoch to start training. (continue from a checkpoint loaded via weights_encoder & weights_decoder) NOTE: In this repo, start_epoch allows the training to resume from the checkpoint loaded from weights_encoder and weights_decoder , which is generated in the training process automatically. If you want to train from scratch, you need to assign start_epoch as 1 and set weights_encoder and weights_decoder to the blank value. Evaluation The evaluation script with ResNet 50 sn backbone can be found here : ./scripts/evaluate.sh Optional arguments (see full input arguments via ./eval.py ): arch_encoder architecture of encode network arch_decoder architecture of decode network suffix which snapshot to load list_val the list to load the validation data root_dataset the path of the dataset imgSize list of input image sizes imgSize enables single scale or multi scale inference. When load_dir is with the int type, the single scale inference will be started up. When load_dir is a int list , the multi scale test will be applied. Main Results Semantic Segmentation Results on ADE20K The experiment results are on the ADE20K validation set. MS test is short for multi scale test. 
sync BN indicates the mutli GPU synchronization batch normalization. More results and models will be released soon. Architecture Norm MS test Mean IoU Pixel Acc. Overall Score Download : : : : : : : : : : : : : : ResNet50_dilated8 + c1_bilinear_deepsup sync BN no 36.43 77.30 56.87 encoder decoder ResNet50_dilated8 + c1_bilinear_deepsup GN no 35.66 77.24 56.45 encoder decoder ResNet50_dilated8 + c1_bilinear_deepsup SN (8,2) no 38.72 78.90 58.82 encoder decoder ResNet50_dilated8 + c1_bilinear_deepsup sync BN yes 37.69 78.29 57.99 ResNet50_dilated8 + c1_bilinear_deepsup GN yes 36.32 77.77 57.05 ResNet50_dilated8 + c1_bilinear_deepsup SN (8,2) yes 39.21 79.20 59.21 NOTE: For all settings in this repo, we employ ResNet as the backbone network, using the original 7×7 kernel size in the first convolution layer. This is different from the MIT framework , which adopts 3 convolution layers with the kernel size 3×3 at the bottom of the network. See ./models/resnet_v1_sn.py for the details.",Semantic Segmentation,Semantic Segmentation 2506,Computer Vision,Computer Vision,Computer Vision,"JejuNet Real Time Video Segmentation on Mobile Devices Keywords Video Segmentation, Mobile, Tensorflow Lite Tutorials Benchmarks: Tensorflow Lite on GPU A Post on Medium Link Detail results Link Introduction Running vision tasks such as object detection, segmentation in real time on mobile devices. Our goal is to implement video segmentation in real time at least 24 fps on Google Pixel 2. We use effiicient deep learning netwrok specialized in mobile/embedded devices and exploit data redundancy between consecutive frames to reduce unaffordable computational cost. Moreover, the network can be optimized with 8 bits quantization provided by tf lite. ! Real Time Video Segmentation(Credit: Google AI) Example: Reai Time Video Segmentation(Credit: Google AI) Architecture Video Segmentation Compressed DeepLabv3+ 1 Backbone: MobileNetv2 2 Optimization 8 bits Quantization on TensorFlow Lite Experiments Video Segmentation on Google Pixel 2 Datasets PASCAL VOC 2012 Plan @Deep Learning Camp Jeju 2018 July, 2018 x DeepLabv3+ on tf lite x Use data redundancy between frames Optimization x Quantization x Reduce the number of layers, filters and input size Results More results here bit.ly/jejunet output (bit.ly/jejunet output) Demo ! DeepLabv3+ on tf lite Video Segmentation on Google Pixel 2 Trade off Between Speed(FPS) and Accuracy(mIoU) ! Trade off Between Speed(FPS) and Accuracy(mIoU) Low Bits Quantization Network Input Stride Quantization(w/a) PASCAL mIoU Runtime(.tflite) File Size(.tflite) DeepLabv3, MobileNetv2 512x512 16 32/32 79.9% 862ms 8.5MB DeepLabv3, MobileNetv2 512x512 16 8/8 79.2% 451ms 2.2MB DeepLabv3, MobileNetv2 512x512 16 6/6 70.7% DeepLabv3, MobileNetv2 512x512 16 6/4 30.3% ! Low Bits Quantization References 1. Encoder Decoder with Atrous Separable Convolution for Semantic Image Segmentation Liang Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam. arXiv: 1802.02611. link . arXiv: 1802.02611, 2018. 2. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang Chieh Chen link . arXiv:1801.04381, 2018. Authors Taekmin Kim (Mentee) @tantara Jisung Kim(Mentor) @runhani Acknowledgement This work was partially supported by Deep Learning Jeju Camp and sponsors such as Google, SK Telecom. 
Thank you for the generous support with TPUs and the Google Pixel 2, and thanks to Hyungsuk and all the mentees for the tensorflow implementations and useful discussions. License © Taekmin Kim , 2018. Licensed under the MIT (LICENSE) License.",Semantic Segmentation,Semantic Segmentation 2538,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend that newer TensorFlow users start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Semantic Segmentation,Semantic Segmentation 2600,Computer Vision,Computer Vision,Computer Vision,"Tensorflow SegNet This implements a slightly different (see below for details) SegNet in tensorflow, successfully trained as segnet basic on the CamVid dataset. Because index based unraveling (unpooling with max pooling indices) is still unavailable in tensorflow, the original upsampling method is temporarily replaced by a simple deconv (or conv transpose) layer, without pooling indices. You can follow the issue here: (The current workaround for the unpooling layer is a bit slow because it lacks GPU support.) For model details, please go to Requirement tensorflow 1.0 Pillow (optional, for writing label images) scikit image Update Update to tf 1.0 Finally got some time to refactor a bit, removing some unused functions and removing the hard coded file paths. Now the model should be easy to configure; the parameters can be found in main.py. I plan to add more features such as dilation, multi resolution, sequential learning, etc., making it more like a basic segmentation toolbox and supporting more datasets as well. Therefore the model and documentation will be changed accordingly in the future. More utility functions will be added and some messy coding style will be fixed. Any feature request is also welcome. Usage see also example.sh training: python main.py log_dir path_to_your_log image_dir path_to_CamVid_train.txt val_dir path_to_CamVid_val.txt batch_size 5 finetune: python main.py finetune path_to_saved_ckpt log_dir path_to_your_log image_dir path_to_CamVid_train.txt val_dir path_to_CamVid_val.txt batch_size 5 testing: python main.py testing path_to_saved_ckpt log_dir path_to_your_log test_dir path_to_CamVid_train.txt batch_size 5 save_image True You can set default paths and parameters in main.py line 618. note: in testing you can specify whether to save predicted images; currently only one image is saved for manual checking, and this will be configured to be more flexible.
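For intuition, the SegNet style unpooling with pooling indices mentioned at the top of this readme (the operation currently replaced by deconvolution) can be sketched in plain NumPy as follows; this is a framework-agnostic illustration of the idea, not code from the repository.

import numpy as np

def max_pool_with_indices(x, k=2):
    # k x k max pooling over a 2-D map that also records the flat argmax index of
    # each window, as required for SegNet-style unpooling. H and W must be divisible by k.
    H, W = x.shape
    windows = (x.reshape(H // k, k, W // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(H // k, W // k, k * k))
    idx = windows.argmax(axis=-1)
    return windows.max(axis=-1), idx

def unpool_with_indices(pooled, idx, k=2):
    # Scatter the pooled activations back to the positions recorded during pooling;
    # every other position stays zero (upsampling via pooling indices).
    Hp, Wp = pooled.shape
    out = np.zeros((Hp, Wp, k * k), dtype=pooled.dtype)
    rows, cols = np.meshgrid(np.arange(Hp), np.arange(Wp), indexing="ij")
    out[rows, cols, idx] = pooled
    return (out.reshape(Hp, Wp, k, k)
               .transpose(0, 2, 1, 3)
               .reshape(Hp * k, Wp * k))

x = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled, idx = max_pool_with_indices(x)
print(unpool_with_indices(pooled, idx))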
Dataset This implementation defaults to using the CamVid dataset as described in the original SegNet paper. The dataset can be downloaded from the author's github in the CamVid folder example format: path_to_image1 path_to_corresponding_label_image1 , path_to_image2 path_to_corresponding_label_image2 , path_to_image3 path_to_corresponding_label_image3 , .......",Semantic Segmentation,Semantic Segmentation 2605,Computer Vision,Computer Vision,Computer Vision,"PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Created by Charles R. Qi , Li (Eric) Yi , Hao Su , Leonidas J. Guibas from Stanford University. ! prediction example Citation If you find our work useful in your research, please consider citing: @article{qi2017pointnetplusplus, title {PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space}, author {Qi, Charles R and Yi, Li and Su, Hao and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1706.02413}, year {2017} } Introduction This work is based on our NIPS'17 paper. You can find the arXiv version of the paper here or check the project webpage for a quick overview. PointNet++ is a follow up project that builds on and extends PointNet . It is version 2.0 of the PointNet architecture. PointNet (the v1 model) either transforms features of individual points independently or processes global features of the entire point set . However, in many cases there are well defined distance metrics, such as Euclidean distance for 3D point clouds collected by 3D sensors, or geodesic distance for manifolds like isometric shape surfaces. In PointNet++ we want to respect the spatial localities of those point sets. PointNet++ learns hierarchical features with increasing scales of contexts, just like that in convolutional neural networks. Besides, we also observe one challenge that is not present in convnets (with images): non uniform densities in natural point clouds. To deal with those non uniform densities, we further propose special layers that are able to intelligently aggregate information from different scales. In this repository we release code and data for our PointNet++ classification and segmentation networks, as well as a few utility scripts for training, testing, data processing and visualization. Installation Install TensorFlow . The code is tested under the TF1.2 GPU version and Python 2.7 (version 3 should also work) on Ubuntu 14.04. There are also some dependencies on a few Python libraries for data processing and visualization, such as cv2 , h5py etc. It's highly recommended that you have access to GPUs. Compile Customized TF Operators The TF operators are included under tf_ops ; you need to compile them (check tf_xxx_compile.sh under each ops subfolder) first. Update the nvcc and python paths if necessary. The code is tested under TF1.2.0. If you are using an earlier version, it's possible that you need to remove the D_GLIBCXX_USE_CXX11_ABI 0 flag in the g++ command in order to compile correctly. To compile the operators in TF version > 1.4, you need to modify the compile scripts slightly. First, find the Tensorflow include and library paths. TF_INC $(python c 'import tensorflow as tf; print(tf.sysconfig.get_include())') TF_LIB $(python c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') Then, add the flags I$TF_INC/external/nsync/public L$TF_LIB ltensorflow_framework to the g++ commands.
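The same include/library lookup can also be done from a small Python helper that prints the extra g++ flags described above (a convenience sketch; the repository's tf_xxx_compile.sh scripts perform this inline in shell, and the exact flag set may differ per operator).

import tensorflow as tf

# tf.sysconfig exposes the same paths used in the shell snippet above.
tf_inc = tf.sysconfig.get_include()
tf_lib = tf.sysconfig.get_lib()
print("-I{inc} -I{inc}/external/nsync/public -L{lib} -ltensorflow_framework".format(inc=tf_inc, lib=tf_lib))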
Usage Shape Classification To train a PointNet++ model to classify ModelNet40 shapes (using point clouds with XYZ coordinates): python train.py To see all optional arguments for training: python train.py h If you have multiple GPUs on your machine, you can also run the multi GPU version training (our implementation is similar to the tensorflow cifar10 tutorial ): CUDA_VISIBLE_DEVICES 0,1 python train_multi_gpu.py num_gpus 2 After training, to evaluate the classification accuracies (with optional multi angle voting): python evaluate.py num_votes 12 Side Note: For the XYZ+normal experiment reported in our paper: (1) 5000 points are used and (2) a further random data dropout augmentation is used during training (see commented line after augment_batch_data in train.py and (3) the model architecture is updated such that the nsample 128 in the first two set abstraction levels, which is suited for the larger point density in 5000 point samplings. To use normal features for classification: You can get our sampled point clouds of ModelNet40 (XYZ and normal from mesh, 10k points per shape) here (1.6GB) . Move the uncompressed data folder to data/modelnet40_normal_resampled Object Part Segmentation To train a model to segment object parts for ShapeNet models: cd part_seg python train.py Preprocessed ShapeNetPart dataset (XYZ, normal and part labels) can be found here (674MB) . Move the uncompressed data folder to data/shapenetcore_partanno_segmentation_benchmark_v0_normal Semantic Scene Parsing See scannet/README and scannet/train.py for details. Visualization Tools We have provided a handy point cloud visualization tool under utils . Run sh compile_render_balls_so.sh to compile it and then you can try the demo with python show3d_balls.py The original code is from here . Prepare Your Own Data You can refer to here on how to prepare your own HDF5 files for either classification or segmentation. Or you can refer to modelnet_dataset.py on how to read raw data files and prepare mini batches from them. A more advanced way is to use TensorFlow's dataset APIs, for which you can find more documentations here . License Our code is released under MIT License (see LICENSE file for details). Updates 02/23/2018: Added support for multi gpu training for the classification task. 02/23/2018: Adopted a new way for data loading. No longer require manual data downloading to train a classification network. 02/06/2018: Added sample training code for ScanNet semantic segmentation. Related Projects PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation by Qi et al. (CVPR 2017 Oral Presentation). Code and data released in GitHub . Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (CVPR 2018) A novel framework for 3D object detection with RGB D data. Based on 2D boxes from a 2D object detector on RGB images, we extrude the depth maps in 2D boxes to point clouds in 3D space and then realize instance segmentation and 3D bounding box estimation using PointNet/PointNet++. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017). Code and data release TBD. Install Env conda create name pointnet_tf python 2.7.5 pip install tensorflow gpu 1.4 pip install h5py opencv python",Semantic Segmentation,Semantic Segmentation 2607,Computer Vision,Computer Vision,Computer Vision,"CRF RNN for Semantic Image Segmentation ! sample (sample.png) License (3 Clause BSD) Live demo: Updates: Keras/Tensorflow version is now available. 
We now support the latest Caffe future version. This package contains code for the CRF RNN semantic image segmentation method, published in the ICCV 2015 paper Conditional Random Fields as Recurrent Neural Networks . This paper was initially described in an arXiv tech report . The online demonstration based on this code won the Best Demo Prize at ICCV 2015. Our software is built on top of the Caffe deep learning library. The current version was developed by: Sadeep Jayasumana , Shuai Zheng , Bernardino Romera Paredes , Anurag Arnab , and Zhizhong Su. Supervisor: Philip Torr Our work allows computers to recognize objects in images, what is distinctive about our work is that we also recover the 2D outline of objects. Currently we have trained this model to recognize 20 classes. This software allows you to test our algorithm on your own images – have a try and see if you can fool it, if you get some good examples you can send them to us. Why are we doing this? This work is part of a project to build augmented reality glasses for the partially sighted. Please read about it here: smart specs . For demo and more information about CRF RNN please visit the project website: . If you use this code/model for your research, please cite the following papers: @inproceedings{crfasrnn_ICCV2015, author {Shuai Zheng and Sadeep Jayasumana and Bernardino Romera Paredes and Vibhav Vineet and Zhizhong Su and Dalong Du and Chang Huang and Philip H. S. Torr}, title {Conditional Random Fields as Recurrent Neural Networks}, booktitle {International Conference on Computer Vision (ICCV)}, year {2015} } @inproceedings{higherordercrf_ECCV2016, author {Anurag Arnab and Sadeep Jayasumana and Shuai Zheng and Philip H. S. Torr}, title {Higher Order Conditional Random Fields in Deep Neural Networks}, booktitle {European Conference on Computer Vision (ECCV)}, year {2016} } How to use the CRF RNN layer CRF RNN has been developed as a custom Caffe layer named MultiStageMeanfieldLayer. Usage of this layer in the model definition prototxt file looks the following. Check the matlab scripts or the python scripts folder for more detailed examples. This is part of FCN, coarse is a blob coming from FCN layer { type: 'Crop' name: 'crop' bottom: 'bigscore' bottom: 'data' top: 'coarse' } This layer is used to split the output of FCN into two. This is required by CRF RNN. layer { type: 'Split' name: 'splitting' bottom: 'coarse' top: 'unary' top: 'Q0' } layer { name: inference1 Keep the name inference1 to load the trained parameters from our caffemodel. type: MultiStageMeanfield Type of this layer bottom: unary Unary input from FCN bottom: Q0 A copy of the unary input from FCN bottom: data Input image top: pred Output of CRF RNN param { lr_mult: 10000 learning rate for W_G } param { lr_mult: 10000 learning rate for W_B } param { lr_mult: 1000 learning rate for compatiblity transform matrix } multi_stage_meanfield_param { num_iterations: 10 Number of iterations for CRF RNN compatibility_mode: POTTS Initialize the compatilibity transform matrix with a matrix whose diagonal is 1. threshold: 2 theta_alpha: 160 theta_beta: 3 theta_gamma: 3 spatial_filter_weight: 3 bilateral_filter_weight: 5 } } Installation Guide First, clone the project by running: git clone recursive You need to compile the modified Caffe library in this repository. Instructions for Ubuntu 14.04 are included below. You can also consult the generic Caffe installation guide for further help. 
1.1 Install dependencies General dependencies sudo apt get install libprotobuf dev libleveldb dev libsnappy dev libopencv dev libhdf5 serial dev protobuf compiler sudo apt get install no install recommends libboost all dev CUDA (optional needed only if you are planning to use a GPU for faster processing) Install the correct CUDA driver and its SDK. Download CUDA SDK from Nvidia website. You might need to blacklist some modules so that they do not interfere with the driver installation. You also need to uninstall your default Nvidia Driver first. sudo apt get install freeglut3 dev build essential libx11 dev libxmu dev libxi dev libgl1 mesa glx libglu1 mesa libglu1 mesa dev open /etc/modprobe.d/blacklist.conf and add: blacklist amd76x_edac blacklist vga16fb blacklist nouveau blacklist rivafb blacklist nvidiafb blacklist rivatv sudo apt get remove purge nvidia When you restart your PC, before logging in, try Ctrl + Alt + F1 to switch to a text based login. Try: sudo service lightdm stop chmod +x cuda .run sudo ./cuda .run BLAS Install a BLAS library such as ATLAS, OpenBLAS or MKL. To install BLAS: sudo apt get install libatlas base dev Python Install Anaconda Python distribution or install the default Python distribution with numpy, scipy, etc. MATLAB (optional needed only if you are planning to use the MATLAB interface) Install MATLAB using a standard distribution. 1.2 Build the custom Caffe version Set the path correctly in the Makefile.config . You can rename the Makefile.config.example to Makefile.config , as most common parts are filled already. You may need to change it a bit according to your environment. After this, in Ubuntu 14.04, try: make If there are no error messages, you can then compile and install the Python and Matlab wrappers: To install the MATLAB wrapper (optional): make matcaffe To install the Python wrapper (optional): make pycaffe That's it! Enjoy our software! 1.3 Run the demo MATLAB and Python scripts for running the demo are available in the matlab scripts and python scripts directories, respectively. Both of these scripts do the same thing you can choose either. Python users: Change the directory to python scripts . First download the model that includes the trained weights. In Linux, this can be done by: sh download_trained_model.sh Alternatively, you can also get the model by directly clicking the link in python scripts/README.md . To run the demo, execute: python crfasrnn_demo.py You will get an output.png image. To use your own images, replace input.jpg in the crfasrnn_demo.py file. MATLAB users: Change the directory to matlab scripts . First download the model that includes the trained weights. In Linux, this can be done by: sh download_trained_model.sh Alternatively, you can also get the model by directly clicking the link in matlab scripts/README.md . Load your MATLAB application and run crfrnn_demo.m. To use your own images, just replace input.jpg in the crfrnn_demo.m file. You can also find a part of our model in MatConvNet . Explanation about the CRF RNN layer: If you would like to try out the CRF RNN model we trained, you should keep the layer name as it is ( inference1 ), so that the code will correctly load the parameters from the caffemodel. Otherwise, it will reinitialize parameters. You should find out that the end to end trained CRF RNN model does better than the alternatives. If you set the CRF RNN layer name to inference2 , you should observe lower performance since the parameters for both CNN and CRF are not jointly optimized. 
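For readers who prefer to see the Python route end to end, here is a hedged sketch of a bare bones pycaffe forward pass in the spirit of the demo script; the prototxt/caffemodel file names, the mean values and the preprocessing details are assumptions (check the files shipped in python scripts), and only standard pycaffe calls are used. python
import numpy as np
import caffe

caffe.set_mode_cpu()  # or caffe.set_mode_gpu() if the GPU build is available

# Placeholder file names; use the deploy prototxt and caffemodel from python scripts.
net = caffe.Net('deploy.prototxt', 'crfasrnn.caffemodel', caffe.TEST)

# Load an image as H x W x 3, convert to BGR, subtract a per channel mean
# (the values below are an assumption), then reorder to 1 x C x H x W.
img = caffe.io.load_image('input.jpg') * 255.0          # RGB in [0, 255]
img = img[:, :, ::-1] - np.array([104.0, 117.0, 123.0]) # BGR, mean subtracted (assumed values)
blob = img.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

net.blobs['data'].reshape(*blob.shape)
net.blobs['data'].data[...] = blob
out = net.forward()
# 'pred' is the top blob of the MultiStageMeanfield layer in the prototxt above;
# take the arg max over channels to get a per pixel class map.
labels = out['pred'][0].argmax(axis=0)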
Training CRF RNN on a new dataset: If you would like to train CRF RNN on other datasets, please follow the piecewise training described in our paper. In short, you should first train a strong pixel wise CNN model. After this, you could plug our CRF RNN layer into it by adding the MultiStageMeanfieldLayer to the prototxt file. You should then be able to train the CNN and CRF RNN parts jointly end to end. Notice that the current deploy.prototxt file we have provided is tailored for PASCAL VOC Challenge. This dataset contains 21 class labels including background. You should change the num_output in the corresponding layer if you would like to finetune our model for other datasets. Also, the deconvolution layer in current code does not allow initializing the parameters through prototxt. If you change the num_output there, you should manually re initialize the parameters in the caffemodel file. See examples/segmentationcrfasrnn for more information. Why predictions are all black? This could happen if you change layer names in the model definition prototxt, causing the weights not to load correctly. This could also happen if you change the number of outputs in deconvolution layer in the prototxt but not initialize the deconvolution layer properly. MultiStageMeanfield causes a segfault? This error usually occurs when you do not place the spatial.par and bilateral.par files in the script path. Python training script from third parties We would like to thank martinkersner and MasazI for providing Python training scripts for CRF RNN. 1. martinkersner's scripts 2. MasazI's scripts Merge with the upstream caffe It is possible to integrate the CRF RNN code into upstream Caffe. However, due to the change of the crop layer, the caffemodel we provided might require extra training to provide the same accuracy. mtourne Kindly provided a version that merged the code with upstream caffe. 1. mtourne upstream version with CRFRNN GPU version of CRF RNN hyenal kindly provided a purely GPU version of CRF RNN. This would lead to considerably faster training and testing. 1. hyenal's GPU crf rnn CRF as RNN as a layer in Lasagne Lasagne CRFasRNN layer Latest Caffe with CPU/GPU CRF RNN crfasrnn caffe Keras/Tensorflow version of CRF RNN crfasrnn_keras Let us know if we have missed any other works from third parties. For more information about CRF RNN please visit the project website Contact:",Semantic Segmentation,Semantic Segmentation 2678,Computer Vision,Computer Vision,Computer Vision,"deep crowd counting This repository is based on davideverona/deep crowd counting_crowdnet Paper : CrowdNet: A Deep Convolutional Network for Dense Crowd Counting . Dependence : DeepLab v2 (an independent Caffe,mainly working on semantic image segmentation) paper The performances are as follows: ! image ! image ! image ! image ! image",Semantic Segmentation,Semantic Segmentation 2696,Computer Vision,Computer Vision,Computer Vision,"PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Created by Charles R. Qi , Li (Eric) Yi , Hao Su , Leonidas J. Guibas from Stanford University. ! prediction example Citation If you find our work useful in your research, please consider citing: @article{qi2017pointnetplusplus, title {PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space}, author {Qi, Charles R and Yi, Li and Su, Hao and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1706.02413}, year {2017} } Introduction This work is based on our NIPS'17 paper. 
You can find arXiv version of the paper here or check project webpage for a quick overview. PointNet++ is a follow up project that builds on and extends PointNet . It is version 2.0 of the PointNet architecture. PointNet (the v1 model) either transforms features of individual points independently or process global features of the entire point set . However, in many cases there are well defined distance metrics such as Euclidean distance for 3D point clouds collected by 3D sensors or geodesic distance for manifolds like isometric shape surfaces. In PointNet++ we want to respect spatial localities of those point sets. PointNet++ learns hierarchical features with increasing scales of contexts, just like that in convolutional neural networks. Besides, we also observe one challenge that is not present in convnets (with images) non uniform densities in natural point clouds. To deal with those non uniform densities, we further propose special layers that are able to intelligently aggregate information from different scales. In this repository we release code and data for our PointNet++ classification and segmentation networks as well as a few utility scripts for training, testing and data processing and visualization. Installation Install TensorFlow . The code is tested under TF1.2 GPU version and Python 2.7 (version 3 should also work) on Ubuntu 14.04. There are also some dependencies for a few Python libraries for data processing and visualizations like cv2 , h5py etc. It's highly recommended that you have access to GPUs. Compile Customized TF Operators The TF operators are included under tf_ops , you need to compile them (check tf_xxx_compile.sh under each ops subfolder) first. Update nvcc and python path if necessary. The code is tested under TF1.2.0. If you are using earlier version it's possible that you need to remove the D_GLIBCXX_USE_CXX11_ABI 0 flag in g++ command in order to compile correctly. To compile the operators in TF version > 1.4, you need to modify the compile scripts slightly. First, find Tensorflow include and library paths. TF_INC $(python c 'import tensorflow as tf; print(tf.sysconfig.get_include())') TF_LIB $(python c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') Then, add flags of I$TF_INC/external/nsync/public L$TF_LIB ltensorflow_framework to the g++ commands. Usage Shape Classification To train a PointNet++ model to classify ModelNet40 shapes (using point clouds with XYZ coordinates): python train.py To see all optional arguments for training: python train.py h If you have multiple GPUs on your machine, you can also run the multi GPU version training (our implementation is similar to the tensorflow cifar10 tutorial ): CUDA_VISIBLE_DEVICES 0,1 python train_multi_gpu.py num_gpus 2 After training, to evaluate the classification accuracies (with optional multi angle voting): python evaluate.py num_votes 12 Side Note: For the XYZ+normal experiment reported in our paper: (1) 5000 points are used and (2) a further random data dropout augmentation is used during training (see commented line after augment_batch_data in train.py and (3) the model architecture is updated such that the nsample 128 in the first two set abstraction levels, which is suited for the larger point density in 5000 point samplings. To use normal features for classification: You can get our sampled point clouds of ModelNet40 (XYZ and normal from mesh, 10k points per shape) here (1.6GB) . 
Move the uncompressed data folder to data/modelnet40_normal_resampled Object Part Segmentation To train a model to segment object parts for ShapeNet models: cd part_seg python train.py Preprocessed ShapeNetPart dataset (XYZ, normal and part labels) can be found here (674MB) . Move the uncompressed data folder to data/shapenetcore_partanno_segmentation_benchmark_v0_normal Semantic Scene Parsing See scannet/README and scannet/train.py for details. Visualization Tools We have provided a handy point cloud visualization tool under utils . Run sh compile_render_balls_so.sh to compile it and then you can try the demo with python show3d_balls.py The original code is from here . Prepare Your Own Data You can refer to here on how to prepare your own HDF5 files for either classification or segmentation. Or you can refer to modelnet_dataset.py on how to read raw data files and prepare mini batches from them. A more advanced way is to use TensorFlow's dataset APIs, for which you can find more documentations here . License Our code is released under MIT License (see LICENSE file for details). Updates 02/23/2018: Added support for multi gpu training for the classification task. 02/23/2018: Adopted a new way for data loading. No longer require manual data downloading to train a classification network. 02/06/2018: Added sample training code for ScanNet semantic segmentation. Related Projects PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation by Qi et al. (CVPR 2017 Oral Presentation). Code and data released in GitHub . Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (CVPR 2018) A novel framework for 3D object detection with RGB D data. Based on 2D boxes from a 2D object detector on RGB images, we extrude the depth maps in 2D boxes to point clouds in 3D space and then realize instance segmentation and 3D bounding box estimation using PointNet/PointNet++. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017). Code and data release TBD.",Semantic Segmentation,Semantic Segmentation 2699,Computer Vision,Computer Vision,Computer Vision,"Image Segmentation and Object Detection in Pytorch Pytorch Segmentation Detection is a library for image segmentation and object detection with reported results achieved on common image segmentation/object detection datasets, pretrained models and scripts to reproduce them. Segmentation PASCAL VOC 2012 Implemented models were tested on Restricted PASCAL VOC 2012 Validation dataset (RV VOC12) or Full PASCAL VOC 2012 Validation dataset (VOC 2012) and trained on the PASCAL VOC 2012 Training data and additional Berkeley segmentation data for PASCAL VOC 12. You can find all the scripts that were used for training and evaluation here (pytorch_segmentation_detection/recipes/pascal_voc/segmentation). This code has been used to train networks with this performance: Model Test data Mean IOU Mean pix. accuracy Pixel accuracy Inference time (512x512 px. image) Model Download Link Related paper Resnet 18 8s RV VOC12 59.0 in prog. in prog. 28 ms. Dropbox DeepLab Resnet 34 8s RV VOC12 68.0 in prog. in prog. 50 ms. Dropbox DeepLab Resnet 50 16s VOC12 66.5 in prog. in prog. in prog. in prog. DeepLab Resnet 50 8s VOC12 67.0 in prog. in prog. in prog. in prog. DeepLab Resnet 50 8s deep sup VOC12 67.1 in prog. in prog. in prog. in prog. DeepLab Resnet 101 16s VOC12 68.6 in prog. in prog. in prog. in prog. DeepLab PSP Resnet 18 8s VOC12 68.3 n/a n/a n/a in prog. 
PSPnet PSP Resnet 50 8s VOC12 73.6 n/a n/a n/a in prog. PSPnet Some qualitative results: ! Alt text (pytorch_segmentation_detection/recipes/pascal_voc/segmentation/segmentation_demo_preview.gif?raw true Title ) Endovis 2017 Implemented models were trained on Endovis 2017 segmentation dataset and the sequence number 3 was used for validation and was not included in training dataset. The code to acquire the training and validating the model is also provided in the library. Additional Qualitative results can be found on this youtube playlist . Binary Segmentation Model Test data Mean IOU Mean pix. accuracy Pixel accuracy Inference time (512x512 px. image) Model Download Link Resnet 9 8s Seq 3 96.1 in prog. in prog. 13.3 ms. Dropbox Resnet 18 8s Seq 3 96.0 in prog. in prog. 28 ms. Dropbox Resnet 34 8s Seq 3 in prog. in prog. in prog. 50 ms. in prog. Resnet 9 8s network was tested on the 0.5 reduced resoulution (512 x 640). Qualitative results (on validation sequence): ! Alt text (pytorch_segmentation_detection/recipes/endovis_2017/segmentation/validation_binary.gif?raw true Title ) Multi class Segmentation Model Test data Mean IOU Mean pix. accuracy Pixel accuracy Inference time (512x512 px. image) Model Download Link Resnet 18 8s Seq 3 81.0 in prog. in prog. 28 ms. Dropbox Resnet 34 8s Seq 3 in prog. in prog. in prog. 50 ms. in prog Qualitative results (on validation sequence): ! Alt text (pytorch_segmentation_detection/recipes/endovis_2017/segmentation/validation_multiclass.gif?raw true Title ) Cityscapes The dataset contains video sequences recorded in street scenes from 50 different cities, with high quality pixel level annotations of 5 000 frames. The annotations contain 19 classes which represent cars, road, traffic signs and so on. Model Test data Mean IOU Mean pix. accuracy Pixel accuracy Inference time (512x512 px. image) Model Download Link Resnet 18 32s Validation set 61.0 in prog. in prog. in prog. in prog. Resnet 18 8s Validation set 60.0 in prog. in prog. 28 ms. Dropbox Resnet 34 8s Validation set 69.1 in prog. in prog. 50 ms. Dropbox Resnet 50 16s PSP Validation set 71.2 in prog. in prog. in prog. in prog. Qualitative results (on validation sequence): Whole sequence can be viewed here . ! Alt text (pytorch_segmentation_detection/recipes/cityscapes/cityscapes_demo.gif?raw true Title ) Installation This code requires: 1. Pytorch . 2. Some libraries which can be acquired by installing Anaconda package . Or you can install scikit image , matplotlib , numpy using pip . 3. Clone the library: git clone recursive And use this code snippet before you start to use the library: python import sys update with your path All the jupyter notebooks in the repository already have this sys.path.append( /your/path/pytorch segmentation detection/ ) sys.path.insert(0, '/your/path/pytorch segmentation detection/vision/') Here we use our pytorch/vision fork, which might be merged and futher merged in a future. We have added it as a submodule to our repository. 4. Download segmentation or detection models that you want to use manually (links can be found below). 
About If you used the code for your research, please cite the paper: @article{pakhomov2017deep, title {Deep Residual Learning for Instrument Segmentation in Robotic Surgery}, author {Pakhomov, Daniil and Premachandran, Vittal and Allan, Max and Azizian, Mahdi and Navab, Nassir}, journal {arXiv preprint arXiv:1703.08580}, year {2017} } During implementation, some preliminary experiments and notes were reported: Converting Image Classification network into FCN Performing upsampling using transposed convolution Conditional Random Fields for Refining of Segmentation and Coarseness of FCN 32s model segmentations TF records usage",Semantic Segmentation,Semantic Segmentation 2701,Computer Vision,Computer Vision,Computer Vision,"Deep Learning Explorer Deep learning explorer is a set of tools to quickly see how different deep learning models work with your data. Every model is ready to test in an NVIDIA docker environment with Jupyter notebooks. An NVIDIA GeForce 1080Ti on Ubuntu 16.04 is used for testing, but other cards and distributions may also work. Custom data is supported in the COCO format. You can use pycococreator to create your own COCO style data sets. Learn how to get started here: Currently implemented models Mask R CNN (object detection and segmentation) arXiv , source FCN (class segmentation) arXiv , source",Semantic Segmentation,Semantic Segmentation 2719,Computer Vision,Computer Vision,Computer Vision,"Introduction This code is the homework for week nine of the course series. Running with a GPU on TinyMind is relatively expensive ($0.09 per CPU hour and $0.99 per GPU hour), so it is recommended to first run all homework locally until you get reasonable results and make sure everything runs correctly, and only then upload it to TinyMind. Use CPU resources for the first runs, and start GPU runs only after all the code is confirmed to be problem free. TinyMind already provides TensorFlow 1.4, which is a bit faster than 1.3 and is recommended. Homework This homework builds an FCN training model based on the FCN described in the week nine videos. Students are required to implement the missing parts of the code and obtain reasonably good results with their own implementation. Dataset This homework uses the semantic segmentation part of the Pascal VOC2012 data as its dataset. VOC website: The dataset is not provided with this homework; please find and download the data from the above website yourself, and read the VOC website's description of the dataset carefully. The directory structure of the VOC dataset is as follows: ├── local │ ├── VOC2006 │ └── VOC2007 ├── results │ ├── VOC2006 │ │ └── Main │ └── VOC2007 │ ├── Layout │ ├── Main │ └── Segmentation ├── VOC2007 │ ├── Annotations │ ├── ImageSets │ │ ├── Layout │ │ ├── Main │ │ └── Segmentation │ ├── JPEGImages │ ├── SegmentationClass │ └── SegmentationObject ├── VOC2012 │ ├── Annotations │ ├── ImageSets │ │ ├── Action │ │ ├── Layout │ │ ├── Main │ │ └── Segmentation │ ├── JPEGImages │ ├── SegmentationClass │ └── SegmentationObject └── VOCcode This homework uses the contents of the VOC2012 directory. The dataset split for the homework is located in VOC2012/ImageSets/Segmentation , divided into train.txt with 1464 images and val.txt with 1449 images. The semantic segmentation labels are located in VOC2012/SegmentationClass ; note that not every image in the dataset has a semantic segmentation label. The labels use colors to mark different objects. The dataset contains 20 different object classes, numbered 1 to 20; together with the background class numbered 0, there are 21 classes in total. The mapping between class indices and colors is as follows: py class classes 'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor' RGB color for each class colormap 0, 0, 0 , 128, 0, 0 , 0, 128, 0 , 128, 128, 0 , 0, 0, 128 , 128, 0, 128 , 0, 128, 128 , 128, 128, 128 , 64, 0, 0 , 192, 0, 0 , 64, 128, 0 , 192, 128, 0 , 64, 0, 128 , 192, 0, 128 , 64, 128, 128 , 192, 128, 128 , 0, 64, 0 , 128, 64, 0 , 0, 192, 0 , 128, 192, 0 , 0, 64, 128 The mapping can be computed from VOCcode/VOClabelcolormap.m , and the homework code also contains code that computes it; it is not described further here, please read and understand that code yourself. >Note that there is actually one more class numbered 255, whose color is 224, 224, 192 ; it is used to color object boundaries and is not handled here. Training data preparation The training data needs to be packed into the tfrecord format in advance; this step is done locally. The packing is performed with the convert_fcn_dataset.py script from the homework code. Part of the script has been removed, and students need to complete the missing code themselves. python3 convert_fcn_dataset.py data_dir /path/to/VOCdevkit/VOC2012/ output_dir ./ This step finally generates two files, fcn_train.record and fcn_val.record , each around 400MB, about 800MB in total. If the resulting files are much larger or smaller than this, something may have gone wrong while generating the data, so please check carefully.
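To make the missing part of convert_fcn_dataset.py less abstract, here is a hedged sketch of the kind of record writing logic it needs; the feature keys and the encode_pair helper are illustrative assumptions, not the official solution. python
import tensorflow as tf

def bytes_feature(value):
    # Wrap raw bytes (e.g. an encoded JPEG/PNG) as a tf.train.Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def encode_pair(image_path, label_path):
    # Read the raw image and its segmentation label and pack them into one Example.
    with open(image_path, 'rb') as f:
        image_bytes = f.read()
    with open(label_path, 'rb') as f:
        label_bytes = f.read()
    feature = {'image/encoded': bytes_feature(image_bytes),
               'label/encoded': bytes_feature(label_bytes)}
    return tf.train.Example(features=tf.train.Features(feature=feature))

# writer = tf.python_io.TFRecordWriter('fcn_train.record')
# writer.write(encode_pair(img_path, lbl_path).SerializeToString())
# writer.close()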
>Hint: you can refer to the dataset generation code from week 8 to complete the missing code here. Dataset upload Please refer to the contents of week 7 and week 8; this is not described again here. Pre trained model The pre trained model is the VGG16 model from the TensorFlow model zoo; please find it in the model zoo yourself and upload it to TinyMind. Students with network problems can use the model that has already been uploaded to TinyMind; the dataset is ai100/vgg16 . Model The model code is modified from the code of the FCN part of the week 9 course videos, mainly code cleanup plus added data input and result output parts. Code reference: Create a new model on TinyMind and configure it with reference to the following model: After copying the model you can see all of its parameters. Note that the code uses extra libraries, so when creating the model you need to fill in the following items in the dependencies: pydensecrf opencv python >cv2 is opencv python; for local runs, just install it with pip. This is not an official build and lacks some rarely used features, but it is sufficient for this homework. The official build needs to be compiled and the process is rather complicated; do not compile and install it unless really necessary. Explanation of the model parameters: checkpoint_path the directory of the VGG16 pre trained model; set this according to the dataset you created. output_dir the output directory; just use the /output directory on TinyMind. dataset_train the directory of the train dataset; set this according to the dataset you created. dataset_val the directory of the val dataset; set this according to the dataset you created. batch_size BATCH_SIZE; 16 is used here. When building the 8X FCN you may run out of memory (OutOfMem); lowering batch_size solves this. max_steps MAX_STEPS; 1500 steps are run here. If you adjust batch_size, consider adjusting this as well. learning_rate the learning rate, fixed to 1e 4 here; adjusting it is not recommended. During a run, the model generates a checkpoint under /output/train every 100 steps and four validation images under /output/eval every 200 steps. >FCN paper reference Homework requirements Students need to complete the code in convert_fcn_dataset.py, generate the corresponding dataset files and upload them to TinyMind. Students need to add an 8X FCN implementation on top of the code provided with the homework and train it. > A ready made dataset has already been uploaded to TinyMind for testing and reference only; for the homework please process and upload the dataset yourself. Submissions that use the ready made dataset get no points for the dataset part. Evaluation Dataset preparation completed, 20 points: the dataset should contain the two tfrecord files train and val, each around 400MB in size. Model training completed, 20 points: in the TinyMind run log you can see output like the following: sh 2018 01 04 11:11:20,088 DEBUG train.py:298 step 1200 Current Loss: 101.153938293 2018 01 04 11:11:20,088 DEBUG train.py:300 23.54 imgs/s 2018 01 04 11:11:21,011 DEBUG train.py:307 Model saved in file: ./out/train/model.ckpt 1200 2018 01 04 11:11:21,018 DEBUG train.py:314 validation generated at step 1200 2018 01 04 11:11:28,461 DEBUG train.py:298 step 1210 Current Loss: 116.911231995 2018 01 04 11:11:28,461 DEBUG train.py:300 19.11 imgs/s 2018 01 04 11:11:35,356 DEBUG train.py:298 step 1220 Current Loss: 90.7060165405 Training results completed, 20 points: after training finishes, validation images are generated under /output/eval , where the val_xx_prediction.jpg images are the predictions output by the model; their content should correspond to the matching annotation and img. Depending on the validation images the results may differ, but the output should clearly be meaningful. Model code completion, 20 points: the 8x implementation can be seen in train.py. The form may differ, but there should be three clearly visible upsampling steps, two 2X and one 8X, and the fusion of their results. The final effect looks like this: original ! original (val_1000_img.jpg) label ! label (val_1000_annotation.jpg) prediction ! prediction (val_1000_prediction.jpg) prediction after CRF ! prediction (val_1000_prediction_crfed.jpg) Reflections, 20 points: provide a document describing your own 8X FCN implementation, with explanations of the key code, and describe your understanding of FCN. Reference The command line used for local training: sh python train.py checkpoint_path ./vgg_16.ckpt output_dir ./output dataset_train ./fcn_train.record dataset_val ./fcn_val.record batch_size 16 max_steps 2000",Semantic Segmentation,Semantic Segmentation 2721,Computer Vision,Computer Vision,Computer Vision,"Tencent ML Images This repository introduces the open source project dubbed Tencent ML Images , which publishes ML Images : the largest open source multi label image database, including 17,609,752 training and 88,739 validation image URLs, which are annotated with up to 11,166 categories Resnet 101 model : it is pre trained on ML Images, and achieves a top 1 accuracy of 80.73% on ImageNet via transfer learning Updates NOTE : A part of the URLs of ML Images is collected from ImageNet . However, many URLs from ImageNet have expired. Thus, we also provide the corresponding image indexes of ImageNet for these URLs in ML Images. Then, you can obtain the original image from ImageNet if the URL is invalid. Please see How to handle the invalid URLs during downloading? ( invalid URLs) for details. We provide a new file download_urls_multithreading.sh (data/download_urls_multithreading.sh), which can download images using the multithreading module. Most URLs that are not from ImageNet are valid. Please refer to Download Images using URLs for details.
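To illustrate the multi threaded downloading idea behind download_urls_multithreading.sh (this is not the repository's script, just a small Python 3 standard library sketch with an assumed output layout), one could do something like the following. python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def download_one(line, out_dir='images'):
    # Each line of the URL list starts with the image URL, followed by its annotations.
    os.makedirs(out_dir, exist_ok=True)
    url = line.split()[0]
    name = os.path.basename(url.split('?')[0]) or 'image.jpg'
    try:
        urllib.request.urlretrieve(url, os.path.join(out_dir, name))
        return True
    except Exception:
        return False  # expired or invalid URL, skip it

# with open('train_urls_tiny.txt') as f, ThreadPoolExecutor(max_workers=16) as pool:
#     results = list(pool.map(download_one, f))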
Contents Dependencies ( dependencies) Data ( data) Download ( download) Image Source ( source) Semantic Hierarchy ( hierarchy) Annotations ( annotation) Statistics ( statistics) Train ( train) Download Images using URLs ( download image) How to handle the invalid URLs during downloading? ( invalid URLs) Prepare the TFRecord File ( prepare tfrecord) Pretrain on ML Images ( pretrain) Finetune on ImageNet ( finetune) Checkpoints ( checkpoint) Feature Extraction ( feature extraction) Results ( result) Copyright ( copyright) Citation ( citation) Dependencies ( dependencies) Linux Python 2.7 Tensorflow > 1.6.0 Data ( data) back to top ( ) Download ( download) back to top ( ) train_urls.txt ( google云盘 , 百度网盘 ) val_urls.txt ( google云盘 , 百度网盘 ) The image URLs and the corresponding annotations can be downloaded above. The format of train_urls.txt is as follows ... 3:1 5193:0.9 5851:0.9 9413:1 9416:1 1053:0.8 1193:0.8 1379:0.8 ... As shown above, one image corresponds to one row. The first term is the image URL. The followed terms separated by space are the annotations. For example, 5193:0.9 indicates class 5193 and its confidence 0.9. Note that the class index starts from 0, and you can find the class name from the file data/dictionary_and_semantic_hierarchy.txt (data/dictionary_and_semantic_hierarchy.txt). Image Source ( source) back to top ( ) The image URLs of ML Images are collected from ImageNet and Open Images . Specifically, Part 1: From the whole database of ImageNet, we adopt 10,706,941 training and 50,000 validation image URLs, covering 10,032 categories. Part 2: From Open Images, we adopt 6,902,811 training and 38,739 validation image URLs, covering 1,134 unique categories (note that some other categories are merged with their synonymous categories from ImageNet). Finally, ML Images includes 17,609,752 training and 88,739 validation image URLs, covering 11,166 categories. Semantic Hierarchy ( hierarchy) back to top ( ) We build the semantic hiearchy of 11,166 categories, according to WordNet . The direct parent categories of each class can be found from the file data/dictionary_and_semantic_hierarchy.txt (data/dictionary_and_semantic_hierarchy.txt). The whole semantic hierarchy includes 4 independent trees, of which the root nodes are thing , matter , object, physical object and atmospheric phenomenon , respectively. The length of the longest semantic path from root to leaf nodes is 16, and the average length is 7.47. Annotations ( annotation) back to top ( ) Since the image URLs of ML Images are collected from ImageNet and Open Images, the annotations of ML Images are constructed based on the original annotations from ImageNet and Open Images. Note that the original annotations from Open Images are licensed by Google Inc. under CC BY 4.0 . Specifically, we conduct the following steps to construct the new annotations of ML Images. For the 6,902,811 training URLs from Open Images, we remove the annotated tags that are out of the remained 1,134 categories. According to the constructed semantic hierarchy (data/dictionary_and_semantic_hierarchy.txt) of 11,166 categories, we augment the annotations of all URLs of ML Images following the cateria that if one URL is annotated with category i, then all ancestor categories will also be annotated to this URL. We train a ResNet 101 model based on the 6,902,811 training URLs from Open Images, with 1,134 outputs. Using this ResNet 101 model, we predict the tags from 1,134 categories for the 10,756,941 single annotated image URLs from ImageNet. 
Consequently, we obtain a normalized co occurrence matrix between 10,032 categories from ImageNet and 1,134 categories from Open Images. We can determine the strongly co occurrenced pairs of categories. For example, category i and j are strongly co occurrenced; then, if one image is annotated with category i, then category j should also be annotated. The annotations of all URLs in ML Images are stored in train_urls.txt and val_urls.txt . Statistics ( statistics) back to top ( ) The main statistics of ML Images are summarized in ML Images. Train images Validation images Classes Trainable Classes Avg tags per image Avg images per class : : : : : : : : : : : : 17,609,752 88,739 11,166 10,505 8 1447.2 Note: Trainable class indicates the class that has over 100 train images. The number of images per class and the histogram of the number of annotations in training set are shown in the following figures. Train ( train) back to top ( ) Download Images using URLs ( download image) back to top ( ) The full train_urls.txt is very large. Here we provide a tiny file train_urls_tiny.txt (data/train_urls_tiny.txt) to demonstrate the downloading procedure. cd data ./download_urls_multithreading.sh A sub folder data/images will be generated to save the downloaded jpeg images, as well as a file train_im_list_tiny.txt to save the image list and the corresponding annotations. How to handle the invalid URLs during downloading? ( invalid URLs) As many URLs from ImageNet have expired, we also provide the correpsonding image indexes of ImageNet for these URLs in ML Images. We provide two new files that include the corresponding image index of ImageNet for each URL that is from ImageNet, including train_urls_and_index_from_imagenet.txt and val_urls_and_index_from_imagenet.txt . train_urls_and_index_from_imagenet.txt ( google云盘 , 百度网盘 ) val_urls_and_index_from_imagenet.txt ( google云盘 , 百度网盘 ) The format is as follows ... n03874293_7679 2964:1 2944:1 2913:1 2896:1 2577:1 1833:1 1054:1 1041:1 865:1 2:1 n03580845_3376 3618:1 3604:1 1835:1 1054:1 1041:1 865:1 2:1 ... In each row, the first term is the image index in ImageNet, and the followings are the corresponding URL and annotations. Using these two files, you can directly obtain the original image from ImageNet, if the URL is invalid. In train_urls.txt , the first 10,706,941 rows are URLs from ImageNet, while the other URLs are from Open Images. In val_urls.txt , the first 50,000 rows are URLs from ImageNet, while the other URLs are from Open Images. One can split them to obtain the URL list from Open Images, where most URLs are valid. Prepare the TFRecord File ( prepare tfrecord) back to top ( ) Here we generate the tfrecords using the multithreading module. One should firstly split the file train_im_list_tiny.txt into multiple smaller files, and save them into the sub folder data/image_lists/ . cd data ./tfrecord.sh Pretrain on ML Images ( pretrain) back to top ( ) ./example/train.sh Note that here we only provide the training code in the single node single GPU framework, while our actual training on ML Images is based on an internal distributed training framework (not released yet). One could modify the training code to the distributed framework following distributed tensorFlow . Finetune on ImageNet ( finetune) back to top ( ) One should firstly download the ImageNet database, then prepare the tfrecord file using tfrecord.sh (example/tfrecord.sh). Then, you can finetune the ResNet 101 model on ImageNet as follows, with the checkpoint pre trained on ML Images. 
./example/finetune.sh Checkpoints ( checkpoint) back to top ( ) ckpt resnet101 mlimages ( google云盘 , 百度网盘 ): pretrained on ML Images ckpt resnet101 mlimages imagenet ( google云盘 , 百度网盘 ): pretrained on ML Images and finetuned on ImageNet (ILSVRC2012) Please download above two checkpoints and move them into the folder checkpoints/ , if you want to extract features using them. Feature Extraction ( feature extraction) back to top ( ) ./example/extract_feature.sh Results ( result) back to top ( ) The retults of different ResNet 101 checkpoints on the validation set of ImageNet (ILSVRC2012) are summarized in the following table. Checkpoints Train and finetune setting Top 1 acc on Val 224 Top 5 acc on Val 224 Top 1 acc on Val 299 Top 5 accuracy on Val 299 : : : : : : : : : : MSRA ResNet 101 train on ImageNet 76.4 92.9 Google ResNet 101 ckpt1 train on ImageNet, 299 x 299 77.5 93.9 Our ResNet 101 ckpt1 train on ImageNet 77.8 93.9 79.0 94.5 Google ResNet 101 ckpt2 Pretrain on JFT 300M, finetune on ImageNet, 299 x 299 79.2 94.7 Our ResNet 101 ckpt2 Pretrain on ML Images, finetune on ImageNet 78.8 94.5 79.5 94.9 Our ResNet 101 ckpt3 Pretrain on ML Images, finetune on ImageNet 224 to 299 78.3 94.2 80.73 95.5 Our ResNet 101 ckpt4 Pretrain on ML Images, finetune on ImageNet 299 x 299 75.8 92.7 79.6 94.6 Note: if not specified, the image size in training/finetuning is 224 x 224. finetune on ImageNet from 224 to 299 means that the image size in early epochs of finetuning is 224 x 224, then 299 x 299 in late epochs. Top 1 acc on Val 224 indicates the top 1 accuracy on 224 x 224 validation images. Copyright ( copyright) back to top ( ) The annotations of images are licensed by Tencent under CC BY 4.0 license. The contents of this repository, including the codes, documents and checkpoints, are released under an BSD 3 Clause license. Please refer to LICENSE (LICENSE.txt) for more details. If there is any concern about the copyright of any image used in this project, please email us (mailto:wubaoyuan1987@gmail.com). Citation ( citation) back to top ( ) The arxiv paper describling the details of this project will be available soon!",Semantic Segmentation,Semantic Segmentation 2736,Computer Vision,Computer Vision,Computer Vision,"FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation Project Paper arXiv Home PWC Official implementation of FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation . A Faster , Stronger and Lighter framework for semantic segmentation, achieving the state of the art performance and more than 3x acceleration. @inproceedings{wu2019fastfcn, title {FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation}, author {Wu, Huikai and Zhang, Junge and Huang, Kaiqi and Liang, Kongming and Yu Yizhou}, booktitle {arXiv preprint arXiv:1903.11816}, year {2019} } Contact: Hui Kai Wu (huikaiwu@icloud.com) Overview Framework ! (images/Framework.png) Joint Pyramid Upsampling (JPU) ! (images/JPU.png) Install 1. PyTorch 1.0 (Note: The code is test in the environment with python 3.5, cuda 9.0 ) 2. Install FastFCN git clone cd FastFCN PATH .:$PATH python setup.py install 3. 
Install Requirements nose tqdm scipy cython requests Train and Test PContext python scripts/prepare_pcontext.py cd experiments/segmentation Method Backbone mIoU FPS Model Scripts : : : : : : : : : : EncNet ResNet 50 49.91 18.77 EncNet+JPU (ours) ResNet 50 51.05 37.56 GoogleDrive bash (experiments/segmentation/scripts/encnet_res50_pcontext.sh) PSP ResNet 50 50.58 18.08 PSP+JPU (ours) ResNet 50 50.89 28.48 GoogleDrive bash (experiments/segmentation/scripts/psp_res50_pcontext.sh) DeepLabV3 ResNet 50 49.19 15.99 DeepLabV3+JPU (ours) ResNet 50 50.07 20.67 GoogleDrive bash (experiments/segmentation/scripts/deeplab_res50_pcontext.sh) EncNet ResNet 101 52.60 (MS) 10.51 EncNet+JPU (ours) ResNet 101 54.03 (MS) 32.02 GoogleDrive bash (experiments/segmentation/scripts/encnet_res101_pcontext.sh) ADE20K python scripts/prepare_ade20k.py cd experiments/segmentation Training Set Method Backbone mIoU (MS) Model Scripts : : : : : : : : EncNet ResNet 50 41.11 EncNet+JPU (ours) ResNet 50 42.75 GoogleDrive bash (experiments/segmentation/scripts/encnet_res50_ade20k_train.sh) EncNet ResNet 101 44.65 EncNet+JPU (ours) ResNet 101 44.34 GoogleDrive bash (experiments/segmentation/scripts/encnet_res101_ade20k_train.sh) Training Set + Val Set Method Backbone FinalScore (MS) Model Scripts : : : : : : : : EncNet+JPU (ours) ResNet 50 GoogleDrive bash (experiments/segmentation/scripts/encnet_res50_ade20k_trainval.sh) EncNet ResNet 101 55.67 EncNet+JPU (ours) ResNet 101 55.84 GoogleDrive bash (experiments/segmentation/scripts/encnet_res101_ade20k_trainval.sh) Note: EncNet (ResNet 101) is trained with crop_size 576 , while EncNet+JPU (ResNet 101) is trained with crop_size 480 for fitting 4 images into a 12G GPU. Visual Results Dataset Input GT EncNet Ours : : : : : : : : : PContext ! (images/img_2009_001858.jpg) ! (images/gt_2009_001858.png) ! (images/encnet_2009_001858.png) ! (images/ours_2009_001858.png) ADE20K ! (images/img_ADE_val_00001086.jpg) ! (images/gt_ADE_val_00001086.png) ! (images/encnet_ADE_val_00001086.png) ! (images/ours_ADE_val_00001086.png) More Visual Results Acknowledgement Code borrows heavily from PyTorch Encoding .",Semantic Segmentation,Semantic Segmentation 2751,Computer Vision,Computer Vision,Computer Vision,"Extended caffe Introduction This repository contains an extended caffe wich is modified from caffe version of yjxiong and introduces many new features. 
Features on the fly data augmentation, which is used in ImageSegData layer, including mirror, crop, scale, smooth filer, rotation, translation, please refers to caffe\src\caffe\data_transformer\TransformImgAndSeg2 an example is as follows: layer { name: data type: ImageSegData top: data top: label top: data_dim include { phase: TRAIN } transform_param { mirror: true crop_size: 352 mean_value: 104.008 mean_value: 116.669 mean_value: 122.675 scale_factors: 0.5 scale_factors: 0.75 scale_factors: 1 scale_factors: 1.25 scale_factors: 1.5 scale_factors: 1.75 scale_factors: 2.0 smooth_filtering: true max_smooth: 6 apply_probability: 0.5 max_rotation_angle: 60 max_translation: 30 } image_data_param { root_folder: /data1/caiyong.wang/data/Point/CASIA/ source: /data1/caiyong.wang/data/Point/CASIA/list/train_edge.txt batch_size: 1 shuffle: true label_type: PIXEL } } include interp_layer used in deeplab see include balance_cross_entropy_loss_layer used in hed see Holistically Nested Edge Detection include normalize_layer: L2 normalization for parsenet see ParseNet: Looking Wider to See Better support pooling with bin size, output_size for pspnet , segnet support upsample_layer used in segnet layer { name: pool4 type: Pooling bottom: conv4_3 top: pool4 top: pool4_idx top: pool4_size pooling_param { pool: MAX kernel_size: 2 stride: 2 output_size: true } } layer { name: upsample4 type: Upsample bottom: conv5_1_D bottom: pool4_idx bottom: pool4_size top: pool4_D } include dice_loss_layer include focal_sigmoid_loss_layer, the usage is simlar with SigmoidCrossEntropyLoss layer { name: loss_mask type: FocalSigmoidLoss bottom: mask_pred bottom: mask_label top: loss_mask loss_weight: 10 loss_param { ignore_label: 255 normalize: true } focal_sigmoid_loss_param { alpha: 0.95 gamma: 2 } } include focal_softmax_loss_layer, modified from the usage is similar with SoftmaxWithLoss , more details, please see Focal Loss for Dense Object Detection include prelu_layer include smooth_L1_loss_layer include selu_layer support deconvolution upsampling type: nearest layer { name: out_2_up4 type: Deconvolution bottom: out_2 top: out_2_up4 param { lr_mult: 0 decay_mult: 0 } convolution_param { num_output: 2 bias_term: false pad: 0 kernel_size: 4 group: 2 stride: 4 weight_filler { type: nearest } } } include my_spp_layer for spatial pyramid pooling see SPPNet:Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition layer { name: spatial_pyramid_pooling type: MySPP bottom: conv5 top: pool5 my_spp_param { pool: MAX bin_size: 2 bin_size: 3 bin_size: 6 } } Installation For installation, please follow the instructions of Caffe . For chinese users, please refers to and To enable cuDNN for GPU acceleration, cuDNN v6 is needed. The code has been tested successfully on CentOS 6.9 with CUDA 8.0. Questions Please contact wangcaiyong2017@ia.ac.cn Following is the original README of Caffe. Caffe Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center ( BVLC ) and community contributors. Check out the project site for all the details like DIY Deep Learning for Vision with Caffe Tutorial Documentation BVLC reference models and the community model zoo Installation instructions and step by step examples. Join the chat at Please join the caffe users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues . Happy brewing! 
License and Citation Caffe is released under the BSD 2 Clause license . The BVLC reference models are released for unrestricted use. Please cite Caffe in your publications if it helps your research: @article{jia2014caffe, Author {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor}, Journal {arXiv preprint arXiv:1408.5093}, Title {Caffe: Convolutional Architecture for Fast Feature Embedding}, Year {2014} }",Semantic Segmentation,Semantic Segmentation 2760,Computer Vision,Computer Vision,Computer Vision,"Advanced Lane Finding Udacity Self Driving Car NanoDegree Further reading: 1. How to tune parameters with a trackbar in OpenCV: 2. Semantic segmentation: 3. A deep learning based solution for this project: Project overview: use computer vision techniques such as camera calibration, distortion removal and perspective transforms to mark, track and fit the lane lines, and to estimate the lane curvature and the vehicle's offset from the center of each frame. In this project, your goal is to write a software pipeline to identify the lane boundaries in a video, but the main output or product we want you to create is a detailed writeup of the project. Check out the writeup template for this project and use it as a starting point for creating your own writeup. Creating a great writeup: A great writeup should include the rubric points as well as your description of how you addressed each point. You should include a detailed description of the code used in each step (with line number references and code snippets where necessary), and links to other supporting documents or external references. You should include images in your writeup to demonstrate how your code works with examples. All that said, please be concise! We're not looking for you to write a book here, just a brief description of how you passed each rubric point, and references to the relevant code :). You're not required to use markdown for your writeup. If you use another method please just submit a pdf of your writeup. The Project The goals / steps of this project are the following: Compute the camera calibration matrix and distortion coefficients given a set of chessboard images. Apply a distortion correction to raw images. Use color transforms, gradients, etc., to create a thresholded binary image. Apply a perspective transform to rectify the binary image ( birds eye view ). Detect lane pixels and fit to find the lane boundary. Determine the curvature of the lane and vehicle position with respect to center. Warp the detected lane boundaries back onto the original image. Output visual display of the lane boundaries and numerical estimation of lane curvature and vehicle position. The images for camera calibration are stored in the folder called camera_cal (a minimal OpenCV calibration sketch appears further below). The images in test_images are for testing your pipeline on single frames. If you want to extract more test images from the videos, you can simply use an image writing method like cv2.imwrite() , i.e., you can read the video frame by frame as usual, and for frames you want to save for later you can write to an image file. To help the reviewer examine your work, please save examples of the output from each stage of your pipeline in the folder called output_images , and include a description in your writeup for the project of what each image shows. The video called project_video.mp4 is the video your pipeline should work well on. The challenge_video.mp4 video is an extra (and optional) challenge for you if you want to test your pipeline under somewhat trickier conditions. The harder_challenge.mp4 video is another optional challenge and is brutal! If you're feeling ambitious (again, totally optional though), don't stop there!
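As noted above, here is a minimal OpenCV sketch of the calibration and distortion correction steps; the 9x6 inner corner count and the file paths are assumptions, not part of the project description. python
import glob
import cv2
import numpy as np

# One set of 3D object points per chessboard image (z = 0 plane), 9x6 inner corners assumed.
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for path in glob.glob('camera_cal/*.jpg'):
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# Calibrate once, then undistort every incoming frame with the same matrix.
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread('test_images/test1.jpg'), mtx, dist, None, mtx)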
We encourage you to go out and take video of your own, calibrate your camera and show us how you would implement this project from scratch! How to write a README A well written README file can enhance your project and portfolio. Develop your abilities to create professional README files by completing this free course .",Semantic Segmentation,Semantic Segmentation 2775,Computer Vision,Computer Vision,Computer Vision,"PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Created by Charles R. Qi , Li (Eric) Yi , Hao Su , Leonidas J. Guibas from Stanford University. ! prediction example Citation If you find our work useful in your research, please consider citing: @article{qi2017pointnetplusplus, title {PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space}, author {Qi, Charles R and Yi, Li and Su, Hao and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1706.02413}, year {2017} } Introduction This work is based on our NIPS'17 paper. You can find arXiv version of the paper here or check project webpage for a quick overview. PointNet++ is a follow up project that builds on and extends PointNet . It is version 2.0 of the PointNet architecture. PointNet (the v1 model) either transforms features of individual points independently or process global features of the entire point set . However, in many cases there are well defined distance metrics such as Euclidean distance for 3D point clouds collected by 3D sensors or geodesic distance for manifolds like isometric shape surfaces. In PointNet++ we want to respect spatial localities of those point sets. PointNet++ learns hierarchical features with increasing scales of contexts, just like that in convolutional neural networks. Besides, we also observe one challenge that is not present in convnets (with images) non uniform densities in natural point clouds. To deal with those non uniform densities, we further propose special layers that are able to intelligently aggregate information from different scales. In this repository we release code and data for our PointNet++ classification and segmentation networks as well as a few utility scripts for training, testing and data processing and visualization. Installation Install TensorFlow . The code is tested under TF1.2 GPU version and Python 2.7 (version 3 should also work) on Ubuntu 14.04. There are also some dependencies for a few Python libraries for data processing and visualizations like cv2 , h5py etc. It's highly recommended that you have access to GPUs. Compile Customized TF Operators The TF operators are included under tf_ops , you need to compile them (check tf_xxx_compile.sh under each ops subfolder) first. Update nvcc and python path if necessary. The code is tested under TF1.2.0. If you are using earlier version it's possible that you need to remove the D_GLIBCXX_USE_CXX11_ABI 0 flag in g++ command in order to compile correctly. To compile the operators in TF version > 1.4, you need to modify the compile scripts slightly. First, find Tensorflow include and library paths. TF_INC $(python c 'import tensorflow as tf; print(tf.sysconfig.get_include())') TF_LIB $(python c 'import tensorflow as tf; print(tf.sysconfig.get_lib())') Then, add flags of I$TF_INC/external/nsync/public L$TF_LIB ltensorflow_framework to the g++ commands. 
Usage Shape Classification To train a PointNet++ model to classify ModelNet40 shapes (using point clouds with XYZ coordinates): python train.py To see all optional arguments for training: python train.py h If you have multiple GPUs on your machine, you can also run the multi GPU version training (our implementation is similar to the tensorflow cifar10 tutorial ): CUDA_VISIBLE_DEVICES 0,1 python train_multi_gpu.py num_gpus 2 After training, to evaluate the classification accuracies (with optional multi angle voting): python evaluate.py num_votes 12 Side Note: For the XYZ+normal experiment reported in our paper: (1) 5000 points are used and (2) a further random data dropout augmentation is used during training (see commented line after augment_batch_data in train.py and (3) the model architecture is updated such that the nsample 128 in the first two set abstraction levels, which is suited for the larger point density in 5000 point samplings. To use normal features for classification: You can get our sampled point clouds of ModelNet40 (XYZ and normal from mesh, 10k points per shape) here (1.6GB) . Move the uncompressed data folder to data/modelnet40_normal_resampled Object Part Segmentation To train a model to segment object parts for ShapeNet models: cd part_seg python train.py Preprocessed ShapeNetPart dataset (XYZ, normal and part labels) can be found here (674MB) . Move the uncompressed data folder to data/shapenetcore_partanno_segmentation_benchmark_v0_normal Semantic Scene Parsing See scannet/README and scannet/train.py for details. Visualization Tools We have provided a handy point cloud visualization tool under utils . Run sh compile_render_balls_so.sh to compile it and then you can try the demo with python show3d_balls.py The original code is from here . Prepare Your Own Data You can refer to here on how to prepare your own HDF5 files for either classification or segmentation. Or you can refer to modelnet_dataset.py on how to read raw data files and prepare mini batches from them. A more advanced way is to use TensorFlow's dataset APIs, for which you can find more documentations here . License Our code is released under MIT License (see LICENSE file for details). Updates 02/23/2018: Added support for multi gpu training for the classification task. 02/23/2018: Adopted a new way for data loading. No longer require manual data downloading to train a classification network. 02/06/2018: Added sample training code for ScanNet semantic segmentation. Related Projects PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation by Qi et al. (CVPR 2017 Oral Presentation). Code and data released in GitHub . Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (CVPR 2018) A novel framework for 3D object detection with RGB D data. Based on 2D boxes from a 2D object detector on RGB images, we extrude the depth maps in 2D boxes to point clouds in 3D space and then realize instance segmentation and 3D bounding box estimation using PointNet/PointNet++. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017). Code and data release TBD.",Semantic Segmentation,Semantic Segmentation 2795,Computer Vision,Computer Vision,Computer Vision,"Autofocus Layer for Semantic Segmentation Introduction This is a PyTorch implementation of the autofocus convolutional layer proposed for semantic segmentation with the objective of enhancing the capabilities of neural networks for multi scale processing. 
Autofocus layers adaptively change the size of the effective receptive field based on the processed context to generate more powerful features. The proposed autofocus layer can be easily integrated into existing networks to improve a model's representational power. Here we apply the autofocus convolutional layer to deep neural networks for 3D semantic segmentation. We run experiments on the Brain Tumor Image Segmentation dataset (BRATS2015) as an example to show how the models work. In addition, we also implement a series of deep learning based models used for 3D Semantic Segmentation. The details of all the models implemented here can be found in our paper: Autofocus Layer for Semantic Segmentation . Figure 1. An autofocus convolutional layer with four candidate dilation rates. (a) The attention model. (b) A weighted summation of activations from parallel dilated convolutions. (c) An example of attention maps for a small (r^1) and large (r^2) dilation rate. The first row is the input and the segmentation result of AFN6. Citation If you find the code or the models implemented here are useful, please cite our paper: Autofocus Layer for Semantic Segmentation . Y. Qin , K. Kamnitsas, S. Ancha, J. Nanavati, G. Cottrell, A. Criminisi, A. Nori, MICCAI 2018. Data You can download the full dataset with training and testing images from To run all the models here, you need to do a series of data pre processing to the input images. Provide a mask including the Region of Interest (RoI) as one of the input image. For example, in the BRATS dataset, the region outside the brain should be masked out with the provided mask. The intensity of the data within the RoI must be normalized to be zero mean, unit variance. For the BRATS dataset, each image must be normalized independently other than doing the normalization with the mean and variance of the whole training dataset. Make sure the ground truth labels for training and testing represent the background with zero. For example, we have four different classes in the BRATS dataset, then the number of classes in this dataset will be 5, including the background ( num_classes 5 ) and number zero will be used to represent the background. When you use the training code for your own data, please change the data path correspondingly. We provide the example codes for data preprocessing, including converting the data format, generating the masks and normalizing the input image. The corresponding text file is also provided to show the directory where the image are saved. You can create your own text file to save the image data path and change the corresponding code in the python scripts. The data normalization code is mainly derived from DeepMedic . A small subset of the BRATS dataset (after all the above data pre processing) is provided here to run the preset examples. Supported Models Please refer Autofocus Layer for Semantic Segmentation for the details of all the supported models. Basic Model: half pathway of DeepMedic with the last 6 layers with dilation rates equal 2. ASPP c: adding an ASPP module on top of Basic model (parallel features are merged via concatenation). ASPP s: adding an ASPP module on top of Basic model (parallel features are merged via summation). AFN1 6: with the last 1 6 dilated convolutional layers replaced by our proposed aufofocus convolutional layer. Performance The performance reported here is an average over three runs on the 54 images from BRATS 2015 dataset. All the trained models can be downloaded here . 
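For reference, the per-class Dice scores reported in Table 1 below can be computed from a predicted and a ground-truth label volume with a small helper along these lines (a generic sketch, not the evaluation code shipped with this repository):

import numpy as np

def dice_score(pred, target, label):
    # pred, target: integer label volumes of identical shape; label: the class id to score
    p = (pred == label)
    t = (target == label)
    denom = p.sum() + t.sum()
    if denom == 0:
        return 1.0  # class absent from both volumes
    return 2.0 * np.logical_and(p, t).sum() / float(denom)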
Table 1: Dice scores shown in format mean (standard deviation). Environment The code is developed under the following configurations. 1 3 GPUs with at least 12GB GPU memory. You can choose the number of GPUs used via num_gpus NUM_GPUS . PyTorch 0.3.0 or higher is required to run the codes. Nibabel is used here for loading the NIFTI images. SimpleITK is used for saving the output into images. Installation bash git clone conda install pytorch torchvision c pytorch pip install nibabel pip install SimpleITK Quick Start First, you need to download the provided subset of the BRATS dataset and all the trained models. Please run bash chmod +x download_sub_dataset.sh ./download_sub_dataset.sh chmod +x download_models.sh ./download_models.sh Then you can run the following script to choose a model and do the testing. Here we use AFN1 as an example. bash python test.py num_gpus 1 id AFN1 test_epoch 390 You can change the number of used GPUs via num_gpus NUM_GPUS and choose the tested model that you want via id MODEL_ID . Make sure the test epoch is included in the downloaded directory saved_models . You can check all the input arguments via python test.py h . Training In the provided subset of the dataset, we also provide 20 example images for training. You can start training via: bash python train.py num_gpus NUM_GPUS id MODEL_ID For the models like Basic , you may only need one GPU to run the experiments. For the models like AFN6 , you may need to increase the number of GPUs to 2 or 3. This depends on the GPU memory that you are using. Please check all the input arguments via python train.py h . Evaluation You can evaluate a series of models saved after different epochs for one network via: bash python val.py num_gpus NUM_GPUS id MODEL_ID Please make sure that you have already provided a validation list in order to load the validation images. You can specify the steps of epochs that you want to evaluate. Please check all the input arguments via python val.py h . Testing Case 1 If you have labels for the test data and want to see the accuracy (e.g., Dice score for the BRATS dataset), you can use the following two testing codes: test.py The inputs of the network are small image segments, as in the training stage. test_full.py The input of the network is a full image rather than a smaller image segment. There are small differences between these two testing methods due to the padding in the convolutions. For the performance that we report above, we use test.py to get all the results. To test, you can simply run bash python test.py/test_full.py num_gpus NUM_GPUS id MODEL_ID test_epoch NUM_OF_TEST_EPOCH You can increase the number of GPUs to speed up the evaluation process. You can also use visualize action to save the prediction as an output image. Case 2 If you do not have ground truth for the test data, you should use test_online.py to do the testing and save the output. For the BRATS dataset, you can simply run the following script to generate the predicted images and submit them to the online evaluation server. bash python test_online.py num_gpus NUM_GPUS id MODEL_ID test_epoch NUM_OF_TEST_EPOCH visualize Note! If you want to run test_online.py on the provided sample testing images, you need to change the directory of data when loading images. 
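(One more note on data preparation: the per-image normalization described in the Data section above, zero mean and unit variance computed only inside the provided RoI mask, can be sketched roughly as below. The file paths are placeholders and this is not the preprocessing script shipped with the repository, which is derived from DeepMedic.)

import numpy as np
import nibabel as nib

def normalize_within_roi(image_path, mask_path, out_path):
    # Zero-mean, unit-variance normalization computed only inside the RoI mask,
    # applied independently to each image (placeholder paths, illustrative only).
    img_nii = nib.load(image_path)
    img = img_nii.get_fdata().astype(np.float32)
    mask = nib.load(mask_path).get_fdata() > 0

    roi = img[mask]
    img[mask] = (roi - roi.mean()) / (roi.std() + 1e-8)
    img[~mask] = 0.0  # region outside the RoI stays masked out

    nib.save(nib.Nifti1Image(img, img_nii.affine), out_path)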
Contact If you have any problems when using our codes or models, please feel free to contact me via e mail: yaq007@eng.ucsd.edu.",Semantic Segmentation,Semantic Segmentation 2799,Computer Vision,Computer Vision,Computer Vision,"TorchSeg This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch. ! demo image (demo/cityscapes_demo_img.png) Highlights Modular Design: easily construct a customized semantic segmentation models by combining different components. Distributed Training: >60% faster than the multi thread parallel method( nn.DataParallel ), we use the multi processing parallel method. Multi GPU training and inference: support different manners of inference. Provides pre trained models and implement different semantic segmentation models. Prerequisites PyTorch 1.0 pip3 install torch torchvision Easydict pip3 install easydict Apex Ninja sudo apt get install ninja build tqdm pip3 install tqdm Model Zoo Supported Model FCN DFN BiSeNet PSPNet Performance and Benchmarks SS:Single Scale MSF:Multi scale + Flip PASCAL VOC 2012 Methods Backbone TrainSet EvalSet Mean IoU(SS) Mean IoU(MSF) Model : : : : : : : : : : : : : : FCN 32s R101_v1c train_aug val 71.26 BaiduYun / GoogleDrive DFN(paper) R101_v1c train_aug val 79.67 80.6 1 BaiduYun / GoogleDrive DFN(ours) R101_v1c train_aug val 79.63 81.15 BaiduYun / GoogleDrive 80.6 1 : this result reported in paper is further finetuned on train dataset. Cityscapes Non real time Methods Methods Backbone OHEM TrainSet EvalSet Mean IoU(ss) Mean IoU(msf) Model : : : : : : : : : : : : : : : : DFN(paper) R101_v1c ✗ train_fine val 78.5 79.3 BaiduYun / GoogleDrive DFN(ours) R101_v1c ✓ train_fine val 79.49 80.32 BaiduYun / GoogleDrive BiSeNet(paper) R101_v1c ✓ train_fine val 80.3 BaiduYun / GoogleDrive BiSeNet(ours) R101_v1c ✓ train_fine val 79.56 80.29 BaiduYun / GoogleDrive BiSeNet(paper) R18 ✓ train_fine val 76.21 78.57 BaiduYun / GoogleDrive BiSeNet(ours) R18 ✓ train_fine val 76.33 78.46 BaiduYun / GoogleDrive BiSeNet(paper) X39 ✓ train_fine val 70.1 72 BaiduYun / GoogleDrive BiSeNet(ours) 1 X39 ✓ train_fine val 69.1 72.2 BaiduYun / GoogleDrive BiSeNet(ours) 1 : because we didn't pre train the Xception39 model on ImageNet in PyTorch, we train this experiment from scratch. We will release the pre trained Xception39 model in PyTorch and the corresponding experiment. Real time Methods Methods Backbone OHEM TrainSet EvalSet Mean IoU Model : : : : : : : : : : : : : : BiSeNet(paper) R18 ✓ train_fine val 74.8 BaiduYun / GoogleDrive BiSeNet(ours) R18 ✓ train_fine val 74.6 BaiduYun / GoogleDrive BiSeNet(paper) X39 ✓ train_fine val 69 BaiduYun / GoogleDrive BiSeNet(ours) 1 X39 ✓ train_fine val 68.5 BaiduYun / GoogleDrive ADE Methods Backbone TrainSet EvalSet Mean IoU Accuracy Model : : : : : : : : : : : : : : PSPNet(paper) R50_v1c train val 41.68(ss) 80.04(ss) BaiduYun / GoogleDrive PSPNet(ours) R50_v1c train val 41.61(ss) 80.19(ss) BaiduYun / GoogleDrive To Do release all trained models offer comprehensive documents support more semantic segmentation models Deeplab v3 / Deeplab v3+ DenseASPP PSANet EncNet OCNet Training 1. create the config file of dataset: train.txt , val.txt , test.txt file structure:(split with tab ) txt path of the image path of the groundtruth 2. modify the config.py according to your requirements 3. train a network: Distributed Training We use the official torch.distributed.launch in order to launch multi gpu training. 
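(A script launched through torch.distributed.launch typically reads the local_rank argument injected by the launcher and binds itself to a single GPU; the lines below are a generic sketch of that pattern, not TorchSeg's train.py.)

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by torch.distributed.launch
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)                    # one process per GPU
dist.init_process_group(backend="nccl", init_method="env://")
# ... build the model, wrap it in torch.nn.parallel.DistributedDataParallel, then train ...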
This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU. For each experiment, you can just run this script: bash export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS train.py Non distributed Training The above performance are all conducted based on the non distributed training. For each experiment, you can just run this script: bash python train.py d 0 7 the argument of d means the GPU you want to use. Inference In the evaluator, we have implemented the multi gpu inference base on the multi process. In the inference phase, the function will spawns as many Python processes as the number of GPUs we want to use, and each Python process will handle a subset of the whole evaluation dataset on a single GPU. 1. evaluate a trained network on the validation set: bash python3 eval.py 2. input arguments: bash usage: e epoch_idx d device_idx verbose show_image save_path Pred_Save_Path Disclaimer This project is under active development. So things that are currently working might break in a future release. However, feel free to open issue if you get stuck anywhere. Citation The following are BibTeX references. The BibTeX entry requires the url LaTeX package. Please consider citing this project in your publications if it helps your research. @misc{torchseg2019, author {Yu, Changqian}, title {TorchSeg}, howpublished {\url{ year {2019} } Please consider citing the DFN in your publications if it helps your research. @article{yu2018dfn, title {Learning a Discriminative Feature Network for Semantic Segmentation}, author {Yu, Changqian and Wang, Jingbo and Peng, Chao and Gao, Changxin and Yu, Gang and Sang, Nong}, journal {arXiv preprint arXiv:1804.09337}, year {2018} } Please consider citing the BiSeNet in your publications if it helps your research. @inproceedings{yu2018bisenet, title {Bisenet: Bilateral segmentation network for real time semantic segmentation}, author {Yu, Changqian and Wang, Jingbo and Peng, Chao and Gao, Changxin and Yu, Gang and Sang, Nong}, booktitle {European Conference on Computer Vision}, pages {334 349}, year {2018}, organization {Springer} } Why this name, Furnace? Furnace means the Alchemical Furnace . We all are the Alchemist , so I hope everyone can have a good alchemical furnace to practice the Alchemy . Hope you can be a excellent alchemist.",Semantic Segmentation,Semantic Segmentation 2802,Computer Vision,Computer Vision,Computer Vision,"Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional networks (FCNs) in the PAMI FCN and CVPR FCN papers: Fully Convolutional Models for Semantic Segmentation Evan Shelhamer , Jonathan Long , Trevor Darrell PAMI 2016 arXiv:1605.06211 Fully Convolutional Models for Semantic Segmentation Jonathan Long , Evan Shelhamer , Trevor Darrell CVPR 2015 arXiv:1411.4038 Note that this is a work in progress and the final, reference version is coming soon. Please ask Caffe and FCN usage questions on the caffe users mailing list . Refer to these slides for a summary of the approach. These models are compatible with BVLC/caffe:master . Compatibility has held since master@8c66fa5 with the merge of PRs 3613 and 3570. The code and models here are available under the same license as Caffe (BSD 2) and the Caffe bundled models (that is, unrestricted use; see the BVLC model license ). 
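(The accuracy numbers quoted below, like the TorchSeg tables above, are mean intersection over union; purely for orientation, mean IoU can be computed from a class confusion matrix as in this generic sketch, which is not the evaluation script of either repository.)

import numpy as np

def mean_iou(conf_matrix):
    # conf_matrix[i, j]: pixels with ground-truth class i predicted as class j
    tp = np.diag(conf_matrix).astype(np.float64)
    denom = conf_matrix.sum(axis=0) + conf_matrix.sum(axis=1) - tp
    iou = tp / np.maximum(denom, 1)  # classes absent from both count as 0 here;
                                     # real scripts usually exclude them instead
    return iou.mean()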
PASCAL VOC models : trained online with high momentum for a 5 point boost in mean intersection over union over the original models. These models are trained using extra data from Hariharan et al. , but excluding SBD val. FCN 32s is fine tuned from the ILSVRC trained VGG 16 model , and the finer strides are then fine tuned in turn. The at once FCN 8s is fine tuned from VGG 16 all at once by scaling the skip connections to better condition optimization. FCN 32s PASCAL (voc fcn32s): single stream, 32 pixel prediction stride net, scoring 63.6 mIU on seg11valid FCN 16s PASCAL (voc fcn16s): two stream, 16 pixel prediction stride net, scoring 65.0 mIU on seg11valid FCN 8s PASCAL (voc fcn8s): three stream, 8 pixel prediction stride net, scoring 65.5 mIU on seg11valid and 67.2 mIU on seg12test FCN 8s PASCAL at once (voc fcn8s atonce): all at once, three stream, 8 pixel prediction stride net, scoring 65.4 mIU on seg11valid FCN AlexNet PASCAL (voc fcn alexnet): AlexNet (CaffeNet) architecture, single stream, 32 pixel prediction stride net, scoring 48.0 mIU on seg11valid. Unlike the FCN 32/16/8s models, this network is trained with gradient accumulation, normalized loss, and standard momentum. (Note: when both FCN 32s/FCN VGG16 and FCN AlexNet are trained in this same way FCN VGG16 is far better; see Table 1 of the paper.) To reproduce the validation scores, use the seg11valid split defined by the paper in footnote 7. Since SBD train and PASCAL VOC 2011 segval intersect, we only evaluate on the non intersecting set for validation purposes. NYUDv2 models : trained online with high momentum on color, depth, and HHA features (from Gupta et al. These models demonstrate FCNs for multi modal input. FCN 32s NYUDv2 Color (nyud fcn32s color): single stream, 32 pixel prediction stride net on color/BGR input FCN 32s NYUDv2 HHA (nyud fcn32s hha): single stream, 32 pixel prediction stride net on HHA input FCN 32s NYUDv2 Early Color Depth (nyud fcn32s color d): single stream, 32 pixel prediction stride net on early fusion of color and (log) depth for 4 channel input FCN 32s NYUDv2 Late Color HHA (nyud fcn32s color hha): single stream, 32 pixel prediction stride net by late fusion of FCN 32s NYUDv2 Color and FCN 32s NYUDv2 HHA SIFT Flow models : trained online with high momentum for joint semantic class and geometric class segmentation. These models demonstrate FCNs for multi task output. FCN 32s SIFT Flow (siftflow fcn32s): single stream stream, 32 pixel prediction stride net FCN 16s SIFT Flow (siftflow fcn16s): two stream, 16 pixel prediction stride net FCN 8s SIFT Flow (siftflow fcn8s): three stream, 8 pixel prediction stride net Note : in this release, the evaluation of the semantic classes is not quite right at the moment due to an issue with missing classes. This will be corrected soon. The evaluation of the geometric classes is fine. PASCAL Context models : trained online with high momentum on an object and scene labeling of PASCAL VOC. FCN 32s PASCAL Context (pascalcontext fcn32s): single stream, 32 pixel prediction stride net FCN 16s PASCAL Context (pascalcontext fcn16s): two stream, 16 pixel prediction stride net FCN 8s PASCAL Context (pascalcontext fcn8s): three stream, 8 pixel prediction stride net Frequently Asked Questions Is learning the interpolation necessary? In our original experiments the interpolation layers were initialized to bilinear kernels and then learned. In follow up experiments, and this reference implementation, the bilinear kernels are fixed. 
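For readers reproducing this initialization elsewhere, a fixed bilinear upsampling kernel can be generated as in the standalone sketch below (this is the common formulation, written out here for illustration rather than copied from the repository's surgery code):

import numpy as np

def bilinear_kernel(size):
    # 2D bilinear interpolation kernel of shape (size, size), peaked at the centre;
    # in an FCN-style net it is replicated once per class channel of the deconv layer.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)

print(bilinear_kernel(4))  # e.g. a stride-2 upsampling layer would typically use a 4x4 kernel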
There is no significant difference in accuracy in our experiments, and fixing these parameters gives a slight speed up. Note that in our networks there is only one interpolation kernel per output class, and results may differ for higher dimensional and non linear interpolation, for which learning may help further. Why pad the input? : The 100 pixel input padding guarantees that the network output can be aligned to the input for any input size in the given datasets, for instance PASCAL VOC. The alignment is handled automatically by net specification and the crop layer. It is possible, though less convenient, to calculate the exact offsets necessary and do away with this amount of padding. Why are all the outputs/gradients/parameters zero? : This is almost universally due to not initializing the weights as needed. To reproduce our FCN training, or train your own FCNs, it is crucial to transplant the weights from the corresponding ILSVRC net such as VGG16. The included surgery.transplant() method can help with this. What about FCN GoogLeNet? : a reference FCN GoogLeNet for PASCAL VOC is coming soon.",Semantic Segmentation,Semantic Segmentation 2821,Computer Vision,Computer Vision,Computer Vision,"SegNet keras implementation An implementation of SegNet in keras The repository doesn't contain dataset, please prepare and set up it in config.py. A nicely normalized and cleaned dataset can be downloaded from To train a new model: console python train.py save name of model to be saved To load a model and resume training: console python train.py save name of model to be saved resume load name of model to be loaded To test a trained model console python test.py load name of model to read",Semantic Segmentation,Semantic Segmentation 2822,Computer Vision,Computer Vision,Computer Vision,"SegNet keras implementation An implementation of SegNet in keras The repository doesn't contain dataset, please prepare and set up it in config.py. A nicely normalized and cleaned dataset can be downloaded from To train a new model: console python train.py save name of model to be saved To load a model and resume training: console python train.py save name of model to be saved resume load name of model to be loaded To test a trained model console python test.py load name of model to read",Semantic Segmentation,Semantic Segmentation 2828,Computer Vision,Computer Vision,Computer Vision,"Dual Attention Network for Scene Segmentation Introduction We propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies based on the self attention mechanism. And we achieve new state of the art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff 10k dataset. ! image (img/overview.png) Cityscapes testing set result We train our DANet 101 with only fine annotated data and submit our test results to the official evaluation server. ! image (img/tab3.png) Usage 1. Install pytorch The code is tested on python3.6 and official Pytorch@commitfd25a2a , please install PyTorch from source. The code is modified from PyTorch Encoding . 2. Clone the repository: shell git clone cd DANet python setup.py install 3. Dataset Download the Cityscapes dataset and convert the dataset to 19 categories . Please put dataset in folder ./datasets 4 . 
Evaluation Download trained model DANet101 and put it in folder ./danet/cityscapes/model Evaluation code is in folder ./danet/cityscapes cd danet For single scale testing, please run: shell CUDA_VISIBLE_DEVICES 0 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 768 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 eval CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 768 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 eval For multi scale testing, please run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 1024 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 eval multi scales If you want to visualize the result of DAN 101, you can run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 768 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 5. Evaluation Result: The expected scores will show as follows: (single scale testing denotes as 'ss' and multiple scale testing denotes as 'ms') DANet101 on cityscapes val set (mIoU/pAcc): 79.93/95.97 (ss) and 81.49/96.41 (ms) 6. Training: Training code is in folder ./danet/cityscapes cd danet You can reproduce our result by run: shell CUDA_VISIBLE_DEVICES 0 python train.py dataset cityscapes model danet backbone resnet101 checkname danet101 base size 1024 crop size 768 epochs 240 batch size 2 lr 0.003 workers 2 multi grid multi dilation 4 8 16 CUDA_VISIBLE_DEVICES 0,1,2,3 python train.py dataset cityscapes model danet backbone resnet101 checkname danet101 base size 1024 crop size 768 epochs 240 batch size 8 lr 0.003 workers 2 multi grid multi dilation 4 8 16 Note that: We adopt multiple losses in end of the network for better training. Citation If DANet is useful for your research, please consider citing: @article{fu2018dual, title {Dual Attention Network for Scene Segmentation}, author {Fu, Jun and Liu, Jing and Tian, Haijie, and Fang, Zhiwei, and Lu, Hanqing}, journal {arXiv preprint arXiv:1809.02983}, year {2018} } Acknowledgement Thanks PyTorch Encoding , especially the Synchronized BN!",Semantic Segmentation,Semantic Segmentation 2852,Computer Vision,Computer Vision,Computer Vision,"DeepLab v2 Introduction DeepLab is a state of art deep learning system for semantic image segmentation built on top of Caffe . It combines (1) atrous convolution to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks, (2) atrous spatial pyramid pooling to robustly segment objects at multiple scales with filters at multiple sampling rates and effective fields of views, and (3) densely connected conditional random fields (CRF) as post processing. This distribution provides a publicly available implementation for the key model ingredients reported in our latest arXiv paper . This version also supports the experiments (DeepLab v1) in our ICLR'15. You only need to modify the old prototxt files. For example, our proposed atrous convolution is called dilated convolution in CAFFE framework, and you need to change the convolution parameter hole to dilation (the usage is exactly the same). For the experiments in ICCV'15, there are some differences between our argmax and softmax_loss layers and Caffe's. Please refer to DeepLabv1 for details. 
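(DeepLab itself is a Caffe project and the hole/dilation parameter mentioned above lives in its prototxt convolution layers; purely as an illustration of what atrous convolution does, the short PyTorch example below shows that increasing the dilation rate widens the field of view of a 3x3 kernel without adding parameters or changing the output resolution.)

import torch
import torch.nn as nn

x = torch.randn(1, 3, 65, 65)
conv_standard = nn.Conv2d(3, 8, kernel_size=3, padding=1, dilation=1)
conv_atrous = nn.Conv2d(3, 8, kernel_size=3, padding=4, dilation=4)  # same 3x3 weights, wider field of view

print(conv_standard(x).shape)  # torch.Size([1, 8, 65, 65])
print(conv_atrous(x).shape)    # torch.Size([1, 8, 65, 65])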
Please consult and consider citing the following papers: @article{CP2016Deeplab, title {DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs}, author {Liang Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille}, journal {arXiv:1606.00915}, year {2016} } @inproceedings{CY2016Attention, title {Attention to Scale: Scale aware Semantic Image Segmentation}, author {Liang Chieh Chen and Yi Yang and Jiang Wang and Wei Xu and Alan L Yuille}, booktitle {CVPR}, year {2016} } @inproceedings{CB2016Semantic, title {Semantic Image Segmentation with Task Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform}, author {Liang Chieh Chen and Jonathan T Barron and George Papandreou and Kevin Murphy and Alan L Yuille}, booktitle {CVPR}, year {2016} } @inproceedings{PC2015Weak, title {Weakly and Semi Supervised Learning of a DCNN for Semantic Image Segmentation}, author {George Papandreou and Liang Chieh Chen and Kevin Murphy and Alan L Yuille}, booktitle {ICCV}, year {2015} } @inproceedings{CP2015Semantic, title {Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs}, author {Liang Chieh Chen and George Papandreou and Iasonas Kokkinos and Kevin Murphy and Alan L Yuille}, booktitle {ICLR}, year {2015} } Note that if you use the densecrf implementation, please consult and cite the following paper: @inproceedings{KrahenbuhlK11, title {Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials}, author {Philipp Kr{\ {a}}henb{\ {u}}hl and Vladlen Koltun}, booktitle {NIPS}, year {2011} } Performance DeepLabv2 currently achieves 79.7% on the challenging PASCAL VOC 2012 semantic image segmentation task see the leaderboard . Please refer to our project website for details. Pre trained models We have released several trained models and corresponding prototxt files at here . Please check it for more model details. Experimental set up 1. The scripts we used for our experiments can be downloaded from this link : 1. run_pascal.sh: the script for training/testing on the PASCAL VOC 2012 dataset. __Note__ You also need to download sub.sed script. 2. run_densecrf.sh and run_densecrf_grid_search.sh: the scripts we used for post processing the DCNN computed results by DenseCRF. 2. The image list files used in our experiments can be downloaded from this link : The zip file stores the list files for the PASCAL VOC 2012 dataset. 3. To use the mat_read_layer and mat_write_layer, please download and install matio . FAQ Check FAQ if you have some problems while using the code. How to run DeepLab There are several variants of DeepLab. To begin with, we suggest DeepLab LargeFOV, which has good performance and faster training time. Suppose the codes are located at deeplab/code 1. mkdir deeplab/exper (Create a folder for experiments) 2. mkdir deeplab/exper/voc12 (Create a folder for your specific experiment. Let's take PASCAL VOC 2012 for example.) 3. Create folders for config files and so on. 1. mkdir deeplab/exper/voc12/config (where network config files are saved.) 2. mkdir deeplab/exper/voc12/features (where the computed features will be saved (when train on train)) 3. mkdir deeplab/exper/voc12/features2 (where the computed features will be saved (when train on trainval)) 4. mkdir deeplab/exper/voc12/list (where you save the train, val, and test file lists) 5. mkdir deeplab/exper/voc12/log (where the training/test logs will be saved) 6. 
mkdir deeplab/exper/voc12/model (where the trained models will be saved) 7. mkdir deeplab/exper/voc12/res (where the evaluation results will be saved) 4. mkdir deeplab/exper/voc12/config/deeplab_largeFOV (test your own network. Create a folder under config. For example, deeplab_largeFOV is the network you want to experiment with. Add your train.prototxt and test.prototxt in that folder (you can check some provided examples for reference).) 5. Set up your init.caffemodel at deeplab/exper/voc12/model/deeplab_largeFOV. You may want to soft link init.caffemodel to the modified VGG 16 net. For example, run ln s vgg16.caffemodel init.caffemodel at voc12/model/deeplab_largeFOV. 6. Modify the provided script, run_pascal.sh, for experiments. You should change the paths according to your setting. For example, you should specify where Caffe is by changing CAFFE_DIR. Note You may need to modify sub.sed, if you want to replace some variables with your desired values in train.prototxt or test.prototxt. 7. The computed features are saved at folders features or features2, and you can run provided MATLAB scripts to evaluate the results (e.g., check the script at code/matlab/my_script/EvalSegResults). Python Seyed Ali Mousavi has implemented a python version of run_pascal.sh (Thanks, Ali!). If you are more familiar with Python, you may want to take a look at this .",Semantic Segmentation,Semantic Segmentation 2866,Computer Vision,Computer Vision,Computer Vision,"PyTorch Encoding created by Hang Zhang Documentation Please visit the Docs for detail instructions of installation and usage. Please visit the link to examples of semantic segmentation. Citations Context Encoding for Semantic Segmentation arXiv Hang Zhang , Kristin Dana , Jianping Shi , Zhongyue Zhang , Xiaogang Wang , Ambrish Tyagi , Amit Agrawal @InProceedings{Zhang_2018_CVPR, author {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit}, title {Context Encoding for Semantic Segmentation}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {June}, year {2018} } Deep TEN: Texture Encoding Network arXiv Hang Zhang , Jia Xue , Kristin Dana @InProceedings{Zhang_2017_CVPR, author {Zhang, Hang and Xue, Jia and Dana, Kristin}, title {Deep TEN: Texture Encoding Network}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {July}, year {2017} }",Semantic Segmentation,Semantic Segmentation 2874,Computer Vision,Computer Vision,Computer Vision,"Referenced from: environment CUDA:8.0 cudnn:7.1.3 python:3.5 pytorch encoding:0.5.2 torch encoding can be downloaded pytorch:1.0.0 pytorch/pytorch (master branch) gcc:5.4.0 ninja:1.8.2 newer ninja build (1.8.2) on Ubuntu 14.04 Trusty :: Dual Attention Network for Scene Segmentation Introduction We propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies based on the self attention mechanism. And we achieve new state of the art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff 10k dataset. ! image (img/overview.png) Cityscapes testing set result We train our DANet 101 with only fine annotated data and submit our test results to the official evaluation server. ! image (img/tab3.png) Usage 1. Install pytorch The code is tested on python3.6 and official Pytorch@commitfd25a2a , please install PyTorch from source. The code is modified from PyTorch Encoding . 2. 
Clone the repository: shell git clone cd DANet python setup.py install 3. Dataset Download the Cityscapes dataset and convert the dataset to 19 categories . Please put dataset in folder ./datasets 4 . Evaluation Download trained model DANet101 and put it in folder ./danet/cityscapes/model Evaluation code is in folder ./danet/cityscapes cd danet For single scale testing, please run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 768 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 eval For multi scale testing, please run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 1024 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 eval multi scales If you want to visualize the result of DAN 101, you can run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python test.py dataset cityscapes model danet resume dir cityscapes/model base size 2048 crop size 768 workers 1 backbone resnet101 multi grid multi dilation 4 8 16 5. Evaluation Result: The expected scores will show as follows: (single scale testing denotes as 'ss' and multiple scale testing denotes as 'ms') DANet101 on cityscapes val set (mIoU/pAcc): 79.93/95.97 (ss) and 81.49/96.41 (ms) 6. Training: Training code is in folder ./danet/cityscapes cd danet You can reproduce our result by run: shell CUDA_VISIBLE_DEVICES 0,1,2,3 python train.py dataset cityscapes model danet backbone resnet101 checkname danet101 base size 1024 crop size 768 epochs 240 batch size 8 lr 0.003 workers 2 multi grid multi dilation 4 8 16 Note that: We adopt multiple losses in end of the network for better training. Citation If DANet is useful for your research, please consider citing: @article{fu2018dual, title {Dual Attention Network for Scene Segmentation}, author {Fu, Jun and Liu, Jing and Tian, Haijie, and Fang, Zhiwei, and Lu, Hanqing}, journal {arXiv preprint arXiv:1809.02983}, year {2018} } Acknowledgement Thanks PyTorch Encoding , especially the Synchronized BN!",Semantic Segmentation,Semantic Segmentation 2878,Computer Vision,Computer Vision,Computer Vision,"Semantic Segmentation on MIT ADE20K dataset in PyTorch This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset. ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations on Caffe and Torch7: All pretrained models can be found at: From left to right: Test Image, Ground Truth, Predicted Result Highlights Syncronized Batch Normalization on PyTorch This module computes the mean and standard deviation across all devices during training. We empirically find that a reasonable large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions, please refer to Synchronized BatchNorm PyTorch for details. The implementation is easy to use as: It is pure python, no C++ extra extension libs. It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and use sqrt(max(var, eps)) instead of sqrt(var + eps). It is efficient, only 20% to 30% slower than UnsyncBN. Dynamic scales of input for training with multiple GPUs For the task of semantic segmentation, it is good to keep aspect ratio of images during training. 
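(As a tiny illustration of the aspect-ratio-preserving resize mentioned in the previous sentence, and not the repository's actual data pipeline, one common pattern is shown below; the target short-side length is an arbitrary example value.)

from PIL import Image

def resize_keep_aspect(img, short_side=512):
    # Scale so the shorter side equals short_side while preserving the aspect ratio,
    # which yields images of different sizes across a batch.
    w, h = img.size
    scale = float(short_side) / min(w, h)
    return img.resize((int(round(w * scale)), int(round(h * scale))))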
So we re implement the DataParallel module, and make it support distributing data to multiple GPUs in python dict, so that each gpu can process images of different sizes. At the same time, the dataloader also operates differently. Now the batch size of a dataloader always equals to the number of GPUs , each element will be sent to a GPU. It is also compatible with multi processing. Note that the file index for the multi processing dataloader is stored on the master process, which is in contradict to our goal that each worker maintains its own file list. So we use a trick that although the master process still gives dataloader an index for __getitem__ function, we just ignore such request and send a random batch dict. Also, the multiple workers forked by the dataloader all have the same seed , you will find that multiple workers will yield exactly the same data, if we use the above mentioned trick directly. Therefore, we add one line of code which sets the defaut seed for numpy.random before activating multiple worker in dataloader. An Efficient and Effective Framework: UPerNet UPerNet is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time and memory consuming. Without bells and whistles , it is comparable or even better compared with PSPNet, while requiring much shorter training time and less GPU memory (e.g., you cannot train a PSPNet 101 on TITAN Xp GPUs with only 12GB memory, while you can train a UPerNet 101 on such GPUs). Thanks to the efficient network design, we will soon open source stronger models of UPerNet based on ResNeXt that is able to run on normal GPUs. Please refer to UperNet for details. Supported models We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. Encoder: MobileNetV2dilated ResNet18dilated ResNet50dilated ResNet101dilated Coming soon : ResNeXt101dilated Decoder: C1 (1 convolution module) C1_deepsup (C1 + deep supervision trick) PPM (Pyramid Pooling Module, see PSPNet paper for details.) PPM_deepsup (PPM + deep supervision trick) UPerNet (Pyramid Pooling + FPN head, see UperNet for details.) Performance: IMPORTANT: We use our self trained base model on ImageNet. The model takes the input in BGR form (consistent with opencv) instead of RGB form as used by default implementation of PyTorch. The base model will be automatically downloaded when needed. Architecture MultiScale Testing Mean IoU Pixel Accuracy(%) Overall Score Inference Speed(fps) Training Time(hours) MobileNetV2dilated + C1_deepsup No 32.39 75.75 54.07 17.2 0.8 20 16 Yes 33.75 76.75 55.25 10.3 MobileNetV2dilated + PPM_deepsup No 35.76 77.77 56.27 14.9 0.9 20 18.0 Yes 36.28 78.26 57.27 6.7 ResNet18dilated + C1_deepsup No 33.82 76.05 54.94 13.9 0.42 20 8.4 Yes 35.34 77.41 56.38 5.8 ResNet18dilated + PPM_deepsup No 38.00 78.64 58.32 11.7 1.1 20 22.0 Yes 38.81 79.29 59.05 4.2 ResNet50dilated + PPM_deepsup No 41.26 79.73 60.50 8.3 1.67 20 33.4 Yes 42.04 80.23 61.14 2.6 ResNet101dilated + PPM_deepsup No 42.19 80.59 61.39 6.8 3.82 25 95.5 Yes 42.53 80.91 61.72 2.0 UperNet50 No 40.44 79.80 60.12 8.4 1.75 20 35.0 Yes 41.55 80.23 60.89 2.9 UperNet101 No 42.00 80.79 61.40 7.8 2.5 25 62.5 Yes 42.66 81.01 61.84 2.3 UPerNet ResNext101 (coming soon!) 
The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory), except for ResNet101dilated, which is benchmarked on a server with 8 NVIDIA Tesla P40 GPUS (22GB GPU memory), because of the insufficient memory issue when using dilated conv on a very deep network. The inference speed is benchmarked a single NVIDIA Pascal Titan Xp GPU, without visualization. Environment The code is developed under the following configurations. Hardware: 1 8 GPUs (with at least 12G GPU memories) (change gpus GPUS accordingly) Software: Ubuntu 16.04.3 LTS, CUDA> 8.0, Python> 3.5, PyTorch> 0.4.0 Quick start: Test on an image using our trained model 1. Here is a simple demo to do inference on a single image: bash chmod +x demo_test.sh ./demo_test.sh This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves predicted segmentation (.png) to the working directory. 2. To test on multiple images, you can simply do something as the following ( $PATH_IMG1, $PATH_IMG2, $PATH_IMG3 are your image paths): python3 u test.py \ model_path $MODEL_PATH \ test_imgs $PATH_IMG1 $PATH_IMG2 $PATH_IMG3 \ arch_encoder resnet50dilated \ arch_decoder ppm_deepsup 3. See full input arguments via python3 test.py h . Training 1. Download the ADE20K scene parsing dataset: bash chmod +x download_ADE20K.sh ./download_ADE20K.sh 2. Train a model (default: ResNet50dilated + PPM_deepsup). During training, checkpoints will be saved in folder ckpt . bash python3 train.py gpus GPUS To choose which gpus to use, you can either do gpus 0 7 , or gpus 0,2,4,6 . For example: Train MobileNetV2dilated + C1_deepsup bash python3 train.py gpus GPUS \ arch_encoder mobilenetv2dilated arch_decoder c1_deepsup \ fc_dim 320 Train ResNet18dilated + PPM_deepsup bash python3 train.py gpus GPUS \ arch_encoder resnet18dilated arch_decoder ppm_deepsup \ fc_dim 512 Train UPerNet101 bash python3 train.py gpus GPUS \ arch_encoder resnet101 arch_decoder upernet \ segm_downsampling_rate 4 padding_constant 32 3. See full input arguments via python3 train.py h . Evaluation 1. Evaluate a trained model on the validation set. id is the folder name under ckpt directory. suffix defines which checkpoint to use, for example _epoch_20.pth . Add visualize option to output visualizations as shown in teaser. bash python3 eval_multipro.py gpus GPUS id MODEL_ID suffix SUFFIX For example: Evaluate MobileNetV2dilated + C1_deepsup bash python3 eval_multipro.py gpus GPUS \ id MODEL_ID suffix SUFFIX arch_encoder mobilenetv2dilated arch_decoder c1_deepsup \ fc_dim 320 Evaluate ResNet18dilated + PPM_deepsup bash python3 eval_multipro.py gpus GPUS \ id MODEL_ID suffix SUFFIX arch_encoder resnet18dilated arch_decoder ppm_deepsup \ fc_dim 512 Evaluate UPerNet101 bash python3 eval_multipro.py gpus GPUS \ id MODEL_ID suffix SUFFIX arch_encoder resnet101 arch_decoder upernet \ padding_constant 32 2. See full input arguments via python3 eval_multipro.py h . Reference If you find the code or pre trained models useful, please cite the following papers: Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal on Computer Vision (IJCV), 2018. 
@article{zhou2018semantic, title {Semantic understanding of scenes through the ade20k dataset}, author {Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio}, journal {International Journal on Computer Vision}, year {2018} } Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. @inproceedings{zhou2017scene, title {Scene Parsing through ADE20K Dataset}, author {Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio}, booktitle {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, year {2017} }",Semantic Segmentation,Semantic Segmentation 2889,Computer Vision,Computer Vision,Computer Vision,"PyTorch Encoding created by Hang Zhang Documentation Please visit the Docs for detail instructions of installation and usage. Please visit the link to examples of semantic segmentation. Citations Context Encoding for Semantic Segmentation arXiv Hang Zhang , Kristin Dana , Jianping Shi , Zhongyue Zhang , Xiaogang Wang , Ambrish Tyagi , Amit Agrawal @InProceedings{Zhang_2018_CVPR, author {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit}, title {Context Encoding for Semantic Segmentation}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {June}, year {2018} } Deep TEN: Texture Encoding Network arXiv Hang Zhang , Jia Xue , Kristin Dana @InProceedings{Zhang_2017_CVPR, author {Zhang, Hang and Xue, Jia and Dana, Kristin}, title {Deep TEN: Texture Encoding Network}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {July}, year {2017} }",Semantic Segmentation,Semantic Segmentation 1899,Computer Vision,Computer Vision,Computer Vision,CycleGAN Implements Cycle Generative Adversarial Network as described in,Image-to-Image Translation,Image-to-Image Translation 1910,Computer Vision,Computer Vision,Computer Vision,"Cycle GAN implemented by fastai This repository implements the Cycle GAN using some fastai built in functions. Paper Linking: Required Software To run it, you have to install following sofware with specified version(I would hignly recommend that create a virtual environment first): pytorch 0.3.1 fastai 0.7 torchtext 0.2.3 Create the fastai conda environment. It will install all the dependencies that are needed. Do it only once: git clone cd fastai conda env create f environment.yml You then activate that environment with: conda activate fastai If you don’t have GPU, build the fastai cpu environment instead: git clone cd fastai conda env create f environment cpu.yml You then activate that environment with: conda activate fastai cpu Dataset Download the dataset: and unzip it in a local folder. Modify the relevant path before you run the code Final Results !",Image-to-Image Translation,Image-to-Image Translation 1914,Computer Vision,Computer Vision,Computer Vision,"Coupled Generative Adversarial Network code General This is the open source repository for the Coupled Generative Adversarial Network (CoupledGAN or CoGAN) work. For more details please refer to our NIPS 2016 paper or our arXiv paper . Please cite the NIPS paper in your publications if you find the source code useful to your research. I have improved the algorithm by combining with encoders. 
For more details please check our NIPS 2017 paper on Unsupervised Image to Image Translation Networks USAGE In this repository, we provide both Caffe implementation and PyTorch implementation. For using the code with the Caffe library, please consult USAGE_CAFFE (USAGE_CAFFE.md). For using the code with the PyTorch library, please consult USAGE_PYTORCH (USAGE_PYTORCH.md). CoGAN Network Architecture ! CoGAN learn to generate corresponding smile and non smile faces ! CoGAN learn to generate corresponding faces with blond hair and without non blond hair ! CoGAN learn to generate corresponding faces with eye glasses and without eye glasses ! CoGAN learn to generate corresponding RGB and depth images ! Copyright 2017, Ming Yu Liu All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any non commercial purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.",Image-to-Image Translation,Image-to-Image Translation 1915,Computer Vision,Computer Vision,Computer Vision,"License CC BY NC SA 4.0 ! Python 2.7 UNIT: UNsupervised Image to image Translation Networks License Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY NC SA 4.0 license . Code usage Please check out our tutorial (TUTORIAL.md). For multimodal (or many to many) image translation, please check out our new work on MUNIT . What's new. 05 02 2018: We now adapt MUNIT code structure. For reproducing experiment results in the NIPS paper, please check out version_02 branch . 12 21 2017: Release pre trained synthia to cityscape image translation model. See USAGE.md (TUTORIAL.md) for usage examples. 12 14 2017: Added multi scale discriminators described in the pix2pixHD paper. To use it simply make the name of the discriminator COCOMsDis. Paper Ming Yu Liu, Thomas Breuel, Jan Kautz, Unsupervised Image to Image Translation Networks NIPS 2017 Spotlight, arXiv:1703.00848 2017 Two Minute Paper Summary (./docs/two minute paper.png) (We thank the Two Minute Papers channel for summarizing our work.) The Shared Latent Space Assumption (./docs/shared latent space.png) Result Videos More image results are available in the Google Photo Album . Left: input. Right: neural network generated. Resolution: 640x480 ! (./docs/snowy2summery.gif) Left: input. Right: neural network generated. Resolution: 640x480 ! (./docs/day2night.gif) ! (./docs/dog_breed.gif) ! (./docs/cat_species.gif) Snowy2Summery 01 Snowy2Summery 02 Day2Night 01 Day2Night 02 Translation Between 5 dog breeds Translation Between 6 cat species Street Scene Image Translation From the first row to the fourth row, we show example results on day to night, sunny to rainy, summery to snowy, and real to synthetic image translation (two directions). 
For each image pair, left is the input image ; right is the machine generated image. ! (./docs/street_scene.png) Dog Breed Image Translation ! (./docs/dog_trans.png) Cat Species Image Translation ! (./docs/cat_trans.png) Attribute based Face Image Translation ! (./docs/faces.png)",Image-to-Image Translation,Image-to-Image Translation 1920,Computer Vision,Computer Vision,Computer Vision,Biscotti_misc Biscotti,Image-to-Image Translation,Image-to-Image Translation 1940,Computer Vision,Computer Vision,Computer Vision,Cycle GAN Paper: arxiv.org/pdf/1703.10593.pdf A chainer implementation for Unpaired Image to Image Translation using Cycle Consistent Adversarial Networks . Prerequisites Linux Python 3 Chainer > 5.0.0.b1 (v5) CPU(not tested yet) or NVIDIA GPU + CUDA CuDNN,Image-to-Image Translation,Image-to-Image Translation 1952,Computer Vision,Computer Vision,Computer Vision,"CycleGAN in Keras Original Paper: Two implementations are provided: cyclegan.py contains the architecture described in the original paper, while cyclegan_noReflection.py contains the architecture without reflection padding",Image-to-Image Translation,Image-to-Image Translation 2003,Computer Vision,Computer Vision,Computer Vision,C GAN Demo for image to image translation This is a demo for pix2pix Aerial to Map images dataset implemented here . The implementation of Condition GAN is based on a paper by Isola et al. Link : Demo ! Example1 ! Example 2 Requirements sudo apt get install python3 pip build essential libgtk2.0 dev sudo pip3 install virtualenv virtualenv django p python3 source django/bin/activate pip install tensorflow Django django admin numpy matplotlib opencv python Setup Create checkpoint directory cd cgandemo mkdir static/checkpoints Download checkpoints Link: _ Migrate python manage.py makemigrations python manage.py migrate Run python manage.py runserver Note: Run all commands within the created virtual environment,Image-to-Image Translation,Image-to-Image Translation 2019,Computer Vision,Computer Vision,Computer Vision,"SmartSketch Supercharge your creativity with state of the art image synthesis ! promo.png (promo.png) Credits Special thanks to @AndroidKitKat for helping us host this! Set Up You'll need to install the pretrained generator model for the COCO dataset into checkpoints/coco_pretrained/ . Instructions for this can be found on the nvlabs/spade repo. Make sure you need to install all the Python requirements using pip3 install r requirements.txt . Once you do this, you should be able to run the server using python3 server.py . It will run it on 0.0.0.0 on port 80. Unfortunately these are hardcoded into the server and right now you cannot pass CLI arguments to the server to specify the port and host, as the PyTorch stuff also reads from the command line (will fix this soon). TODOS Change how we run the model, make it easier to use (don't use their options object) Make a seperate frontend server and a backend server (for scaling) Try to containerize at least the bacckend components",Image-to-Image Translation,Image-to-Image Translation 2020,Computer Vision,Computer Vision,Computer Vision,"SmartSketch Supercharge your creativity with state of the art image synthesis ! promo.png (promo.png) Credits Special thanks to @AndroidKitKat for helping us host this! Set Up You'll need to install the pretrained generator model for the COCO dataset into checkpoints/coco_pretrained/ . Instructions for this can be found on the nvlabs/spade repo. 
Make sure you need to install all the Python requirements using pip3 install r requirements.txt . Once you do this, you should be able to run the server using python3 server.py . It will run it on 0.0.0.0 on port 80. Unfortunately these are hardcoded into the server and right now you cannot pass CLI arguments to the server to specify the port and host, as the PyTorch stuff also reads from the command line (will fix this soon). TODOS Change how we run the model, make it easier to use (don't use their options object) Make a seperate frontend server and a backend server (for scaling) Try to containerize at least the bacckend components",Image-to-Image Translation,Image-to-Image Translation 2051,Computer Vision,Computer Vision,Computer Vision,"Combination of CycleGan and Mcd in Pytorch This is my PyTorch implementation for semi supervised un paired co training. Although it is not yet been completed, it is nolonger under development. This package includes CycleGAN , MCD_DA The code was written by Chia Ming Chang . Note : The current software works well with PyTorch 0.4. Prerequisites Linux NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested) Getting Started Installation Install torch and dependencies from Install tensorboardX from Clone this repo: bash git clone cd Cycle_Mcd_Gan Dataset Cityscapes Download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip from Unzip both files Rename the directory as followed Cityscapes └───image │ └───train │ │ └───aachen │ │ │ aachen_000000_000019_leftImg8bit.png │ │ │ ... │ │ ... │ │ │ └───val │ │ └───frankfurt │ │ │ frankfurt_000000_000294_leftImg8bit.png │ │ │ ... │ │ ... │ │ │ └───test │ └───berlin │ │ berlin_000000_000019_leftImg8bit.png │ │ ... │ ... │ └───label └───train │ └───aachen │ │ aachen_000000_000019_gtFine_labelIds.png │ │ ... │ ... │ └───val │ └───frankfurt │ │ frankfurt_000000_000294_gtFine_labelIds.png │ │ ... │ ... │ └───test └───berlin │ berlin_000000_000019_gtFine_labelIds.png │ ... ... Generate txt file python3 datamanager/generate_txt.py directory of Cityscapes Dataset GTA Download all the images and labels and split.mat from Unzip all files Rename the directory as followed GTA └───image │ 00001.png │ ... │ └───label 00001.png ... Split data python3 datamanager/split_gta.py directory of GTA Dataset path of split.mat Note : the datastructure will become like this Cityscapes └───image │ └───train │ │ 00001.png │ │ ... │ │ │ └───val │ │ 00022.png │ │ ... │ │ │ └───test │ 00011.png │ ... │ └───label └───train │ 00001.png │ ... │ └───val │ 00022.png │ ... │ └───test 00022.png ... Generate txt file python3 datamanager/generate_txt.py directory of GTA Dataset Train Train a model: bash python3 cycle_mcd_trainer.py Display UI Optionally, for displaying images during training and test, use the tensorboardX bash cd checkpoints/cycle_mcd_da tensorboard logdir log Citation @inproceedings{CycleGAN2017, title {Unpaired Image to Image Translation using Cycle Consistent Adversarial Networkss}, author {Zhu, Jun Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A}, booktitle {Computer Vision (ICCV), 2017 IEEE International Conference on}, year {2017} } @article{saito2017maximum, title {Maximum Classifier Discrepancy for Unsupervised Domain Adaptation}, author {Saito, Kuniaki and Watanabe, Kohei and Ushiku, Yoshitaka and Harada, Tatsuya}, journal {arXiv preprint arXiv:1712.02560}, year {2017} } Acknowledgments code is done in iis sinica. 
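(Since this package builds on CycleGAN, a reminder of the cycle-consistency objective may help orientation; the sketch below is generic and is not this repository's loss code. The generator names G_AB and G_BA and the weight value are placeholders.)

import torch.nn.functional as F

def cycle_consistency_loss(real_a, real_b, G_AB, G_BA, lam=10.0):
    # || G_BA(G_AB(a)) - a ||_1 + || G_AB(G_BA(b)) - b ||_1, weighted by lam
    rec_a = G_BA(G_AB(real_a))
    rec_b = G_AB(G_BA(real_b))
    return lam * (F.l1_loss(rec_a, real_a) + F.l1_loss(rec_b, real_b))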
Related Projects: CycleGAN: Project Paper MCD_DA: Project Paper",Image-to-Image Translation,Image-to-Image Translation 2064,Computer Vision,Computer Vision,Computer Vision,"ml4a invisible cities A project made during Machine Learning for Artists workshop with Gene Kogan @ Opendotlab See the full website + gallery: Concept “With cities, it is as with dreams: everything imaginable can be dreamed, but even the most unexpected dream is a rebus that conceals a desire or, its reverse, a fear. Cities, like dreams, are made of desires and fears, even if the thread of their discourse is secret, their rules are absurd, their perspectives deceitful, and everything conceals something else.” _― Italo Calvino , Invisible Cities_ The idea is to create an imaginary city from a hand drawn sketch. Trained with aerial images of real cities, a neural network can transform this into a realistic bird eye view city. Then, switching between different cities references it would be possible to generate different views of the same imaginary city. How it works We were fascinated by the possibility of generating new and non existent but realistic images using conditional adversarial neural networks that remembers a certain set of features from the things it has seen in the past: the same process that we humans undergo when we dream. Dataset Taking inspiration from the given examples, we applied a pre defined color scheme to geographic data ( OpenStreetMap ) using Mapbox Studio : roads, green spaces, buildings, water were styled with different colours (black, green, red, blue), so that the neural network (NN) could compare these to aerial images and learn the different features. ! (./images/Venice LA01.jpg) Training, evaluating, running We then used vvvv as a tool to collect both satellite imagery and associated labeled map tiles. We trained a conditional generative adversarial network to recontruct the satellite imagery from its map tiles. ! (./images/01.jpg) It then produces a set of images according to the unique characteristics of each city: the same blue shade will translate to a venetian canal or a simple river, red will became a 17th century villa or a 50s modernist house in the hills of L.A. ! (./images/02.jpg) To encompass the variability of all geographic features, we left the background as plain white. This translated to unexpected results as the NN could interpret the same white patch of land as an airport, a maize field or a dumpster. Gallery City style transfer With this technique, we fed map tiles of one city to the generative model of another city, producing sattelite imagery of the former in the style of the latter. ! (./images/03.jpg) ! (./images/04.jpg) ! (./images/05.jpg) Imaginary maps Here we feed completely handdrawn tiles to the models, producing hallucinations of cities. ! (./images/07.jpg) ! (./images/08.jpg) Team Gene Kogan Gabriele Gambotto Ambhika Samsen Andrej Boleslavsky Michele Ferretti Damiano Gui Fabian Frei Credits All credit for the algorithm development to “Image to Image Translation Using Conditional Adversarial Networks” by Phillip Isola , Jun Yan Zhu , Tinghui Zhou , Alexei A. Efros published in arxiv, 2016 .",Image-to-Image Translation,Image-to-Image Translation 2107,Computer Vision,Computer Vision,Computer Vision,"Forecast the weather using tensorflow and the pix2pix model. Conditional Adversarial Networks offer a general machine learning tool to build a model deriving one (image) dataset from another. Originally demonstrated as a tool called pix2pix . 
Here I'm using a tensorflow port of this technique pix2pix tensorflow . The idea is to use pix2pix to transform an image describing the weather into an image describing the weather 6 hours later (a forecast). First we need a tool to encode a surface weather field (2m air temperature anomaly, mean sea level pressure, and precipitation rate) as an image. Script (./weather2image//make.3var.plot.R) Then we need a set of pairs of such images: a source image, and a target image from 6 hours later. Each pair should be separated by at least 5 days, so they are independent states. Script (./weather2image//make.training.batch.R) Then we need to take a training set (400) of those pairs of images and pack them into the 512x256 side by side format used by pix2pix (source in the left half, and target in the right half). Script (./weather2image/make_p2p_training_images.R) Alternatively, you can get the set of training and test images I used from Dropbox . Then train a model on this set for 200 epochs: with a fast GPU this should take about 1 hour, but CPU only it takes a bit over 24 hours on my 4 core iMac. (It took about 2 hours on one gpu node of Isambard ). sh python weather2weather.py \ mode train \ output_dir $SCRATCH/weather2weather/model_train \ max_epochs 200 \ input_dir $SCRATCH/weather2weather/p2p_format_images_for_training \ which_direction AtoB Now make some more pairs of images (100) to test the model on, in the same format as the training set, but they must be different weather states (times). Script (./weather2image/make_p2p_validation_images.R) Use the trained model to make predictions from the validation set sources and compare those predictions to the validation set targets. sh python weather2weather.py \ mode test \ output_dir $SCRATCH/weather2weather/model_test \ input_dir $SCRATCH/weather2weather/p2p_format_images_for_validation \ checkpoint $SCRATCH/weather2weather/model_train The test run will output an HTML file at $SCRATCH/weather2weather/model_test/index.html that shows input/output/target image sets. This is good for a first glance, but those images are in a packed analysis form. So we need a tool to convert the packed image pairs to a clearer image format: Script (./weather2image/replot.p2p.image.R). This shows target weather (top left), model output weather (top right), target pressure increment (bottom left), and model output pressure increment (bottom right). To postprocess all the test cases run: sh ./weather2image/replot_all_validation.R \ input.dir $SCRATCH/weather2weather/model_test/images \ output.dir $SCRATCH/weather2weather/model_test/postprocessed This will produce an HTML file at $SCRATCH/weather2weather/model_test/index.html showing results of all the test cases. This clearly does have skill at 6 hour weather forecasts: it gets the semi diurnal oscillation, and some of the extratropical structure. The final step is to use the model on its own output: by making repeated 6 hour forecasts we can make a forecast as far into the future as we like. This is less successful . Acknowledgments Derived from pix2pix tensorflow .
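As a rough illustration of the repeated-forecast idea mentioned above, the loop is just repeated application of the one step model. This is a sketch only: forecast_fn is a hypothetical stand-in for a call to weather2weather.py in test mode on the previous output image.

def iterate_forecast(initial_state, forecast_fn, n_steps):
    # forecast_fn maps a packed weather image to the weather image 6 hours later
    # (in practice it would wrap the trained pix2pix model).
    states = [initial_state]
    for _ in range(n_steps):
        states.append(forecast_fn(states[-1]))
    return states

# Dummy usage just to show the loop: 8 steps of 6 hours = a 48 hour forecast.
states = iterate_forecast(initial_state=None, forecast_fn=lambda x: x, n_steps=8)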
Citation Please cite the paper this code is based on: Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} }",Image-to-Image Translation,Image-to-Image Translation 2110,Computer Vision,Computer Vision,Computer Vision,"pix2pix Project Arxiv PyTorch Torch implementation for learning a mapping from input images to output images, for example: Image to Image Translation with Conditional Adversarial Networks Phillip Isola , Jun Yan Zhu , Tinghui Zhou , Alexei A. Efros CVPR, 2017. On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days. Note : Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version. Setup Prerequisites Linux or OSX NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested) Getting Started Install torch and dependencies from Install torch packages nngraph and display bash luarocks install nngraph luarocks install Clone this repo: bash git clone git@github.com:phillipi/pix2pix.git cd pix2pix Download the dataset (e.g., CMP Facades ): bash bash ./datasets/download_dataset.sh facades Train the model bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA th train.lua (CPU only) The same training command without using a GPU or CUDNN. Setting the environment variables gpu 0 cudnn 0 forces CPU only bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA gpu 0 cudnn 0 batchSize 10 save_epoch_freq 5 th train.lua (Optionally) start the display server to view results as the model trains. ( See Display UI ( display ui) for more details): bash th ldisplay.start 8000 0.0.0.0 Finally, test the model: bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA phase val th test.lua The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html . Train bash DATA_ROOT /path/to/data/ name expt_name which_direction AtoB th train.lua Switch AtoB to BtoA to train translation in opposite direction. Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir your_dir in train.lua). See opt in train.lua for additional training options. Test bash DATA_ROOT /path/to/data/ name expt_name which_direction AtoB phase val th test.lua This will run the model named expt_name in direction AtoB on all images in /path/to/data/val . Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir your_dir in test.lua). See opt in test.lua for additional testing options. Datasets Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data. bash bash ./datasets/download_dataset.sh dataset_name facades : 400 images from CMP Facades dataset . Citation (datasets/bibtex/facades.tex) cityscapes : 2975 images from the Cityscapes training set . 
Citation (datasets/bibtex/cityscapes.tex) maps : 1096 training images scraped from Google Maps edges2shoes : 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. Citation (datasets/bibtex/shoes.tex) edges2handbags : 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. Citation (datasets/bibtex/handbags.tex) night2day : around 20K natural scene images from Transient Attributes dataset Citation (datasets/bibtex/transattr.tex) . To train a day2night pix2pix model, you need to add which_direction BtoA . Models Download the pre trained models with the following script. You need to rename the model (e.g., facades_label2image to /checkpoints/facades/latest_net_G.t7 ) after the download has finished. bash bash ./models/download_model.sh model_name facades_label2image (label > facade): trained on the CMP Facades dataset. cityscapes_label2image (label > street scene): trained on the Cityscapes dataset. cityscapes_image2label (street scene > label): trained on the Cityscapes dataset. edges2shoes (edge > photo): trained on UT Zappos50K dataset. edges2handbags (edge > photo): trained on Amazon handbags images. day2night (daytime scene > nighttime scene): trained on around 100 webcams . Setup Training and Test data Generating Pairs We provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A: Create folder /path/to/data with subfolders A and B . A and B should each have their own subfolders train , val , test , etc. In /path/to/data/A/train , put training images in style A. In /path/to/data/B/train , put the corresponding images in style B. Repeat same for other data splits ( val , test , etc). Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg . Once the data is formatted this way, call: bash python scripts/combine_A_and_B.py fold_A /path/to/data/A fold_B /path/to/data/B fold_AB /path/to/data This will combine each pair of images (A,B) into a single image file, ready for training. Notes on Colorization No need to run combine_A_and_B.py for colorization. Instead, you need to prepare some natural images and set preprocess colorization in the script. The program will automatically convert each RGB image into Lab color space, and create L > ab image pair during the training. Also set input_nc 1 and output_nc 2 . Extracting Edges We provide python and Matlab scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges. Run scripts/edges/PostprocessHED.m to simplify edges with additional post processing steps. Check the code documentation for more details. Evaluating Labels2Photos on Cityscapes We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe ) in your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre trained FCN 8s semantic segmentation model (512MB) by running bash bash ./scripts/eval_cityscapes/download_fcn8s.sh Then make sure ./scripts/eval_cityscapes/ is in your system's python path. 
If not, run the following command to add it bash export PYTHONPATH ${PYTHONPATH}:./scripts/eval_cityscapes/ Now you can run the following command to evaluate your predictions: bash python ./scripts/eval_cityscapes/evaluate.py cityscapes_dir /path/to/original/cityscapes/dataset/ result_dir /path/to/your/predictions/ output_dir /path/to/output/directory/ Images stored under result_dir should contain your model predictions on the Cityscapes validation split, and have the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png ). The script will output a text file under output_dir containing the metric. Further notes : The pre trained model is not supposed to work on Cityscapes in the original resolution (1024x2048) as it was trained on 256x256 images that are upsampled to 1024x2048. The purpose of the resizing was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need of changing the standard FCN training code for Cityscapes. To get the ground truth numbers in the paper, you need to resize the original Cityscapes images to 256x256 before running the evaluation code. Display UI Optionally, for displaying images during training and test, use the display package . Install it with: luarocks install Then start the server with: th ldisplay.start Open this URL in your browser: By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface: bash th ldisplay.start 8000 0.0.0.0 Then open in your browser to load the remote desktop. L1 error is plotted to the display by default. Set the environment variable display_plot to a comma separated list of values errL1 , errG and errD to visualize the L1, generator, and discriminator error respectively. For example, to plot only the generator and discriminator errors to the display instead of the default L1 error, set display_plot errG,errD . Citation If you use this code for your research, please cite our paper Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2017, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {CVPR}, year {2017} } Cat Paper Collection If you love cats, and love reading cool graphics, vision, and learning papers, please check out the Cat Paper Collection: Github Webpage Acknowledgments Code borrows heavily from DCGAN . The data loader is modified from DCGAN and Context Encoder .",Image-to-Image Translation,Image-to-Image Translation 2188,Computer Vision,Computer Vision,Computer Vision,"Notes So many papers, note the notes. 2018/7/318/1 Checkerboard artifacts 1707.02937 Checkerboard artifact free sub pixel 1806.02658 Super Resolution using Convolutional Neural Networks without Any Checkerboard Artifacts Article from distill.pub 1. distill.pub Make sure to try the awesome demo of the page!! Surely helps understanding. 1. deconvolutions are prone to artifacts > In addition to the high frequency checkerboard like artifacts we observed above, early deconvolutions can create lower frequency artifacts 2. Resize (NN or bilinear) convolution may be a cure to artifacts, but this might not be the final solution > Another approach is to separate out upsampling to a higher resolution from convolution to compute features. For example, you might resize the image (using nearest neighbor interpolation or bilinear interpolation) and then do a convolutional layer. 
This seems like a natural approach, and roughly similar methods have worked well in image super resolution (eg. 9 ). 3. Artifacts are also present in terms of gradients → GANs also suffer > We've found that this does happen in some cases. When the generator is neither biased for or against checkerboard patterns, strided convolutions in the discriminator can cause them. 4. A quick fix is switching from deconv to NN resize then conv; experiments show that parameters are compatible 2. 1707.02937 Checkerboard artifact free sub pixel convolution: A note on sub pixel convolution, resize convolution and convolution resize 1. Source of artifacts : deconvolution overlap, random initialization, loss functions > The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels as shown in Figure 1. To combat this problem, smoothness constraints, post processing and different architecture designs have been proposed 6,13,24 . Odena et al. 2 highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. 2. Sub pixel convolution also suffers if randomly initialized 3. If NN resize then conv is switched to conv then NN resize, it guarantees checkerboard free reconstructions, but the upsampling is not trainable. → It might be beneficial to initialize sub pixel convolution to behave like NN resize 4. implementations found (2018/07/31) 3. 1806.02658 Super Resolution using Convolutional Neural Networks without Any Checkerboard Artifacts 1. Intro & Background Checkerboard artifacts have been studied in the linear domain > On the other hand, checkerboard artifacts have been studied to design linear multirate systems including filter banks and wavelets 19–22 . In addition, it is well known that checkerboard artifacts are caused by the time variant property of interpolators in multirate systems, and the condition for avoiding these artifacts have been given 19–21 . However, the condition to avoid checkerboard artifacts for linear systems can not be applied to CNNs due to the nonlinearity of CNNs. 2. The preparation section in the paper reviews a lot of SR techniques and overviews of checkerboard artifacts for linear systems, which seems to be way too difficult for people who are poor at signal processing...like me :( 3. Not very sure about my understanding, but it seems like the checkerboard artifacts arise from the different steady state values of the feature map channels, causing the overall unit step response to have a periodic characteristic. See section 3.2 in the paper. 4. Due to the non linearity of CNNs, > This is a non linear system due to the bias b. It is hard to avoid checkerboard artifacts, since the condition to avoid them is that each filter has identical steady state values. (Formulas in Section 3.3) 5. Solution in Section 3.3 Not really sure ... but the form of the filter H0 seems to be a box filter to me ... > In this paper, we propose to add the kernel of the zero order hold with factor U, i.e. H0 in eq.(4), after upsampling layers as shown in Fig. 6. In this structure, the output signal from H0 can be a constant value, even when an arbitrary periodic signal is inputted to H0. As a result, Fig. 6 can satisfy eq.(7). 6. Implementations qq 4. Thoughts 1. May be useful in replacing the upsampling layers in GANs, trying (7/31) 5. Exps & Notes 1. 8/2 commit 88cae57 1.
Even with ICNR (proposed in 1707.02937 Checkerboard artifact free sub pixel convolution: A note on sub pixel convolution, resize convolution and convolution resize ) still getting checkerboard artifacts, which might arise from the convs in the discriminator? 1. The conv layers in the discriminator have k_size % stride ! 0 2. The discriminator's receptive field is too small, which is discussed in Sec 4.4 of 1611.07004 Image to Image Translation with Conditional Adversarial Networks ) 2. 6.",Image-to-Image Translation,Image-to-Image Translation 2194,Computer Vision,Computer Vision,Computer Vision,"Real world to anime style transfer This is a repository with a TensorFlow implementation of CycleGAN: CycleGAN is a GAN like neural network for style transfer, which does not require paired training data. This implementation is heavily based on Otakar Jašek's diploma thesis Basically it uses 2 datasets: real and anime. Real data are from common machine learning datasets for computer vision, namely Ade20k, combined with various cosplay photographs. Anime data are from anime videos, sampled at 1 FPS. Code is in code/mod cycle gan . code/mod cycle gan/data_preparation contains data preparation scripts. The input to the neural network is the native TensorFlow format, protobuf. Videos are sampled to obtain images, and the images are then converted into .tfrecord files containing the protobuf format of the training data. Images in a tfrecord can be corrupted; you can check them with a script. python data_preparation/check_tfrecords.py file The neural network is then trained in code/mod cycle gan/train.py by feeding it two tfrecord files, one with real data, one with anime data. The trained network can then be used for inference, transforming real images into anime style with the code/mod cycle gan/transform.py script. Example of starting training: python3 train.py batchsize 2 Ytfr ../../datasets/anime/houseki no kuni.tfrecord If you need to run training on a server, in the background, you can use the run network bg.sh script. For example, you can run the same network in the background by: ./run network bg.sh batchsize 2 Ytfr '../../datasets/anime/houseki no kuni.tfrecord' The trained network is stored in .pb files, which contain its very compact protobuf representation. It is much smaller than checkpoints, so it can even be versioned in git. Trained networks are stored in export/ / When transforming video, we must split it into images, transform them, and then create a video from them. Using this approach, audio is lost, obviously. Example commands for that: python data_preparation/videos_to_images.py videos_dir ../../dataset sources/real/videos/animefest 2017 cosplay images_dir ../../dataset sources/real/images/animefest 2017 cosplay python transform.py inpath ../../dataset sources/real/images/animefest 2017 cosplay/ .png outdir ../../data/images/animefest 2017 cosplay includein 0 rundir 20180625 1659 0 python data_preparation/images_to_videos.py images_dir ../../data/images/animefest 2017 cosplay/20180625 1659 0/80000 video_path ../../data/videos/animefest cosplay.avi Images extracted from videos take up lots of space and are not needed once the tfrecords are generated, so you can delete them. Results (so far) Trained on 2 datasets, Ade20k and the anime series + movie No Game No Life , I obtained the following results on the Ade20k dataset (training data): ! Image of results Tried on testing data (not used for training), I obtained interesting results, although with some slight artifacts. The following images are photos of Czech cosplayer Lena, be sure to check her content ( Facebook , Instagram ) !
Image of results",Image-to-Image Translation,Image-to-Image Translation 2203,Computer Vision,Computer Vision,Computer Vision,"pix2pix tensorflow Based on pix2pix by Isola et al. Article about this implemention Interactive Demo Tensorflow implementation of pix2pix. Learns a mapping from input images to output images, like these examples from the original paper: This port is based directly on the torch implementation, and not on an existing Tensorflow implementation. It is meant to be a faithful implementation of the original work and so does not add anything. The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing. Setup Prerequisites Tensorflow 1.4.1 Recommended Linux with Tensorflow GPU edition + cuDNN Getting Started sh clone this repo git clone cd pix2pix tensorflow download the CMP Facades dataset (generated from python tools/download dataset.py facades train the model (this may take 1 8 hours depending on GPU, on CPU you will be waiting for a bit) python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets. If you have Docker installed, you can use the provided Docker image to run pix2pix without installing the correct version of Tensorflow: sh train the model python tools/dockrun.py python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python tools/dockrun.py python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Datasets and Trained Models The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side like: For example: Some datasets have been made available by the authors of the pix2pix paper. To download those datasets, use the included script tools/download dataset.py . There are also links to pre trained models alongside each dataset, note that these pre trained models require the current version of pix2pix.py: dataset example python tools/download dataset.py facades 400 images from CMP Facades dataset . (31MB) Pre trained: BtoA python tools/download dataset.py cityscapes 2975 images from the Cityscapes training set . (113M) Pre trained: AtoB BtoA python tools/download dataset.py maps 1096 training images scraped from Google Maps (246M) Pre trained: AtoB BtoA python tools/download dataset.py edges2shoes 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. (2.2GB) Pre trained: AtoB python tools/download dataset.py edges2handbags 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. (8.6GB) Pre trained: AtoB The facades dataset is the smallest and easiest to get started with. 
Creating your own dataset Example: creating images with blank centers for inpainting sh Resize source images python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized Create images with blank centers python tools/process.py \ input_dir photos/resized \ operation blank \ output_dir photos/blank Combine resized images with blanked images python tools/process.py \ input_dir photos/resized \ b_dir photos/blank \ operation combine \ output_dir photos/combined Split into train/val set python tools/split.py \ dir photos/combined The folder photos/combined will now have train and val subfolders that you can use for training and testing. Creating image pairs from existing images If you have two directories a and b , with corresponding images (same name, same dimensions, different data) you can combine them with process.py : sh python tools/process.py \ input_dir a \ b_dir b \ operation combine \ output_dir c This puts the images in a side by side combined image that pix2pix.py expects. Colorization For colorization, your images should ideally all be the same aspect ratio. You can resize and crop them with the resize command: sh python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized No other processing is required, the colorization mode (see Training section below) uses single images instead of image pairs. Training Image Pairs For normal training with image pairs, you need to specify which directory contains the training images, and which direction to train on. The direction options are AtoB or BtoA sh python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA Colorization pix2pix.py includes special code to handle colorization with single images instead of pairs, using that looks like this: sh python pix2pix.py \ mode train \ output_dir photos_train \ max_epochs 200 \ input_dir photos/train \ lab_colorization In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information). Tips You can look at the loss and computation graph using tensorboard: sh tensorboard logdir facades_train If you wish to write in progress pictures as the network is training, use display_freq 50 . This will update facades_train/index.html every 50 steps with the current training inputs and outputs. Testing Testing is done with mode test . You should specify the checkpoint to use with checkpoint , this should point to the output_dir that you created previously with mode train : sh python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The testing mode will load some of the configuration options from the checkpoint provided so you do not need to specify which_direction for instance. The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets: Code Validation Validation of the code was performed on a Linux machine with a 1.3 TFLOPS Nvidia GTX 750 Ti GPU and an Azure NC6 instance with a K80 GPU. 
sh git clone cd pix2pix tensorflow python tools/download dataset.py facades sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Comparison on facades dataset: Input Tensorflow Torch Target Unimplemented Features The following models have not been implemented: defineG_encoder_decoder defineG_unet_128 defineD_pixelGAN Citation If you use this code for your research, please cite the paper this code is based on: Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} } Acknowledgments This is a port of pix2pix from Torch to Tensorflow. It also contains colorspace conversion code ported from Torch. Thanks to the Tensorflow team for making such a quality library! And special thanks to Phillip Isola for answering my questions about the pix2pix code.",Image-to-Image Translation,Image-to-Image Translation 2205,Computer Vision,Computer Vision,Computer Vision,"Bayesian Cycle consistent Adversarial Networks in PyTorch This is the PyTorch implementation for Bayesian CycleGAN . Introduction Recent techniques built on Generative Adversarial Networks (GANs) like CycleGAN are able to learn mappings between domains from unpaired datasets through min max optimization games between generators and discriminators. However, it remains challenging to stabilize the training process and diversify generated results. To address these problems, we present the non trivial Bayesian extension of cyclic model and an integrated cyclic framework for inter domain mappings. The proposed method stimulated by Bayesian GAN explores the full posteriors of Bayesian cyclic model (with latent sampling) and optimizes the model with maximum a posteriori (MAP) estimation. By exploring the full posteriors over model parameters, the Bayesian marginalization could alleviate the risk of model collapse and boost multimodal distribution learning. Besides, we deploy a combination of L1 loss and GANLoss between reconstructed images and source images to enhance the reconstructed learning, we also prove that this variation has a global optimality theoretically and show its effectiveness in experiments. Prerequisites The code has the following dependencies: python 3.5 torch 0.3.0 torchvision 0.2.0 pillow (PIL) NVIDIA GPU + CUDA CuDNN Install PyTorch and dependencies on linux please follow instructions at Install python libraries visdom and dominate . 
pip install visdom pip install dominate Core training and testing options Training options gamma : balance factor that adjusts the L1 GAN loss niter : number of epochs with the starting learning rate niter_decay : number of epochs over which the learning rate non linearly decays to zero periodically beta1 : momentum term of adam lr : initial learning rate for adam no_lsgan : do not use least square GAN if it is active lambda_A : weight for cycle loss (A > B > A) lambda_B : weight for cycle loss (B > A > B) lambda_kl : weight for KL loss mc_y : Monte Carlo samples for generating zy mc_x : Monte Carlo samples for generating zx Testing options which_epoch : which model to use for testing use_feat : if true, replace the SFM with other latent variables in the inference process how_many : how many test images to run The crucial options, like gamma , control our model and should be set carefully. We recommend setting batchSize to 1 in order to get the final results; we didn't have time to test other values, which may lower FCN scores. Usage Installation 1. Install the required dependencies 2. Clone this repository 3. Download corresponding datasets Unsupervised and Semi supervised Learning on benchmark datasets Cityscapes training scripts for cityscapes For cityscapes (128 x 256), using the Bayesian cyclic model with noise marginalization. python train_bayes_z.py dataroot /data/cityscapes name cityscapes_bayes_L1_lsgan_noise batchSize 1 loadSize 256 ratio 2 netG_A global netG_B global ngf 32 num_D_A 1 num_D_B 1 mc_x 3 mc_y 3 n_blocks_global 6 n_downsample_global 2 niter 50 niter_decay 50 gamma 0 lambda_kl 0.1 If you want to use the Bayesian model with encoder marginalization, you only need to change train_bayes_z.py to train_bayes.py . By the same token, you can set gamma to 0.5 if you want to use the L1 loss combined with GANLoss in the recycled learning. continue train If your machine encounters problems and stops working, you may need a revive mechanism. In our train scripts, you should change start_epoch and epoch_iter to that cut point and continue training by adding the following clause to the command: continue_train which_epoch latest testing scripts for cityscapes python test_bayes_z.py dataroot /data/cityscapes name cityscapes_bayes_L1_lsgan phase test loadSize 256 ratio 2 netG_A global netG_B global ngf 32 n_blocks_global 6 n_downsample_global 2 which_epoch latest how_many 500 You can choose which model to use by resetting the option which_epoch . Pre trained model Our latest models are available on Google drive Qualitative result display Final qualitative result samples for the Bayesian cyclic model in the unsupervised setting under the condition gamma 0 ! (./assets/cityscapes.PNG) Comparison of model stability: when gamma 0.5 , our method maintains stable convergence while the original one collapses to one distribution for the photo2label task. ! (./assets/cityscapes_compare.png) FID and Inception score ! (./assets/cityscapes_fid_inception.png) FID and Inception score for reconstructed learning ! (./assets/cityscapes_rec_fid_inception.png) Quantitative metrics: FCN scores In our experiment, we use the Bayesian cyclic model with random noise marginalization for the first 100 epochs, and finetune the model with SFM latent sampling for the later 100 epochs. The results show that the Bayesian version of the cyclic model outperforms the original one. Pre trained models are available at Google drive Methods Per pixel acc. Per class acc.
Class IOU CycleGAN (dropout) 0.56 0.18 0.12 CycleGAN (buffer) 0.58 0.22 0.16 Bayesian CycleGAN 0.73 0.27 0.20 Pix2pix (supervised) 0.85 0.40 0.32 Maps The training commands are similar to Cityscapes, but note that the Maps images are resized to 256x256; consequently, ratio should be 1. The results are illustrated as: ! (./assets/maps.png) Monet2Photo Art mapping is a kind of image style transfer. This dataset was crawled from Wikiart.org and Flickr by Junyan Zhu et al. and contains 1074 Monet artworks and 6853 photographs. Interestingly, if we impose a restriction on the latent space by using the encoder network to generate the statistic feature map (SFM), the Bayesian cyclic model can generate diversified images by replacing the SFM with other features in the inference process. In our implementation, we use the option use_feat in the inference procedure to change the statistic feature map to any other pictures stored at /dataroot/feat . The results are illustrated as follows: ! (./assets/monet2photo.PNG) Semi supervised learning In cases where paired data is accessible, we can leverage the condition to train our model in a semi supervised setting. In the training process of Cityscapes, mapping errors often occur: for example, the Gaussian initial model cannot recognize trees, thus translating trees into something else due to the unsupervised setting. Resolving these ambiguities requires weak semantic supervision: we can use 30 (around 1%) paired data samples (cityscape pictures and corresponding label images) to initialize our model at the beginning of each epoch. FID and Inception score ! (./assets/cityscapes_semi_fid_inception.png) Acknowledgement Code is inspired by CycleGAN .",Image-to-Image Translation,Image-to-Image Translation 2228,Computer Vision,Computer Vision,Computer Vision,"pix2pixHD Project Youtube Paper Pytorch implementation of our method for high resolution (e.g. 2048x1024) photorealistic image to image translation. It can be used for turning semantic label maps into photo realistic images or synthesizing portraits from face label maps. High Resolution Image Synthesis and Semantic Manipulation with Conditional GANs Ting Chun Wang 1 , Ming Yu Liu 1 , Jun Yan Zhu 2 , Andrew Tao 1 , Jan Kautz 1 , Bryan Catanzaro 1 1 NVIDIA Corporation, 2 UC Berkeley In arxiv, 2017. Image to image translation at 2k/1k resolution Our label to streetview results Interactive editing results Additional streetview results Label to face and interactive editing results Our editing interface Prerequisites Linux or macOS Python 2 or 3 NVIDIA GPU (12G or 24G memory) + CUDA cuDNN Getting Started Installation Install PyTorch and dependencies from Install python libraries dominate . bash pip install dominate Clone this repo: bash git clone cd pix2pixHD Testing A few example Cityscapes test images are included in the datasets folder. Please download the pre trained Cityscapes model from here (google drive link), and put it under ./checkpoints/label2city_1024p/ Test the model ( bash ./scripts/test_1024p.sh ): bash !./scripts/test_1024p.sh python test.py name label2city_1024p netG local ngf 32 resize_or_crop none The test results will be saved to an html file here: ./results/label2city_1024p/test_latest/index.html . More example scripts can be found in the scripts directory. Dataset We use the Cityscapes dataset. To train a model on the full dataset, please download it from the official website (registration required).
After downloading, please put it under the datasets folder in the same way the example images are provided. Training Train a model at 1024 x 512 resolution ( bash ./scripts/train_512p.sh ): bash !./scripts/train_512p.sh python train.py name label2city_512p To view training results, please checkout intermediate results in ./checkpoints/label2city_512p/web/index.html . If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/label2city_512p/logs by adding tf_log to the training scripts. Multi GPU training Train a model using multiple GPUs ( bash ./scripts/train_512p_multigpu.sh ): bash !./scripts/train_512p_multigpu.sh python train.py name label2city_512p batchSize 8 gpu_ids 0,1,2,3,4,5,6,7 Note: this is not tested and we trained our model using single GPU only. Please use at your own discretion. Training at full resolution To train the images at full resolution (2048 x 1024) requires a GPU with 24G memory ( bash ./scripts/train_1024p_24G.sh ). If only GPUs with 12G memory are available, please use the 12G script ( bash ./scripts/train_1024p_12G.sh ), which will crop the images during training. Performance is not guaranteed using this script. Training with your own dataset If you want to train with your own dataset, please generate label maps which are one channel whose pixel values correspond to the object labels (i.e. 0,1,...,N 1, where N is the number of labels). This is because we need to generate one hot vectors from the label maps. Please also specity label_nc N during both training and testing. If your input is not a label map, please just specify label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A , train_B instead of train_label , train_img , where the goal is to translate images from A to B. If you don't have instance maps or don't want to use them, please specify no_instance . The default setting for preprocessing is scale_width , which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize) . crop skips the resizing step and only performs random cropping. If you don't want any preprocessing, please specify none , which will do nothing other than making sure the image is divisible by 32. More Training/Test Details Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags. Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag no_instance . Citation If you find this useful for your research, please use the following. @article{wang2017highres, title {High Resolution Image Synthesis and Semantic Manipulation with Conditional GANs}, author {Ting Chun Wang and Ming Yu Liu and Jun Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro}, journal {arXiv preprint arXiv:1711.11585}, year {2017} } Acknowledgments This code borrows heavily from pytorch CycleGAN and pix2pix .",Image-to-Image Translation,Image-to-Image Translation 2234,Computer Vision,Computer Vision,Computer Vision,"SimGAN Captcha With simulated unsupervised learning, breaking captchas has never been easier. There is no need to label any captchas manually for convnet. 
By using a captcha synthesizer and a refiner trained with GAN, it's feasible to generate synthesized training pairs for classifying captchas. Link to paper: SimGAN by Apple PDF HTML ! SimGAN The task HackMIT Puzzle 5 . Correctly label 10000 out of 15000 captcha or 90% per character. Preprocessing Download target captcha files Here we download some captchas from the contest website. Each batch has 1000 captchas. We'll use 20000 so 20 batches. python import requests import threading URL DIR challenges/ NUM_CHALLENGES 20 lock threading.Lock() python def download_file(url, fname): NOTE the stream True parameter r requests.get(url, stream True) with open(fname, 'wb') as f: for chunk in r.iter_content(chunk_size 1024): if chunk: filter out keep alive new chunks f.write(chunk) f.flush() commented by recommendation from J.F.Sebastian with lock: pass print fname ts for i in range(NUM_CHALLENGES): fname DIR + challenge {} .format(i) t threading.Thread(target download_file, args (URL, fname)) ts.append(t) t.start() for t in ts: t.join() print Done Done Decompression Each challenge file is actually a json object containing 1000 base64 encoded jpg image file. So for each of these challenge files, we decompress each base64 strs into a jpeg and put that under a seprate folder. python import json, base64, os IMG_DIR ./orig fnames {}/challenge {} .format(DIR, i) for i in range(NUM_CHALLENGES) if not os.path.exists(IMG_DIR): os.mkdir(IMG_DIR) def save_imgs(fname): with open(fname) as f: l json.loads(f.read()) for image in l 'images' : b base64.decodestring(image 'jpg_base64' ) name image 'name' with open(IMG_DIR+ /{}.jpg .format(name), 'w') as f: f.write(b) for fname in fnames: save_imgs(fname) assert len(os.listdir(IMG_DIR)) 1000 NUM_CHALLENGES python from PIL import Image imgpath IMG_DIR + / + os.listdir(IMG_DIR) 0 imgpath2 IMG_DIR + / + os.listdir(IMG_DIR) 3 im Image.open(example_image_path) im2 Image.open(example_image_path2) IMG_FNAMES IMG_DIR + '/' + p for p in os.listdir(IMG_DIR) python im ! png (imgs/output_8_0.png) python im2 ! png (imgs/output_9_0.png) Convert to black and white Instead of RGB, binarized image saves significant compute. Here we hardcode a threshold and iterate over each pixel to obtain a binary image. python def gray(img_path): convert to grayscale, then binarize img Image.open(img_path).convert( L ) img img.point(lambda x: 255 if x > 200 or x 0 else x) value found through T&E img img.point(lambda x: 0 if x 230 else x) im im.point(lambda x:0 if x 0.5: w txt.rotate( 20 (random() 1), expand 1) og.paste( w, (i 20 + int(25 random()), int(25+30 (random() 1))), w) else: w txt.rotate(20 (random() 1), expand 1) og.paste( w, (i 20 + int(25 random()), int(20 random())), w) segments seg(og) if len(segments) ! 4: return gen_one() ogarr np.array(og) ogarr np.bitwise_or(noiseimg, ogarr) ogarr np.expand_dims(ogarr, axis 2).astype(float) ogarr np.random.random(size (50,100,1)) ogarr ogarr (ogarr > 0.0).astype(float) add noise return ogarr, text def synth_generator(): arrs while True: for _ in range(BATCH_SIZE): arrs.append(gen_one() 0 ) yield np.array(arrs) arrs python def get_image_batch(generator): keras generators may generate an incomplete batch for the last batch img_batch generator.next() if len(img_batch) ! BATCH_SIZE: img_batch generator.next() assert len(img_batch) BATCH_SIZE return img_batch python import matplotlib.pyplot as plt imarr get_image_batch(real_generator) 0, :, :, 0 plt.imshow(imarr) ! 
png (imgs/output_25_1.png) python imarr get_image_batch(synth_generator()) 0, :, :, 0 print imarr.shape plt.imshow(imarr) (50, 100) ! png (imgs/output_26_2.png) What happened next? Plug all the data in an MNIST like classifier and call it a day. Unfortunately, it's not that simple. I actually spent a long time fine tuning the network but accuracy plateued around 55% sampled. The passing requirement is 10000 out of 15000 submitted or 90% accuracy or 66% per char. I was facing a dilemma: tune the model even further or manually label x amount of data: 0.55 (15000 x) + x 10000 x 3888 Obviously I am not going to label 4000 captchas and break my neck in the process. Meanwhile, there happened a burnt out guy who decided to label all 10000 captchas. This dilligent dude was 2000 in. I asked if he is willing to collaborate on a solution. It's almost like he didn't want to label captchas anymore. He agreed immediately. Using the same model, accuracy immediately shot up to 95% and we both qualified for HackMIT. /aside After the contest, I perfected the model and got 95% without labelling a single image. Here is the model for SimGAN: ! SimGAN Model Definition There are three components to the network: Refiner The refiner network, Rθ, is a residual network (ResNet). It modifies the synthetic image on a pixel level, rather than holistically modifying the image content, preserving the global structure and annotations. Discriminator The discriminator network Dφ, is a simple ConvNet that contains 5 conv layers and 2 max pooling layers. It's abinary classifier that outputs whether a captcha is synthesized or real. Combined Pipe the refined image into discriminator. python def refiner_network(input_image_tensor): :param input_image_tensor: Input tensor that corresponds to a synthetic image. :return: Output tensor that corresponds to a refined synthetic image. def resnet_block(input_features, nb_features 64, nb_kernel_rows 3, nb_kernel_cols 3): A ResNet block with two nb_kernel_rows x nb_kernel_cols convolutional layers, each with nb_features feature maps. See Figure 6 in :param input_features: Input tensor to ResNet block. :return: Output tensor from ResNet block. y layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode 'same')(input_features) y layers.Activation('relu')(y) y layers.Convolution2D(nb_features, nb_kernel_rows, nb_kernel_cols, border_mode 'same')(y) y layers.merge( input_features, y , mode 'sum') return layers.Activation('relu')(y) an input image of size w × h is convolved with 3 × 3 filters that output 64 feature maps x layers.Convolution2D(64, 3, 3, border_mode 'same', activation 'relu')(input_image_tensor) the output is passed through 4 ResNet blocks for _ in range(4): x resnet_block(x) the output of the last ResNet block is passed to a 1 × 1 convolutional layer producing 1 feature map corresponding to the refined synthetic image return layers.Convolution2D(1, 1, 1, border_mode 'same', activation 'tanh')(x) def discriminator_network(input_image_tensor): :param input_image_tensor: Input tensor corresponding to an image, either real or refined. :return: Output tensor that corresponds to the probability of whether an image is real or refined. 
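# The strided convolutions below end in a 2 channel map, so every spatial location
# gets its own real vs. refined score; reshaping to (-1, 2) and applying softmax
# cross entropy per location (see local_adversarial_loss below) makes this a local,
# patch based discriminator rather than a single whole image classifier.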
x layers.Convolution2D(96, 3, 3, border_mode 'same', subsample (2, 2), activation 'relu')(input_image_tensor) x layers.Convolution2D(64, 3, 3, border_mode 'same', subsample (2, 2), activation 'relu')(x) x layers.MaxPooling2D(pool_size (3, 3), border_mode 'same', strides (1, 1))(x) x layers.Convolution2D(32, 3, 3, border_mode 'same', subsample (1, 1), activation 'relu')(x) x layers.Convolution2D(32, 1, 1, border_mode 'same', subsample (1, 1), activation 'relu')(x) x layers.Convolution2D(2, 1, 1, border_mode 'same', subsample (1, 1), activation 'relu')(x) here one feature map corresponds to is_real and the other to is_refined , and the custom loss function is then tf.nn.sparse_softmax_cross_entropy_with_logits return layers.Reshape(( 1, 2))(x) Refiner synthetic_image_tensor layers.Input(shape (HEIGHT, WIDTH, 1)) refined_image_tensor refiner_network(synthetic_image_tensor) refiner_model models.Model(input synthetic_image_tensor, output refined_image_tensor, name 'refiner') Discriminator refined_or_real_image_tensor layers.Input(shape (HEIGHT, WIDTH, 1)) discriminator_output discriminator_network(refined_or_real_image_tensor) discriminator_model models.Model(input refined_or_real_image_tensor, output discriminator_output, name 'discriminator') Combined refiner_model_output refiner_model(synthetic_image_tensor) combined_output discriminator_model(refiner_model_output) combined_model models.Model(input synthetic_image_tensor, output refiner_model_output, combined_output , name 'combined') def self_regularization_loss(y_true, y_pred): delta 0.0001 FIXME: need to figure out an appropriate value for this return tf.multiply(delta, tf.reduce_sum(tf.abs(y_pred y_true))) define custom local adversarial loss (softmax for each image section) for the discriminator the adversarial loss function is the sum of the cross entropy losses over the local patches def local_adversarial_loss(y_true, y_pred): y_true and y_pred have shape (batch_size, of local patches, 2), but really we just want to average over the local patches and batch size so we can reshape to (batch_size of local patches, 2) y_true tf.reshape(y_true, ( 1, 2)) y_pred tf.reshape(y_pred, ( 1, 2)) loss tf.nn.softmax_cross_entropy_with_logits(labels y_true, logits y_pred) return tf.reduce_mean(loss) compile models BATCH_SIZE 512 sgd optimizers.RMSprop() refiner_model.compile(optimizer sgd, loss self_regularization_loss) discriminator_model.compile(optimizer sgd, loss local_adversarial_loss) discriminator_model.trainable False combined_model.compile(optimizer sgd, loss self_regularization_loss, local_adversarial_loss ) Pre training It is not necessary to pre train GANs but it seems pretraining makes GANs converge faster. Here we pre train both models. For the refiner, we train by supplying the identity. For the discriminator, we train with the correct real, synth labeled pairs. 
python the target labels for the cross entropy loss layer are 0 for every yj (real) and 1 for every xi (refined) y_real np.array( 1.0, 0.0 discriminator_model.output_shape 1 BATCH_SIZE) y_refined np.array( 0.0, 1.0 discriminator_model.output_shape 1 BATCH_SIZE) assert y_real.shape (BATCH_SIZE, discriminator_model.output_shape 1 , 2) python LOG_INTERVAL 10 MODEL_DIR ./model/ print('pre training the refiner network...') gen_loss np.zeros(shape len(refiner_model.metrics_names)) for i in range(100): synthetic_image_batch get_image_batch(synth_generator()) gen_loss np.add(refiner_model.train_on_batch(synthetic_image_batch, synthetic_image_batch), gen_loss) log every log_interval steps if not i % LOG_INTERVAL: print('Refiner model self regularization loss: {}.'.format(gen_loss / LOG_INTERVAL)) gen_loss np.zeros(shape len(refiner_model.metrics_names)) refiner_model.save(os.path.join(MODEL_DIR, 'refiner_model_pre_trained.h5')) pre training the refiner network... Saving batch of refined images during pre training at step: 0. Refiner model self regularization loss: 0.05277019 . Saving batch of refined images during pre training at step: 10. Refiner model self regularization loss: 4.2269813 . Saving batch of refined images during pre training at step: 20. Refiner model self regularization loss: 0.76108101 . Saving batch of refined images during pre training at step: 30. Refiner model self regularization loss: 0.28633648 . Saving batch of refined images during pre training at step: 40. Refiner model self regularization loss: 0.19448772 . Saving batch of refined images during pre training at step: 50. Refiner model self regularization loss: 0.16131182 . Saving batch of refined images during pre training at step: 60. Refiner model self regularization loss: 0.11931724 . Saving batch of refined images during pre training at step: 70. Refiner model self regularization loss: 0.11075923 . Saving batch of refined images during pre training at step: 80. Refiner model self regularization loss: 0.10888441 . Saving batch of refined images during pre training at step: 90. Refiner model self regularization loss: 0.10765313 . python from tqdm import tqdm print('pre training the discriminator network...') disc_loss np.zeros(shape len(discriminator_model.metrics_names)) for _ in tqdm(range(100)): real_image_batch get_image_batch(real_generator) disc_loss np.add(discriminator_model.train_on_batch(real_image_batch, y_real), disc_loss) synthetic_image_batch get_image_batch(synth_generator()) refined_image_batch refiner_model.predict_on_batch(synthetic_image_batch) disc_loss np.add(discriminator_model.train_on_batch(refined_image_batch, y_refined), disc_loss) discriminator_model.save(os.path.join(MODEL_DIR, 'discriminator_model_pre_trained.h5')) hard coded for now print('Discriminator model loss: {}.'.format(disc_loss / (100 2))) pre training the discriminator network... Discriminator model loss: 0.04783788 . Training This is the most important training step in which we refine a synthesized captcha, then pass it through the discriminator and backprop gradients. python from image_history_buffer import ImageHistoryBuffer k_d 1 number of discriminator updates per step k_g 2 number of generative network updates per step nb_steps 1000 TODO: what is an appropriate size for the image history buffer? 
image_history_buffer ImageHistoryBuffer((0, HEIGHT, WIDTH, 1), BATCH_SIZE 100, BATCH_SIZE) combined_loss np.zeros(shape len(combined_model.metrics_names)) disc_loss_real np.zeros(shape len(discriminator_model.metrics_names)) disc_loss_refined np.zeros(shape len(discriminator_model.metrics_names)) see Algorithm 1 in for i in range(nb_steps): print('Step: {} of {}.'.format(i, nb_steps)) train the refiner for _ in range(k_g 2): sample a mini batch of synthetic images synthetic_image_batch get_image_batch(synth_generator()) update θ by taking an SGD step on mini batch loss LR(θ) combined_loss np.add(combined_model.train_on_batch(synthetic_image_batch, synthetic_image_batch, y_real ), combined_loss) for _ in range(k_d): sample a mini batch of synthetic and real images synthetic_image_batch get_image_batch(synth_generator()) real_image_batch get_image_batch(real_generator) refine the synthetic images w/ the current refiner refined_image_batch refiner_model.predict_on_batch(synthetic_image_batch) use a history of refined images half_batch_from_image_history image_history_buffer.get_from_image_history_buffer() image_history_buffer.add_to_image_history_buffer(refined_image_batch) if len(half_batch_from_image_history): refined_image_batch :batch_size // 2 half_batch_from_image_history update φ by taking an SGD step on mini batch loss LD(φ) disc_loss_real np.add(discriminator_model.train_on_batch(real_image_batch, y_real), disc_loss_real) disc_loss_refined np.add(discriminator_model.train_on_batch(refined_image_batch, y_refined), disc_loss_refined) if not i % LOG_INTERVAL: log loss summary print('Refiner model loss: {}.'.format(combined_loss / (LOG_INTERVAL k_g 2))) print('Discriminator model loss real: {}.'.format(disc_loss_real / (LOG_INTERVAL k_d 2))) print('Discriminator model loss refined: {}.'.format(disc_loss_refined / (LOG_INTERVAL k_d 2))) combined_loss np.zeros(shape len(combined_model.metrics_names)) disc_loss_real np.zeros(shape len(discriminator_model.metrics_names)) disc_loss_refined np.zeros(shape len(discriminator_model.metrics_names)) save model checkpoints model_checkpoint_base_name os.path.join(MODEL_DIR, '{}_model_step_{}.h5') refiner_model.save(model_checkpoint_base_name.format('refiner', i)) discriminator_model.save(model_checkpoint_base_name.format('discriminator', i)) Step: 0 of 1000. Saving batch of refined images at adversarial step: 0. Refiner model loss: 2.46834831 0.01272553 2.45562277 . Discriminator model loss real: 2.27849432e 07 . Discriminator model loss refined: 1.63936726e 05 . Step: 1 of 1000. Step: 2 of 1000. Step: 3 of 1000. Step: 4 of 1000. Step: 5 of 1000. Step: 6 of 1000. Step: 7 of 1000. Step: 8 of 1000. Step: 9 of 1000. Step: 10 of 1000. Saving batch of refined images at adversarial step: 10. Refiner model loss: 27.00968537 0.11238954 26.8972959 . Discriminator model loss real: 1.26835085e 10 . Discriminator model loss refined: 4.44882481e 08 . Step: 11 of 1000. Step: 12 of 1000. Step: 13 of 1000. Step: 14 of 1000. Step: 15 of 1000. Step: 16 of 1000. Step: 17 of 1000. Step: 18 of 1000. Step: 19 of 1000. Step: 20 of 1000. Saving batch of refined images at adversarial step: 20. Refiner model loss: 26.89902883 0.10987803 26.78915081 . Discriminator model loss real: 1.48619811e 07 . Discriminator model loss refined: 4.60907181e 08 . Step: 21 of 1000. Step: 22 of 1000. Step: 23 of 1000. Step: 24 of 1000. Step: 25 of 1000. Step: 26 of 1000. Step: 27 of 1000. Step: 28 of 1000. Step: 29 of 1000. Step: 30 of 1000. 
Saving batch of refined images at adversarial step: 30. Refiner model loss: 25.93090506 0.10890296 25.82200208 . Discriminator model loss real: 3.96611703e 09 . Discriminator model loss refined: 5.07067440e 08 . Step: 31 of 1000. Step: 32 of 1000. Step: 33 of 1000. Step: 34 of 1000. Step: 35 of 1000. Step: 36 of 1000. Step: 37 of 1000. Step: 38 of 1000. Step: 39 of 1000. Step: 40 of 1000. Saving batch of refined images at adversarial step: 40. Refiner model loss: 28.67232819 2.33041485 26.34191332 . Results of SimGAN: As you can see below, we no longer have the cookie cutter fonts. There are quite a few artifacts that did not exist before refinement. The edges are blurred and noisy which is impossible to simulate heuristically. And it is exactly these tiny things that renders MNIST like convnet useless. Now the refined results are basically the original captchas. python synthetic_image_batch get_image_batch(synth_generator()) arr refiner_model.predict_on_batch(synthetic_image_batch) python plt.imshow(arr 200, :, :, 0 ) ! png (imgs/output_38_1.png) python plt.imshow(get_image_batch(real_generator) 2,:,:,0 ) ! png (imgs/output_39_1.png) MNIST for Captcha Now we finish the puzzle by building an MNIST like convnet to predict captcha labels. python n_class len(alphanumeric) def mnist_raw_generator(batch_size 128): X np.zeros((batch_size, HEIGHT, WIDTH, 1), dtype np.uint8) y np.zeros((batch_size, n_class), dtype np.uint8) for _ in range(4) 4 chars while True: for i in range(batch_size): im, random_str gen_one() X i im for j, ch in enumerate(random_str): y j i, : 0 y j i, alphanumeric.find(ch) 1 yield np.array(X), y def mnist_generator(batch_size 128): X np.zeros((batch_size, HEIGHT, WIDTH, 1), dtype np.uint8) y np.zeros((batch_size, n_class), dtype np.uint8) for _ in range(4) 4 chars while True: for i in range(batch_size): im, random_str gen_one() X i im for j, ch in enumerate(random_str): y j i, : 0 y j i, alphanumeric.find(ch) 1 yield refiner_model.predict(np.array(X)), y mg mnist_generator().next() plt.imshow(mg 0 0,:,:,0 ) sanity check python from keras.layers import input_tensor Input((HEIGHT, WIDTH, 1)) x input_tensor x Conv2D(32, kernel_size (3, 3), activation 'relu')(x) for _ in range(4): x Conv2D(128, (3, 3), activation 'relu')(x) x MaxPooling2D(pool_size (2, 2))(x) x Dropout(0.25)(x) x Flatten()(x) x Dense(128, activation 'relu')(x) x Dropout(0.5)(x) x Dense(n_class, activation 'softmax', name 'c%d'%(i+1))(x) for i in range(4) model models.Model(inputs input_tensor, outputs x) model.compile(loss 'categorical_crossentropy', optimizer 'rmsprop', metrics 'accuracy' ) python from keras.callbacks import History history History() model.fit_generator(mnist_generator(), steps_per_epoch 1000, epochs 20, callbacks history ) Epoch 1/20 341/1000 >.................... ETA: 376s loss: 2.7648 c1_loss: 0.6493 c2_loss: 0.6757 c3_loss: 0.6681 c4_loss: 0.7717 c1_acc: 0.8199 c2_acc: 0.8185 c3_acc: 0.8197 c4_acc: 0.7820 Obviously you will need to keep training as the per character accuracy is only 80% Let's test the trained model Synthetic python def decode(y): y np.argmax(np.array(y), axis 2) :,0 return ''.join( alphanumeric x for x in y ) X, y next(mnist_generator(1)) plt.title('real: %s\npred:%s'%(decode(y), decode(y_pred))) plt.imshow(X 0, :, :, 0 , cmap 'gray') plt.axis('off') ( 0.5, 99.5, 49.5, 0.5) ! 
png (imgs/output_45_2.png) Real python X next(real_generator) X refiner_model.predict(X) y_pred model.predict(X) plt.title('pred:%s'%(decode(y_pred))) plt.imshow(X 0,:,:,0 , cmap 'gray') plt.axis('off') ( 0.5, 99.5, 49.5, 0.5) ! png (imgs/output_47_1.png) python",Image-to-Image Translation,Image-to-Image Translation 2310,Computer Vision,Computer Vision,Computer Vision,"pix2pix tensorflow Based on pix2pix by Isola et al. Article about this implemention Interactive Demo Tensorflow implementation of pix2pix. Learns a mapping from input images to output images, like these examples from the original paper: This port is based directly on the torch implementation, and not on an existing Tensorflow implementation. It is meant to be a faithful implementation of the original work and so does not add anything. The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing. Setup Prerequisites Tensorflow 1.4.1 Recommended Linux with Tensorflow GPU edition + cuDNN Getting Started sh clone this repo git clone cd pix2pix tensorflow download the CMP Facades dataset (generated from python tools/download dataset.py facades train the model (this may take 1 8 hours depending on GPU, on CPU you will be waiting for a bit) python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets. If you have Docker installed, you can use the provided Docker image to run pix2pix without installing the correct version of Tensorflow: sh train the model python tools/dockrun.py python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python tools/dockrun.py python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Datasets and Trained Models The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side like: For example: Some datasets have been made available by the authors of the pix2pix paper. To download those datasets, use the included script tools/download dataset.py . There are also links to pre trained models alongside each dataset, note that these pre trained models require the current version of pix2pix.py: dataset example python tools/download dataset.py facades 400 images from CMP Facades dataset . (31MB) Pre trained: BtoA python tools/download dataset.py cityscapes 2975 images from the Cityscapes training set . (113M) Pre trained: AtoB BtoA python tools/download dataset.py maps 1096 training images scraped from Google Maps (246M) Pre trained: AtoB BtoA python tools/download dataset.py edges2shoes 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. (2.2GB) Pre trained: AtoB python tools/download dataset.py edges2handbags 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. (8.6GB) Pre trained: AtoB The facades dataset is the smallest and easiest to get started with. 
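As a side note on the paired data format described above: each training example is one image whose left half is the input and whose right half is the desired output. A minimal sketch of producing such a combined image with Pillow ("a.jpg" and "b.jpg" are hypothetical, equally sized files; for real datasets the repository's tools/process.py described in the next section does this):

```python
# Build one side-by-side training pair as described above.
# Assumes Pillow; "a.jpg" and "b.jpg" are hypothetical, equally sized images.
from PIL import Image

a = Image.open("a.jpg")           # input image (left half)
b = Image.open("b.jpg")           # desired output image (right half)
assert a.size == b.size, "both halves must have the same dimensions"

pair = Image.new("RGB", (a.width * 2, a.height))
pair.paste(a, (0, 0))             # left half:  input
pair.paste(b, (a.width, 0))       # right half: target
pair.save("pair.jpg")             # one combined image per training example
```

The only hard requirement is that the two halves line up pixel for pixel, since each combined image is split down the middle again at load time.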
Creating your own dataset Example: creating images with blank centers for inpainting sh Resize source images python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized Create images with blank centers python tools/process.py \ input_dir photos/resized \ operation blank \ output_dir photos/blank Combine resized images with blanked images python tools/process.py \ input_dir photos/resized \ b_dir photos/blank \ operation combine \ output_dir photos/combined Split into train/val set python tools/split.py \ dir photos/combined The folder photos/combined will now have train and val subfolders that you can use for training and testing. Creating image pairs from existing images If you have two directories a and b , with corresponding images (same name, same dimensions, different data) you can combine them with process.py : sh python tools/process.py \ input_dir a \ b_dir b \ operation combine \ output_dir c This puts the images in a side by side combined image that pix2pix.py expects. Colorization For colorization, your images should ideally all be the same aspect ratio. You can resize and crop them with the resize command: sh python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized No other processing is required, the colorization mode (see Training section below) uses single images instead of image pairs. Training Image Pairs For normal training with image pairs, you need to specify which directory contains the training images, and which direction to train on. The direction options are AtoB or BtoA sh python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA Colorization pix2pix.py includes special code to handle colorization with single images instead of pairs, using that looks like this: sh python pix2pix.py \ mode train \ output_dir photos_train \ max_epochs 200 \ input_dir photos/train \ lab_colorization In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information). Tips You can look at the loss and computation graph using tensorboard: sh tensorboard logdir facades_train If you wish to write in progress pictures as the network is training, use display_freq 50 . This will update facades_train/index.html every 50 steps with the current training inputs and outputs. Testing Testing is done with mode test . You should specify the checkpoint to use with checkpoint , this should point to the output_dir that you created previously with mode train : sh python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The testing mode will load some of the configuration options from the checkpoint provided so you do not need to specify which_direction for instance. The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets: Code Validation Validation of the code was performed on a Linux machine with a 1.3 TFLOPS Nvidia GTX 750 Ti GPU and an Azure NC6 instance with a K80 GPU. 
sh git clone cd pix2pix tensorflow python tools/download dataset.py facades sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Comparison on facades dataset: Input Tensorflow Torch Target Unimplemented Features The following models have not been implemented: defineG_encoder_decoder defineG_unet_128 defineD_pixelGAN Citation If you use this code for your research, please cite the paper this code is based on: Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} } Acknowledgments This is a port of pix2pix from Torch to Tensorflow. It also contains colorspace conversion code ported from Torch. Thanks to the Tensorflow team for making such a quality library! And special thanks to Phillip Isola for answering my questions about the pix2pix code.",Image-to-Image Translation,Image-to-Image Translation 2318,Computer Vision,Computer Vision,Computer Vision,"Image Translation using Conditional Adversarial Network Conditional GAN for pix2pix Aerial to Map images dataset . This is an implementation of a Conditional GAN based on a paper by Isola et al. Link : Implemented using the TensorFlow framework for Python. This implementation has a U Net generator as proposed by Isola et al. and also a ResNet 9 generator, of which, any one can be chosen. Optionally, noise can also be added to the input of the generator. Results The performance of each model was evaluated by computing Mean Squared Error (MSE) between the predicted and expected output on test data. Model MSE U Net 0.01834 U Net with noise 0.01852 ResNet 9 0.01443 ResNet 9 with noise 0.01476 The study showed that the ResNet 9 generator performed about 21.3% better than the U Net generator. However, the addition of noise to the generator input did not improve results. See Project Report for details on results, model architecture, choice of hyperparameters, data preprocessing, etc. Usage Steps to use code 1. Import required functions. from initializeDataset import load_dataset from c_gan import C_GAN, UNetGenerator, ResNet9Generator, PatchDiscriminator 2. Create object of C_GAN class. for U Net generator gan C_GAN(generator UNetGenerator(), discriminator PatchDiscriminator()) for ResNet 9 generator gan C_GAN(generator ResNet9Generator(), discriminator PatchDiscriminator()) for U Net generator with noise added to inputs gan C_GAN(generator UNetGenerator(noise True), discriminator PatchDiscriminator()) for ResNet 9 generator with noise added to inputs gan C_GAN(generator ResNet9Generator(noise True), discriminator PatchDiscriminator()) Default training rate is 2e 4 for both generator and discriminator Use 'generator_learning_rate' and 'discriminator_learning_rate' parameters to specify different learning rates Use 'ckpt_freq' to change frequency of saving TF checkpoint, default frequency is 20 epochs 3. Train model. EPOCHS 200 gan.train(train_dataset, EPOCHS) 4. Evaluate model (Calculate Mean Squared Error). mse gan.evaluate(test_dataset) 5.
Generate samples outputs. NUM_SAMPLES 10 for tar, img in test_dataset.take(NUM_SAMPLES): gan.generate_image(img, tar) 6. Predict map image from a given aerial image. The input image can be of any size (square images are preferred) The output is a 3 x 256 x 256 RGB image of the prediction (range of values 1, 1 ) gan.predict(input_image) 7. Restoring model from checkpoint. gan.restore_from_checkpoint() 8. Output model summary Make sure model is compiled before this gan.summary()",Image-to-Image Translation,Image-to-Image Translation 2449,Computer Vision,Computer Vision,Computer Vision,"Domain adaptation on segmentation Some useful Domain adaptation code > This repository collects some useful code from others. We will implement them in the future so you can use them easily. Table of Contents Blog papers ( papers) Talks ( talks) Datasets ( datasets) Papers Learning to Adapt Structured Output Space for Semantic Segmentation paper link: code: provided in Folder: Adapt Structured Output main code forked from tutorial Video () No More Discrimination: Cross City Adaptation of Road Scene Segmenters paper link: code: provided in Folder: Adapt_Road_Scene tutorial Video () FCNs in the Wild: Pixel level Adversarial and Constraint based Adaptation paper link: code: provided in Folder: FCNs_Wild tutorial Video () Maximum Classifier Discrepancy for Domain Adaptation with Semantic Segmentation paper link: code: provided in Folder: MCD_DA_seg main code forked from tutorial Video () Talks Learning to Adapt Structured Output Space for Semantic Segmentation , Wei Chih Hung. No More Discrimination: Cross City Adaptation of Road Scene Segmenters , Yu Ting Chen. Maximum Classifier Discrepancy for Domain Adaptation with Semantic Segmentation , Kuniaki Saito. Datasets Synthia Dataset ( Download the subset SYNTHIA RAND CITYSCAPES ) Cityscapes Dataset Our Dataset contains four subsets Taipei, Tokyo, Roma, Rio used as target domain (only testing data has annotations)",Image-to-Image Translation,Image-to-Image Translation 2472,Computer Vision,Computer Vision,Computer Vision,"pix2pix Project Arxiv PyTorch Torch implementation for learning a mapping from input images to output images, for example: Image to Image Translation with Conditional Adversarial Networks Phillip Isola , Jun Yan Zhu , Tinghui Zhou , Alexei A. Efros CVPR, 2017. On some tasks, decent results can be obtained fairly quickly and on small datasets. For example, to learn to generate facades (example shown above), we trained on just 400 images for about 2 hours (on a single Pascal Titan X GPU). However, for harder problems it may be important to train on far larger datasets, and for many hours or even days. Note : Please check out our PyTorch implementation for pix2pix and CycleGAN. The PyTorch version is under active development and can produce results comparable to or better than this Torch version. Setup Prerequisites Linux or OSX NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested) Getting Started Install torch and dependencies from Install torch packages nngraph and display bash luarocks install nngraph luarocks install Clone this repo: bash git clone git@github.com:phillipi/pix2pix.git cd pix2pix Download the dataset (e.g. CMP Facades ): bash bash ./datasets/download_dataset.sh facades Train the model bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA th train.lua (CPU only) The same training command without using a GPU or CUDNN.
Setting the environment variables gpu 0 cudnn 0 forces CPU only bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA gpu 0 cudnn 0 batchSize 10 save_epoch_freq 5 th train.lua (Optionally) start the display server to view results as the model trains. ( See Display UI ( display ui) for more details): bash th ldisplay.start 8000 0.0.0.0 Finally, test the model: bash DATA_ROOT ./datasets/facades name facades_generation which_direction BtoA phase val th test.lua The test results will be saved to an html file here: ./results/facades_generation/latest_net_G_val/index.html . Train bash DATA_ROOT /path/to/data/ name expt_name which_direction AtoB th train.lua Switch AtoB to BtoA to train translation in opposite direction. Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir your_dir in train.lua). See opt in train.lua for additional training options. Test bash DATA_ROOT /path/to/data/ name expt_name which_direction AtoB phase val th test.lua This will run the model named expt_name in direction AtoB on all images in /path/to/data/val . Result images, and a webpage to view them, are saved to ./results/expt_name (can be changed by passing results_dir your_dir in test.lua). See opt in test.lua for additional testing options. Datasets Download the datasets using the following script. Some of the datasets are collected by other researchers. Please cite their papers if you use the data. bash bash ./datasets/download_dataset.sh dataset_name facades : 400 images from CMP Facades dataset . Citation (datasets/bibtex/facades.tex) cityscapes : 2975 images from the Cityscapes training set . Citation (datasets/bibtex/cityscapes.tex) maps : 1096 training images scraped from Google Maps edges2shoes : 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. Citation (datasets/bibtex/shoes.tex) edges2handbags : 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. Citation (datasets/bibtex/handbags.tex) night2day : around 20K natural scene images from Transient Attributes dataset Citation (datasets/bibtex/transattr.tex) . To train a day2night pix2pix model, you need to add which_direction BtoA . Models Download the pre trained models with the following script. You need to rename the model (e.g. facades_label2image to /checkpoints/facades/latest_net_G.t7 ) after the download has finished. bash bash ./models/download_model.sh model_name facades_label2image (label > facade): trained on the CMP Facades dataset. cityscapes_label2image (label > street scene): trained on the Cityscapes dataset. cityscapes_image2label (street scene > label): trained on the Cityscapes dataset. edges2shoes (edge > photo): trained on UT Zappos50K dataset. edges2handbags (edge > photo): trained on Amazon handbags images. day2night (daytime scene > nighttime scene): trained on around 100 webcams . Setup Training and Test data Generating Pairs We provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depicitions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A: Create folder /path/to/data with subfolders A and B . A and B should each have their own subfolders train , val , test , etc. In /path/to/data/A/train , put training images in style A. In /path/to/data/B/train , put the corresponding images in style B. 
Repeat same for other data splits ( val , test , etc). Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g. /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg . Once the data is formatted this way, call: bash python scripts/combine_A_and_B.py fold_A /path/to/data/A fold_B /path/to/data/B fold_AB /path/to/data This will combine each pair of images (A,B) into a single image file, ready for training. Notes on Colorization No need to run combine_A_and_B.py for colorization. Instead, you just need to prepare some natural images, and set preprocess colorization in the script. The program will automatically convert each RGB image into Lab color space, and create L > ab image pair during the training. Also set input_nc 1 and output_nc 2 . Extracting Edges We provide python and Matlab scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges. Run scripts/edges/PostprocessHED.m to simplify edges with additional post processing steps. Check the code documentation for more details. Evaluating Labels2Photos on Cityscapes We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe ) in your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre trained FCN 8s semantic segmentation model (512MB) by running bash bash ./scripts/eval_cityscapes/download_fcn8s.sh Then make sure ./scripts/eval_cityscapes/ is in your system's python path. If not, run the following command to add it bash export PYTHONPATH ${PYTHONPATH}:./scripts/eval_cityscapes/ Now you can run the following command to evaluate your predictions: bash python ./scripts/eval_cityscapes/evaluate.py cityscapes_dir /path/to/original/cityscapes/dataset/ result_dir /path/to/your/predictions/ output_dir /path/to/output/directory/ By default, images in your prediction result directory have the same naming convention as the Cityscapes dataset (e.g. frankfurt_000001_038418_leftImg8bit.png ). The script will output a txt file under output_dir containing the metric. Further notes : The pre trained model does not work well on Cityscapes in the original resolution (1024x2048) as it was trained on 256x256 images that are resized to 1024x2048. The purpose of the resizing was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need of changing the standard FCN training code for Cityscapes. To get the ground truth numbers in the paper, you need to resize the original Cityscapes images to 256x256 before running the evaluation code. Display UI Optionally, for displaying images during training and test, use the display package . Install it with: luarocks install Then start the server with: th ldisplay.start Open this URL in your browser: By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface: bash th ldisplay.start 8000 0.0.0.0 Then open in your browser to load the remote desktop. L1 error is plotted to the display by default. Set the environment variable display_plot to a comma seperated list of values errL1 , errG and errD to visualize the L1, generator, and descriminator error respectively. 
For example, to plot only the generator and descriminator errors to the display instead of the default L1 error, set display_plot errG,errD .",Image-to-Image Translation,Image-to-Image Translation 2475,Computer Vision,Computer Vision,Computer Vision,"Valid are: apple2orange, summer2winter_yosemite, horse2zebra, monet2photo, cezanne2photo, ukiyoe2photo, vangogh2photo, maps, cityscapes, facades, iphone2dslr_flower, ae_photos Alternatively you can build your own dataset by setting up the following directory structure: . ├── datasets ├── i.e. apple2orange ├── trainA Contains domain A images (i.e. apple) ├── trainB Contains domain B images (i.e. orange) ├── testA Testing └── testB Testing 2. Train python train.py dataroot datasets/ / cuda This command will start a training session using the images under the dataroot/train directory with the hyperparameters that showed best results according to CycleGAN authors. Both generators and discriminators weights will be saved ./output/ / the output directory. If you don't own a GPU remove the cuda option, although I advise you to get one! Testing The pre trained file is on Google drive . Download the file and save it on ./output/ /netG_A2B.pth and ./output/ /netG_B2A.pth . python test.py dataroot datasets/ / cuda This command will take the images under the dataroot/testA/ and dataroot/testB/ directory, run them through the generators and save the output under the ./output/ / directories. Examples of the generated outputs (default params) apple2orange, summer2winter_yosemite, horse2zebra dataset: ! Alt text (./output/imgs/0167.png) ! Alt text (./output/imgs/0035.png) ! Alt text (./output/imgs/0111.png) Acknowledgments Code is modified by PyTorch CycleGAN . All credit goes to the authors of CycleGAN , Zhu, Jun Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A. cvfve_hw1 homework1 color transfer >>>>>>> ac3b5e8b0f43ab0125b6dabe321993271a7f2ed0",Image-to-Image Translation,Image-to-Image Translation 2501,Computer Vision,Computer Vision,Computer Vision,"face generator Based on pix2pix by Isola et al. Keras implementation of pix2pix and ICGAN . 
Here is a demo of HTML: jitang.info Setup Prerequisites Keras 2.2.0 clone this repo git clone \ cd face generator prepare the dataset (Celeba) Put input sketches in folder input/ \ Put output images in folder output/ \ (if you want to train ICGAN, put label description in root path label.txt ) train the pix2pix model sh python train.py \ mode pix2pix \ input_dir input/ \ outnput_dir output/ \ max_epochs 10 \ summary_freq 1 \ sample_freq 50 \ sample_dir images/ \ save_freq 1000 \ batch_size 1 \ ngf 64 \ ndf 64 \ scale_size 256 train the ICGAN model sh python train.py \ mode ICGAN \ input_dir input/ \ outnput_dir output/ \ label_dir 'label.txt' \ attributes 2 \ max_epochs 10 \ summary_freq 1 \ sample_freq 50 \ sample_dir images/ \ save_freq 1000 \ batch_size 1 \ ngf 64 \ ndf 64 \ scale_size 256",Image-to-Image Translation,Image-to-Image Translation 2515,Computer Vision,Computer Vision,Computer Vision,"Image segmentation Preferred Anaconda Python distribution PyCharm Getting Started Create environment and install requirements Clone this repository bash git clone Create directories bash mkdir datasets mkdir temp Download Drosophila VNC dataset bash git clone datasets/vnc Download Mouse Cortex dataset bash git clone datasets/cortex Run the examples (docs/README.md) TODO Major issue Evaluation of the labelling of EM images Survey on network type Survey on loss Survey on network depth (number of layers) Survey on amount of training New Survey on filter size and depth (number of features) Minor issues For prediction and text read an image from input path and determine width, height Support for 1 channel (gray scale) png as input and output flexible learning rate for the Adams solver Future features one hot coding for labels from index colors (e.g. up to 256 categories in gif images) DONE Prediction directly from images to labels (with same filenames) Keeping only classic losses, classic networks Acknowledgement This repository is based on this Tensorflow implementation of the paired image to image translation ( Isola et al., 2016 ) Highway and dense net were adapted from the implementation exemplified in this blog entry . Citation If you use this code for your research, please cite the papers this code is based on: @inproceedings{johnson2016perceptual, title {Perceptual losses for real time style transfer and super resolution}, author {Johnson, Justin and Alahi, Alexandre and Fei Fei, Li}, booktitle {European Conference on Computer Vision}, pages {694 711}, year {2016}, organization {Springer} } @article{He2016identity, title {Identity Mappings in Deep Residual Networks}, author {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun}, journal {arXiv preprint arXiv:1603.05027}, year {2016}} @article{Srivastava2015highway, title {Highway Networks}, author {Rupesh Kumar Srivastava and Klaus Greff and J{\ {u}}rgen Schmidhuber}, journal {arXiv preprint arXiv:1505.00387}, year {2015} } @article{Huang2016dense, title {Densely Connected Convolutional Networks}, author {Gao Huang and Zhuang Liu and Kilian Q. Weinberger}, journal {arXiv preprint arXiv:1608.06993}, year {2016} }",Image-to-Image Translation,Image-to-Image Translation 2516,Computer Vision,Computer Vision,Computer Vision,"Image translation by CycleGAN and pix2pix in Tensorflow This is my ongoing tensorflow implementation for unpaired image to image translation ( Zhu et al., 2017 ). Latest results can be found here, comparing (docs/run_1.md) paired and unpaired image to image translation. 
Image to image translation learns a mapping from input images to output images, like these examples from the original papers: CycleGAN: Project Paper Torch Pix2pix: Project Paper Torch Prerequisites Linux or OSX. Python 2 or Python 3. CPU or NVIDIA GPU + CUDA CuDNN. Requirements Tensorflow 1.0 Preferred Anaconda Python distribution PyCharm Getting Started Clone this repository sh git clone cd imagetranslation tensorflow Install Tensorflow, e.g. with Anaconda Create directories or symlink sh mkdir datasets or symlink; for datasets mkdir temp or symlink; for checkpoints, test results Download the CMP Facades dataset (generated from sh python tools/download dataset.py facades datasets Train the model (this may take 1 8 hours depending on GPU, on CPU you will be waiting for a bit) sh python translate.py \ model pix2pix \ mode train \ output_dir temp/facades_train \ max_epochs 200 \ input_dir datasets/facades/train \ which_direction BtoA Test the model sh python translate.py \ model pix2pix \ mode test \ output_dir temp/facades_test \ input_dir datasets/facades/val \ checkpoint temp/facades_train The test run will output an HTML file at temp/facades_test/index.html that shows input/output/target image sets. For training of the CycleGAN use model CycleGAN instead of model pix2pix . Both models use u net as generator by default but can use faststyle net when specified by generator faststyle . You can look at the loss and computation graph for pix2pix (docs/run_1_images/Graph_Pix2Pix.png) and CycleGAN (docs/run_1_images/Graph_CycleGAN.png) using tensorboard: sh tensorboard logdir temp/facades_train If you wish to write in progress pictures as the network is training, use display_freq 50 . This will update temp/facades_train/index.html every 50 steps with the current training inputs and outputs. TODO Finish CycleGAN implementation according to publication Hu et al., 2017 Major issues test u net declaration with decoder using encoder dimensions (fix crash when height and width other than powers of 2) test other datasets, show results on README.md Minor issues add image buffer that stores the previous image (to update discriminators using a history of 50 generated images) add instance normalization ( Ulyanov D et al., 2016 ) flexible learning rate for the Adams solver add one direction test mode for CycleGAN add identity loss Done test CycleGAN with u net generator and log loss and compare with pix2pix: OK (docs/run_1.md) test CycleGAN with faststyle net generator and log loss: OK (docs/run_1.md) square loss and several options for loss function for generator (maximising discriminator loss, ...) refactor summary and export of images to work for all models: Pix2Pix, CycleGAN, Pix2Pix2 two batches delivering unpaired images for CycleGAN import of images from different subdirectories different (classic) loss function recursive implementations for u net res net, highway net, dense net implementation with endcoder/decoder as in faststyle res net tested transfer of generators from paired to unpaired Acknowledgement This repository is based on this Tensorflow implementation of the paired image to image translation ( Isola et al., 2016 ) Highway and dense net were adapted from the implementation exemplified in this blog entry . 
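As an aside on the image-buffer item in the minor issues above (updating the discriminators from a history of 50 previously generated images, as in the CycleGAN paper), here is a minimal numpy sketch of such a pool. It is an illustration only, not code from this repository:

```python
import numpy as np

class ImagePool(object):
    """Holds up to `size` previously generated images; half of the time a
    stored image is returned in place of the newest one."""
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        if self.size == 0:                    # pool disabled
            return image
        if len(self.images) < self.size:      # still filling the pool
            self.images.append(image)
            return image
        if np.random.rand() > 0.5:            # swap with a random stored image
            idx = np.random.randint(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image                          # otherwise pass the new image through
```

The discriminator is then fed pool.query(fake_image) rather than the freshly generated image, which is the behaviour the item above refers to.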
Citation If you use this code for your research, please cite the papers this code is based on: tex @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arXiv preprint arXiv:1611.07004v1}, year {2016} } @article{CycleGAN2017, title {Unpaired Image to Image Translation using Cycle Consistent Adversarial Networkss}, author {Zhu, Jun Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A}, journal {arXiv preprint arXiv:1703.10593}, year {2017} } @inproceedings{johnson2016perceptual, title {Perceptual losses for real time style transfer and super resolution}, author {Johnson, Justin and Alahi, Alexandre and Fei Fei, Li}, booktitle {European Conference on Computer Vision}, pages {694 711}, year {2016}, organization {Springer} } @article{He2016identity, title {Identity Mappings in Deep Residual Networks}, author {Kaiming He and Xiangyu Zhang and Shaoqing Ren and Jian Sun}, journal {arXiv preprint arXiv:1603.05027}, year {2016}} @article{Srivastava2015highway, title {Highway Networks}, author {Rupesh Kumar Srivastava and Klaus Greff and J{\ {u}}rgen Schmidhuber}, journal {arXiv preprint arXiv:1505.00387}, year {2015} } @article{Huang2016dense, title {Densely Connected Convolutional Networks}, author {Gao Huang and Zhuang Liu and Kilian Q. Weinberger}, journal {arXiv preprint arXiv:1608.06993}, year {2016} }",Image-to-Image Translation,Image-to-Image Translation 2528,Computer Vision,Computer Vision,Computer Vision,"2018 dlcv team2 DLCV 2018 Team 2 In this project we used the architecture of Pix2Pix, which is a Conditional GAN, to colourise facades of buildings. Then we tried to transfer this learning to be able to colourise cats instead. A link to the paper describing Pix2Pix can be found here: You can also find the repository in which we based the project in the folowing link: If you want to run the code yourself this is what you need installed: Linux Python with numpy NVIDIA GPU + CUDA 8.0 + CuDNNv5.1 pytorch torchvision Then you should clone the repo and extract the datasets. If you want to train the network you should run: python train.py dataset facades nEpochs 200 cuda And for training through transfer of the colourising of facades to cats: python train_transfer.py dataset cat_dataset nEpochs 50 cuda If you want to see the results the commands are python test.py dataset facades model checkpoint/facades/netG_model_epoch_200.pth cuda python test.py dataset cat_dataset model checkpoint/facades/netG_transfer_epoch_50.pth cuda A result is shown in the Jupyter notebook",Image-to-Image Translation,Image-to-Image Translation 2588,Computer Vision,Computer Vision,Computer Vision,"SmartSketch Supercharge your creativity with state of the art image synthesis ! promo.png (promo.png) Background A few months ago, some of us saw the NVIDIA demo of their GuaGAN model for semantic image synthesis. It blew us away. Unfortunately, unlike their StyleGAN model, they did not quickly release the source code, and everyone was left wanting more of this breakthrough tech. Fortunately, whilst one group member was playing on their phone on the bus to the hackathon, they noticed a repo called SPADE that had just been made public by NVIDIA research on GitHub. It turns out this contained all their code, along with some pretrained models using GuaGAN! The repo was made public Friday, April 12th. 
For those inclined, the paper (which was accepted as an oral presentation at CVPR) can be found here . Tech description The user draws an image using given colors. Each of these colors represents a segment or type of object being present at that pixel. When they are satisfied with their sketch, they can click a button and the image will be uploaded into the backend of our program hosted on the Google Cloud. The program converts the image to a more readable form and passes it into a folder where we use NVIDIA's pre trained models to create a synthesized image using learned traits about each texture and object and the pixel map the user submitted. This image is then displayed on the website for the user to see. ... One thing our program does not do is generate an instance map. The instance map segments the image into different instances of objects and makes sure that if two of the same object are overlapping each other, they have different pixel values. Detecting and working past overlapping images is a hard problem for computer vision, and this helps generate higher fidelity images. However, we thought that the end user would not enjoy having to sketch another image, and would probably not be submitting very complex drawings with many overlapping figures, so we omitted using this in the model. Challenges The repo was made public the day we started working on it, so there was not much documentation The code had few comments The actual running of the model was tightly coupled with their testing function, and would have taken too much time to decouple. This means we had to run a modified version of their testing script to actually use it, which was not desirable. Passing files around in the backend is difficult, as the NVIDIA model runs assuming certain directory configurations We were going to use react for the frontend but ran into difficulties with the canvas, so had to scrap that at the last minute It turns out that there are bugs in the source code for the NVIDIA models, so we could only use the model trained on the COCO dataset we spent some time trying to debug the other models but saw on GitHub that other people were also having the same issues Future Ideas Real time sketches like in the video demo More colors and textures Hooking into the models deeper, so we do not have to use their testing code to run them Evaluating different pre trained models (ade20k, cityscapes, etc) Making the server stateless (save images to a shared file store), run the models on another VM that is dedicated to that (with more GPUs etc) Enable better concurrent connections (this comes with running the models natively in our server) Credits",Image-to-Image Translation,Image-to-Image Translation 2604,Computer Vision,Computer Vision,Computer Vision,"CycleGAN PyTorch A simple PyTorch implementation of CycleGAN This implementation of CycleGAN was created as part of a course in deep learning Training Datasets can be downloaded via UC Berkeley's repository . Datasets should be set up in the following directory structure: datasets dataset name train x y test x y Acknowledgment All credit goes to the authors of CycleGAN , Jun Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros.",Image-to-Image Translation,Image-to-Image Translation 2644,Computer Vision,Computer Vision,Computer Vision,"pix2pix tensorflow Based on pix2pix by Isola et al. Article about this implementation Interactive Demo Tensorflow implementation of pix2pix.
Learns a mapping from input images to output images, like these examples from the original paper: This port is based directly on the torch implementation, and not on an existing Tensorflow implementation. It is meant to be a faithful implementation of the original work and so does not add anything. The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing. Setup Prerequisites Tensorflow 1.4.1 Recommended Linux with Tensorflow GPU edition + cuDNN Getting Started sh clone this repo git clone cd pix2pix tensorflow download the CMP Facades dataset (generated from python tools/download dataset.py facades train the model (this may take 1 8 hours depending on GPU, on CPU you will be waiting for a bit) python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets. If you have Docker installed, you can use the provided Docker image to run pix2pix without installing the correct version of Tensorflow: sh train the model python tools/dockrun.py python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python tools/dockrun.py python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Datasets and Trained Models The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side like: For example: Some datasets have been made available by the authors of the pix2pix paper. To download those datasets, use the included script tools/download dataset.py . There are also links to pre trained models alongside each dataset, note that these pre trained models require the current version of pix2pix.py: dataset example python tools/download dataset.py facades 400 images from CMP Facades dataset . (31MB) Pre trained: BtoA python tools/download dataset.py cityscapes 2975 images from the Cityscapes training set . (113M) Pre trained: AtoB BtoA python tools/download dataset.py maps 1096 training images scraped from Google Maps (246M) Pre trained: AtoB BtoA python tools/download dataset.py edges2shoes 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. (2.2GB) Pre trained: AtoB python tools/download dataset.py edges2handbags 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. (8.6GB) Pre trained: AtoB The facades dataset is the smallest and easiest to get started with. Creating your own dataset Example: creating images with blank centers for inpainting sh Resize source images python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized Create images with blank centers python tools/process.py \ input_dir photos/resized \ operation blank \ output_dir photos/blank Combine resized images with blanked images python tools/process.py \ input_dir photos/resized \ b_dir photos/blank \ operation combine \ output_dir photos/combined Split into train/val set python tools/split.py \ dir photos/combined The folder photos/combined will now have train and val subfolders that you can use for training and testing. 
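The blanking step above amounts to masking out the centre of each photo so the network has to fill it back in. A rough Pillow sketch of that idea ("photo.jpg" and the white fill colour are hypothetical choices; for real datasets use the repository's tools/process.py with operation blank):

```python
# Blank out the central region of a photo, as in the inpainting example above.
# Assumes Pillow; "photo.jpg" is a hypothetical input file.
from PIL import Image

im = Image.open("photo.jpg").convert("RGB")
w, h = im.size
center = (w // 4, h // 4, 3 * w // 4, 3 * h // 4)   # box covering the middle of the image
blanked = im.copy()
blanked.paste((255, 255, 255), center)              # fill the centre with a flat colour
blanked.save("photo_blank.jpg")                     # becomes the input half of the pair
```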
Creating image pairs from existing images If you have two directories a and b , with corresponding images (same name, same dimensions, different data) you can combine them with process.py : sh python tools/process.py \ input_dir a \ b_dir b \ operation combine \ output_dir c This puts the images in a side by side combined image that pix2pix.py expects. Colorization For colorization, your images should ideally all be the same aspect ratio. You can resize and crop them with the resize command: sh python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized No other processing is required, the colorization mode (see Training section below) uses single images instead of image pairs. Training Image Pairs For normal training with image pairs, you need to specify which directory contains the training images, and which direction to train on. The direction options are AtoB or BtoA sh python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA Colorization pix2pix.py includes special code to handle colorization with single images instead of pairs, using that looks like this: sh python pix2pix.py \ mode train \ output_dir photos_train \ max_epochs 200 \ input_dir photos/train \ lab_colorization In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information). Tips You can look at the loss and computation graph using tensorboard: sh tensorboard logdir facades_train If you wish to write in progress pictures as the network is training, use display_freq 50 . This will update facades_train/index.html every 50 steps with the current training inputs and outputs. Testing Testing is done with mode test . You should specify the checkpoint to use with checkpoint , this should point to the output_dir that you created previously with mode train : sh python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The testing mode will load some of the configuration options from the checkpoint provided so you do not need to specify which_direction for instance. The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets: Code Validation Validation of the code was performed on a Linux machine with a 1.3 TFLOPS Nvidia GTX 750 Ti GPU and an Azure NC6 instance with a K80 GPU. 
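Before the validation commands that follow, a short illustration of the lab_colorization mode described above: the lightness channel of each photo becomes input image A and the two color channels become target image B. The sketch below assumes scikit-image and a hypothetical photo.jpg; it is not the repository's own colorspace conversion code:

```python
# Illustration of the L -> ab split behind the colorization mode described above.
# Assumes scikit-image; "photo.jpg" is a hypothetical input file.
from skimage import color, io

rgb = io.imread("photo.jpg")      # H x W x 3 RGB image
lab = color.rgb2lab(rgb)          # convert to CIE Lab
L = lab[:, :, :1]                 # lightness only  -> network input  ("image A")
ab = lab[:, :, 1:]                # color channels  -> network target ("image B")
print(L.shape, ab.shape)          # (H, W, 1) (H, W, 2)
```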
sh git clone cd pix2pix tensorflow python tools/download dataset.py facades sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Comparison on facades dataset: Input Tensorflow Torch Target Unimplemented Features The following models have not been implemented: defineG_encoder_decoder defineG_unet_128 defineD_pixelGAN Citation If you use this code for your research, please cite the paper this code is based on: Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} } Acknowledgments This is a port of pix2pix from Torch to Tensorflow. It also contains colorspace conversion code ported from Torch. Thanks to the Tensorflow team for making such a quality library! And special thanks to Phillip Isola for answering my questions about the pix2pix code.",Image-to-Image Translation,Image-to-Image Translation 2653,Computer Vision,Computer Vision,Computer Vision,"CycleGAN TensorFlow An implementation of CycleGan using TensorFlow (work in progress). Original paper: Results on test data apple > orange Input Output Input Output Input Output ! apple2orange_1 (samples/real_apple2orange_1.jpg) ! apple2orange_1 (samples/fake_apple2orange_1.jpg) ! apple2orange_2 (samples/real_apple2orange_2.jpg) ! apple2orange_2 (samples/fake_apple2orange_2.jpg) ! apple2orange_3 (samples/real_apple2orange_3.jpg) ! apple2orange_3 (samples/fake_apple2orange_3.jpg) orange > apple Input Output Input Output Input Output ! orange2apple_1 (samples/real_orange2apple_1.jpg) ! orange2apple_1 (samples/fake_orange2apple_1.jpg) ! orange2apple_2 (samples/real_orange2apple_2.jpg) ! orange2apple_2 (samples/fake_orange2apple_2.jpg) ! orange2apple_3 (samples/real_orange2apple_3.jpg) ! orange2apple_3 (samples/fake_orange2apple_3.jpg) Environment TensorFlow 1.0.0 Python 3.6.0 Data preparing First, download a dataset, e.g. apple2orange bash $ bash download_dataset.sh apple2orange Write the dataset to tfrecords bash $ python3 build_data.py Check $ python3 build_data.py help for more details. 
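For orientation, build_data.py above packs each image folder into a tfrecords file. A minimal sketch of that kind of writer with the TensorFlow 1.x API (the paths and the feature key are hypothetical; this is not the repository's build_data.py):

```python
import tensorflow as tf  # TensorFlow 1.x API

def write_tfrecords(image_paths, out_path):
    """Write each encoded image file as one tf.train.Example record."""
    writer = tf.python_io.TFRecordWriter(out_path)
    for path in image_paths:
        with open(path, "rb") as f:
            encoded = f.read()                      # raw JPEG/PNG bytes
        example = tf.train.Example(features=tf.train.Features(feature={
            "image/encoded_image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[encoded])),
        }))
        writer.write(example.SerializeToString())
    writer.close()

# e.g. write_tfrecords(["apple_001.jpg", "apple_002.jpg"], "data/tfrecords/apple.tfrecords")
```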
Training bash $ python3 train.py If you want to change some default settings, you can pass those to the command line, such as: bash $ python3 train.py \ X data/tfrecords/horse.tfrecords \ Y data/tfrecords/zebra.tfrecords Here is the list of arguments: usage: train.py h batch_size BATCH_SIZE image_size IMAGE_SIZE use_lsgan USE_LSGAN nouse_lsgan norm NORM lambda1 LAMBDA1 lambda2 LAMBDA2 learning_rate LEARNING_RATE beta1 BETA1 pool_size POOL_SIZE ngf NGF X X Y Y load_model LOAD_MODEL optional arguments: h, help show this help message and exit batch_size BATCH_SIZE batch size, default: 1 image_size IMAGE_SIZE image size, default: 256 use_lsgan USE_LSGAN use lsgan (mean squared error) or cross entropy loss, default: True nouse_lsgan norm NORM instance, batch use instance norm or batch norm, default: instance lambda1 LAMBDA1 weight for forward cycle loss (X >Y >X), default: 10.0 lambda2 LAMBDA2 weight for backward cycle loss (Y >X >Y), default: 10.0 learning_rate LEARNING_RATE initial learning rate for Adam, default: 0.0002 beta1 BETA1 momentum term of Adam, default: 0.5 pool_size POOL_SIZE size of image buffer that stores previously generated images, default: 50 ngf NGF number of gen filters in first conv layer, default: 64 X X X tfrecords file for training, default: data/tfrecords/apple.tfrecords Y Y Y tfrecords file for training, default: data/tfrecords/orange.tfrecords load_model LOAD_MODEL folder of saved model that you wish to continue training (e.g. 20170602 1936), default: None Check TensorBoard to see training progress and generated images. $ tensorboard logdir checkpoints/${datetime} If you halted the training process and want to continue training, then you can set the load_model parameter like this. bash $ python3 train.py \ load_model 20170602 1936 Here are some funny screenshots from TensorBoard when training orange > apple: ! train_screenshot (samples/train_screenshot.png) Notes If high constrast background colors between input and generated images are observed (e.g. black becomes white), you should restart your training! Train several times to get the best models. Export model You can export from a checkpoint to a standalone GraphDef file as follow: bash $ python3 export_graph.py checkpoint_dir checkpoints/${datetime} \ XtoY_model apple2orange.pb \ YtoX_model orange2apple.pb \ image_size 256 Inference After exporting model, you can use it for inference. For example: bash python3 inference.py model pretrained/apple2orange.pb \ input input_sample.jpg \ output output_sample.jpg \ image_size 256 Pretrained models My pretrained models are available at Contributing Please open an issue if you have any trouble or found anything incorrect in my code :) License This project is licensed under the MIT License see the LICENSE (LICENSE) file for details. References CycleGAN paper: Official source code in Torch:",Image-to-Image Translation,Image-to-Image Translation 2708,Computer Vision,Computer Vision,Computer Vision,"pix2pixHD Project Youtube Paper Pytorch implementation of our method for high resolution (e.g. 2048x1024) photorealistic image to image translation. It can be used for turning semantic label maps into photo realistic images or synthesizing portraits from face label maps. High Resolution Image Synthesis and Semantic Manipulation with Conditional GANs Ting Chun Wang 1 , Ming Yu Liu 1 , Jun Yan Zhu 2 , Andrew Tao 1 , Jan Kautz 1 , Bryan Catanzaro 1 1 NVIDIA Corporation, 2 UC Berkeley In arxiv, 2017. 
Image to image translation at 2k/1k resolution Our label to streetview results Interactive editing results Additional streetview results Label to face and interactive editing results Our editing interface Prerequisites Linux or macOS Python 2 or 3 NVIDIA GPU (12G or 24G memory) + CUDA cuDNN Getting Started Installation Install PyTorch and dependencies from Install python libraries dominate . bash pip install dominate Clone this repo: bash git clone cd pix2pixHD Testing A few example Cityscapes test images are included in the datasets folder. Please download the pre trained Cityscapes model from here (google drive link), and put it under ./checkpoints/label2city_1024p/ Test the model ( bash ./scripts/test_1024p.sh ): bash !./scripts/test_1024p.sh python test.py name label2city_1024p netG local ngf 32 resize_or_crop none The test results will be saved to a html file here: ./results/label2city_1024p/test_latest/index.html . More example scripts can be found in the scripts directory. Dataset We use the Cityscapes dataset. To train a model on the full dataset, please download it from the official website (registration required). After downloading, please put it under the datasets folder in the same way the example images are provided. Training Train a model at 1024 x 512 resolution ( bash ./scripts/train_512p.sh ): bash !./scripts/train_512p.sh python train.py name label2city_512p To view training results, please checkout intermediate results in ./checkpoints/label2city_512p/web/index.html . If you have tensorflow installed, you can see tensorboard logs in ./checkpoints/label2city_512p/logs by adding tf_log to the training scripts. Multi GPU training Train a model using multiple GPUs ( bash ./scripts/train_512p_multigpu.sh ): bash !./scripts/train_512p_multigpu.sh python train.py name label2city_512p batchSize 8 gpu_ids 0,1,2,3,4,5,6,7 Note: this is not tested and we trained our model using single GPU only. Please use at your own discretion. Training at full resolution To train the images at full resolution (2048 x 1024) requires a GPU with 24G memory ( bash ./scripts/train_1024p_24G.sh ). If only GPUs with 12G memory are available, please use the 12G script ( bash ./scripts/train_1024p_12G.sh ), which will crop the images during training. Performance is not guaranteed using this script. Training with your own dataset If you want to train with your own dataset, please generate label maps which are one channel whose pixel values correspond to the object labels (i.e. 0,1,...,N 1, where N is the number of labels). This is because we need to generate one hot vectors from the label maps. Please also specity label_nc N during both training and testing. If your input is not a label map, please just specify label_nc 0 which will directly use the RGB colors as input. The folders should then be named train_A , train_B instead of train_label , train_img , where the goal is to translate images from A to B. If you don't have instance maps or don't want to use them, please specify no_instance . The default setting for preprocessing is scale_width , which will scale the width of all training images to opt.loadSize (1024) while keeping the aspect ratio. If you want a different setting, please change it by using the resize_or_crop option. For example, scale_width_and_crop first resizes the image to have width opt.loadSize and then does random cropping of size (opt.fineSize, opt.fineSize) . crop skips the resizing step and only performs random cropping. 
If you don't want any preprocessing, please specify none , which will do nothing other than making sure the image is divisible by 32. More Training/Test Details Flags: see options/train_options.py and options/base_options.py for all the training flags; see options/test_options.py and options/base_options.py for all the test flags. Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag no_instance . Citation If you find this useful for your research, please use the following. @article{wang2017highres, title {High Resolution Image Synthesis and Semantic Manipulation with Conditional GANs}, author {Ting Chun Wang and Ming Yu Liu and Jun Yan Zhu and Andrew Tao and Jan Kautz and Bryan Catanzaro}, journal {arXiv preprint arXiv:1711.11585}, year {2017} } Acknowledgments This code borrows heavily from pytorch CycleGAN and pix2pix .",Image-to-Image Translation,Image-to-Image Translation 2711,Computer Vision,Computer Vision,Computer Vision,"CycleGAN TensorFlow An implementation of CycleGan using TensorFlow (work in progress). Original paper: Results on test data apple > orange Input Output Input Output Input Output ! apple2orange_1 (samples/real_apple2orange_1.jpg) ! apple2orange_1 (samples/fake_apple2orange_1.jpg) ! apple2orange_2 (samples/real_apple2orange_2.jpg) ! apple2orange_2 (samples/fake_apple2orange_2.jpg) ! apple2orange_3 (samples/real_apple2orange_3.jpg) ! apple2orange_3 (samples/fake_apple2orange_3.jpg) orange > apple Input Output Input Output Input Output ! orange2apple_1 (samples/real_orange2apple_1.jpg) ! orange2apple_1 (samples/fake_orange2apple_1.jpg) ! orange2apple_2 (samples/real_orange2apple_2.jpg) ! orange2apple_2 (samples/fake_orange2apple_2.jpg) ! orange2apple_3 (samples/real_orange2apple_3.jpg) ! orange2apple_3 (samples/fake_orange2apple_3.jpg) Environment TensorFlow 1.0.0 Python 3.6.0 Data preparing First, download a dataset, e.g. apple2orange bash $ bash download_dataset.sh apple2orange Write the dataset to tfrecords bash $ python3 build_data.py Check $ python3 build_data.py help for more details. 
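The same tfrecords preparation step appears here. A quick way to sanity-check such a file from plain Python with the TensorFlow 1.x API (the feature key is hypothetical and must match whatever build_data.py actually wrote):

```python
import tensorflow as tf  # TensorFlow 1.x API

# Iterate over the serialized examples in a tfrecords file and inspect the first one.
for serialized in tf.python_io.tf_record_iterator("data/tfrecords/apple.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    encoded = example.features.feature["image/encoded_image"].bytes_list.value[0]
    print(len(encoded), "bytes of encoded image data")   # decode with tf.image.decode_jpeg
    break
```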
Training bash $ python3 train.py If you want to change some default settings, you can pass those to the command line, such as: bash $ python3 train.py \ X data/tfrecords/horse.tfrecords \ Y data/tfrecords/zebra.tfrecords Here is the list of arguments: usage: train.py h batch_size BATCH_SIZE image_size IMAGE_SIZE use_lsgan USE_LSGAN nouse_lsgan norm NORM lambda1 LAMBDA1 lambda2 LAMBDA2 learning_rate LEARNING_RATE beta1 BETA1 pool_size POOL_SIZE ngf NGF X X Y Y load_model LOAD_MODEL optional arguments: h, help show this help message and exit batch_size BATCH_SIZE batch size, default: 1 image_size IMAGE_SIZE image size, default: 256 use_lsgan USE_LSGAN use lsgan (mean squared error) or cross entropy loss, default: True nouse_lsgan norm NORM instance, batch use instance norm or batch norm, default: instance lambda1 LAMBDA1 weight for forward cycle loss (X >Y >X), default: 10.0 lambda2 LAMBDA2 weight for backward cycle loss (Y >X >Y), default: 10.0 learning_rate LEARNING_RATE initial learning rate for Adam, default: 0.0002 beta1 BETA1 momentum term of Adam, default: 0.5 pool_size POOL_SIZE size of image buffer that stores previously generated images, default: 50 ngf NGF number of gen filters in first conv layer, default: 64 X X X tfrecords file for training, default: data/tfrecords/apple.tfrecords Y Y Y tfrecords file for training, default: data/tfrecords/orange.tfrecords load_model LOAD_MODEL folder of saved model that you wish to continue training (e.g. 20170602 1936), default: None Check TensorBoard to see training progress and generated images. $ tensorboard logdir checkpoints/${datetime} If you halted the training process and want to continue training, then you can set the load_model parameter like this. bash $ python3 train.py \ load_model 20170602 1936 Here are some funny screenshots from TensorBoard when training orange > apple: ! train_screenshot (samples/train_screenshot.png) Notes If high contrast background colors between input and generated images are observed (e.g. black becomes white), you should restart your training! Train several times to get the best models. Export model You can export from a checkpoint to a standalone GraphDef file as follows: bash $ python3 export_graph.py checkpoint_dir checkpoints/${datetime} \ XtoY_model apple2orange.pb \ YtoX_model orange2apple.pb \ image_size 256 Inference After exporting the model, you can use it for inference. For example: bash python3 inference.py model pretrained/apple2orange.pb \ input input_sample.jpg \ output output_sample.jpg \ image_size 256 Pretrained models My pretrained models are available at Contributing Please open an issue if you have any trouble or find anything incorrect in my code :) License This project is licensed under the MIT License see the LICENSE (LICENSE) file for details. References CycleGAN paper: Official source code in Torch:",Image-to-Image Translation,Image-to-Image Translation 2749,Computer Vision,Computer Vision,Computer Vision,"> CycleGAN Tensorflow implementation for learning an image to image translation without input output pairs. The method is proposed by Jun Yan Zhu in Unpaired Image to Image Translation using Cycle Consistent Adversarial Networks, see . For example in paper: Collection Style Transfer Object Transfiguration Season Transfer Photo Enhancement: iPhone photo to DSLR photo > Update Results The results of this implementation: Horses > Zebras Zebras > Horses You can download the pretrained model from this url and extract the rar file to ./checkpoint/ .
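Returning briefly to the export/inference workflow above (export_graph.py writes a standalone GraphDef .pb), a minimal sketch of loading such a frozen graph and running it with the TensorFlow 1.x API. The tensor names below are hypothetical placeholders, not names taken from the repository:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

with tf.gfile.GFile("apple2orange.pb", "rb") as f:       # exported frozen graph
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    inp = graph.get_tensor_by_name("input_image:0")      # hypothetical tensor name
    out = graph.get_tensor_by_name("output_image:0")     # hypothetical tensor name
    dummy = np.zeros((256, 256, 3), dtype=np.uint8)      # stand-in for a real photo
    result = sess.run(out, feed_dict={inp: dummy})
```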
Prerequisites tensorflow r1.1 numpy 1.11.0 scipy 0.17.0 pillow 3.3.0 Getting Started Installation Install tensorflow from Clone this repo: bash git clone cd CycleGAN tensorflow Train Download a dataset (e.g. zebra and horse images from ImageNet): bash bash ./download_dataset.sh horse2zebra Train a model: bash CUDA_VISIBLE_DEVICES 0 python main.py dataset_dir horse2zebra Use tensorboard to visualize the training details: bash tensorboard logdir ./logs Test Finally, test the model: bash CUDA_VISIBLE_DEVICES 0 python main.py dataset_dir horse2zebra phase test which_direction AtoB Training and Test Details To train a model, bash CUDA_VISIBLE_DEVICES 0 python main.py dataset_dir /path/to/data/ Models are saved to ./checkpoints/ (can be changed by passing checkpoint_dir your_dir ). To test the model, bash CUDA_VISIBLE_DEVICES 0 python main.py dataset_dir /path/to/data/ phase test which_direction AtoB/BtoA Datasets Download the datasets using the following script: bash bash ./download_dataset.sh dataset_name facades : 400 images from the CMP Facades dataset . cityscapes : 2975 images from the Cityscapes training set . maps : 1096 training images scraped from Google Maps. horse2zebra : 939 horse images and 1177 zebra images downloaded from ImageNet using keywords wild horse and zebra . apple2orange : 996 apple images and 1020 orange images downloaded from ImageNet using keywords apple and navel orange . summer2winter_yosemite : 1273 summer Yosemite images and 854 winter Yosemite images were downloaded using Flickr API. See more details in our paper. monet2photo , vangogh2photo , ukiyoe2photo , cezanne2photo : The art images were downloaded from Wikiart . The real photos are downloaded from Flickr using combination of tags landscape and landscapephotography . The training set size of each class is Monet:1074, Cezanne:584, Van Gogh:401, Ukiyo e:1433, Photographs:6853. iphone2dslr_flower : both classe of images were downlaoded from Flickr. The training set size of each class is iPhone:1813, DSLR:3316. See more details in our paper. Reference The torch implementation of CycleGAN, The tensorflow implementation of pix2pix,",Image-to-Image Translation,Image-to-Image Translation 2801,Computer Vision,Computer Vision,Computer Vision,"conditional GAN (pix2pix tensorflow) Paper Torch ver. ! examples (./examples.jpg) Setup Prerequisites Linux GPU tensorflow 1.4 tensorboard (optional) Getting Started Clone this repo: git clone cd conGAN Download the dataset: python ./tools/download dataset.py facades mv facades dataset Train mode python main.py mode train Test mode python main.py mode test Visualization tensorboard logdir summary/ Citation If you use this code for your research, cite paper Image to Image Translation with Conditional Adversarial Networks @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} }",Image-to-Image Translation,Image-to-Image Translation 2807,Computer Vision,Computer Vision,Computer Vision,"pix2pix tensorflow TensorFlow implementation of Image to Image Translation Using Conditional Adversarial Networks that learns a mapping from input images to output images. 
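For background, that mapping is trained with a conditional GAN objective plus an L1 reconstruction term (the paper weights the L1 term with lambda = 100). The TensorFlow sketch below illustrates the objective only; it is not this repository's exact code, and the tensor names are placeholders.
python
import tensorflow as tf

def pix2pix_losses(d_real_logits, d_fake_logits, target, generated, l1_weight=100.0):
    # Discriminator sees (input, target) as real and (input, G(input)) as fake.
    xent = tf.nn.sigmoid_cross_entropy_with_logits
    d_loss = tf.reduce_mean(xent(logits=d_real_logits, labels=tf.ones_like(d_real_logits)) +
                            xent(logits=d_fake_logits, labels=tf.zeros_like(d_fake_logits)))
    # Generator tries to fool the discriminator and to stay close to the paired target.
    g_adv = tf.reduce_mean(xent(logits=d_fake_logits, labels=tf.ones_like(d_fake_logits)))
    g_l1 = tf.reduce_mean(tf.abs(target - generated))
    return d_loss, g_adv + l1_weight * g_l1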
Here are some results generated by the authors of paper: Setup Prerequisites Linux Python with numpy NVIDIA GPU + CUDA 8.0 + CuDNNv5.1 TensorFlow 0.11 Getting Started Clone this repo: bash git clone git@github.com:yenchenlin/pix2pix tensorflow.git cd pix2pix tensorflow Download the dataset (script borrowed from torch code ): bash bash ./download_dataset.sh facades Train the model bash python main.py phase train Test the model: bash python main.py phase test Results Here is the results generated from this implementation: Facades: More results on other datasets coming soon! Note : To avoid the fast convergence of D (discriminator) network, G (generator) network is updated twice for each D network update, which differs from original paper but same as DCGAN tensorflow , which this project based on. Train Code currently supports CMP Facades dataset. To reproduce results presented above, it takes 200 epochs of training. Exact computing time depends on own hardware conditions. Test Test the model on validation set of CMP Facades dataset. It will generate synthesized images provided corresponding labels under directory ./test . Acknowledgments Code borrows heavily from pix2pix and DCGAN tensorflow . Thanks for their excellent work! License MIT",Image-to-Image Translation,Image-to-Image Translation 2845,Computer Vision,Computer Vision,Computer Vision,"This repository provides a PyTorch implementation of StarGAN . StarGAN can flexibly translate an input image to any desired target domain using only a single generator and a discriminator. The demo video for StarGAN can be found here . Paper StarGAN: Unified Generative Adversarial Networks for Multi Domain Image to Image Translation Yunjey Choi 1,2 , Minje Choi 1,2 , Munyoung Kim 2,3 , Jung Woo Ha 2 , Sung Kim 2,4 , and Jaegul Choo 1,2 1 Korea University, 2 Clova AI Research (NAVER Corp.), 3 The College of New Jersey, 4 HKUST IEEE Conference on Computer Vision and Pattern Recognition ( CVPR ), 2018 ( Oral ) Dependencies Python 3.5+ PyTorch 0.4.0+ TensorFlow 1.3+ (optional for tensorboard) Usage 1. Cloning the repository bash $ git clone $ cd StarGAN/ 2. Downloading the dataset To download the CelebA dataset: bash $ bash download.sh celeba To download the RaFD dataset, you must request access to the dataset from the Radboud Faces Database website . Then, you need to create a folder structure as described here . 3. Training To train StarGAN on CelebA, run the training script below. See here for a list of selectable attributes in the CelebA dataset. If you change the selected_attrs argument, you should also change the c_dim argument accordingly. 
bash $ python main.py mode train dataset CelebA image_size 128 c_dim 5 \ sample_dir stargan_celeba/samples log_dir stargan_celeba/logs \ model_save_dir stargan_celeba/models result_dir stargan_celeba/results \ selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young To train StarGAN on RaFD: bash $ python main.py mode train dataset RaFD image_size 128 c_dim 8 \ sample_dir stargan_rafd/samples log_dir stargan_rafd/logs \ model_save_dir stargan_rafd/models result_dir stargan_rafd/results To train StarGAN on both CelebA and RafD: bash $ python main.py mode train dataset Both image_size 256 c_dim 5 c2_dim 8 \ sample_dir stargan_both/samples log_dir stargan_both/logs \ model_save_dir stargan_both/models result_dir stargan_both/results To train StarGAN on your own dataset, create a folder structure in the same format as RaFD and run the command: bash $ python main.py mode train dataset RaFD rafd_crop_size CROP_SIZE image_size IMG_SIZE \ c_dim LABEL_DIM rafd_image_dir TRAIN_IMG_DIR \ sample_dir stargan_custom/samples log_dir stargan_custom/logs \ model_save_dir stargan_custom/models result_dir stargan_custom/results 4. Testing To test StarGAN on CelebA: bash $ python main.py mode test dataset CelebA image_size 128 c_dim 5 \ sample_dir stargan_celeba/samples log_dir stargan_celeba/logs \ model_save_dir stargan_celeba/models result_dir stargan_celeba/results \ selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young To test StarGAN on RaFD: bash $ python main.py mode test dataset RaFD image_size 128 \ c_dim 8 rafd_image_dir data/RaFD/test \ sample_dir stargan_rafd/samples log_dir stargan_rafd/logs \ model_save_dir stargan_rafd/models result_dir stargan_rafd/results To test StarGAN on both CelebA and RaFD: bash $ python main.py mode test dataset Both image_size 256 c_dim 5 c2_dim 8 \ sample_dir stargan_both/samples log_dir stargan_both/logs \ model_save_dir stargan_both/models result_dir stargan_both/results To test StarGAN on your own dataset: bash $ python main.py mode test dataset RaFD rafd_crop_size CROP_SIZE image_size IMG_SIZE \ c_dim LABEL_DIM rafd_image_dir TEST_IMG_DIR \ sample_dir stargan_custom/samples log_dir stargan_custom/logs \ model_save_dir stargan_custom/models result_dir stargan_custom/results 5. Pretrained model To download a pretrained model checkpoint, run the script below. The pretrained model checkpoint will be downloaded and saved into ./stargan_celeba_256/models directory. bash $ bash download.sh pretrained celeba 256x256 To translate images using the pretrained model, run the evaluation script below. The translated images will be saved into ./stargan_celeba_256/results directory. bash $ python main.py mode test dataset CelebA image_size 256 c_dim 5 \ selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young \ model_save_dir 'stargan_celeba_256/models' \ result_dir 'stargan_celeba_256/results' Results 1. Facial Attribute Transfer on CelebA 2. Facial Expression Synthesis on RaFD 3. Facial Expression Synthesis on CelebA Citation If this work is useful for your research, please cite our paper : @InProceedings{StarGAN2018, author {Choi, Yunjey and Choi, Minje and Kim, Munyoung and Ha, Jung Woo and Kim, Sunghun and Choo, Jaegul}, title {StarGAN: Unified Generative Adversarial Networks for Multi Domain Image to Image Translation}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {June}, year {2018} } Acknowledgement This work was mainly done while the first author did a research internship at Clova AI Research, NAVER . 
We thank all the researchers at NAVER, especially Donghyun Kwak, for insightful discussions.",Image-to-Image Translation,Image-to-Image Translation 2898,Computer Vision,Computer Vision,Computer Vision,"pytorch pix2pix Pytorch implementation of pix2pix 1 for various datasets. you can download datasets: you can see more information for network architecture and training details in dataset cityscapes 2,975 training images, 200 train epochs, 1 batch size, inverse order: True facades 400 training images, 200 train epochs, 1 batch size, inverse order: True maps 1,096 training images, 200 train epochs, 1 batch size, inverse order: True edges2shoes 50k training images, 15 train epochs, 4 batch size, inverse order: False edges2handbags 137k training images, 15 train epochs, 4 batch size, inverse order: False Resutls cityscapes cityscapes after 200 epochs First column: input, second column: output, third column: ground truth ! city_result (cityscapes_results/cityscapes_200.png) Generate animation for fixed inputs ! cityscapes_gif (cityscapes_results/cityscapes_generate_animation.gif) Learning Time cityscapes pix2pix Avg. per epoch: 332.08 sec; Total 200 epochs: 66,846.58 sec facades facades after 200 epochs First column: input, second column: output, third column: ground truth ! facades_result (facades_results/facades_200.png) Generate animation for fixed inputs ! facades_gif (facades_results/facades_generate_animation.gif) Learning Time facades pix2pix Avg. per epoch: 44.94 sec; Total 200 epochs: 9,282.64 sec edges2handbags edges2handbags after 15 epochs First column: input, second column: output, third column: ground truth ! edges2handbags_result (edges2handbags_results/edges2handbags_15.png) Generate animation for fixed inputs ! edges2handbags_gif (edges2handbags_results/edges2handbags_generate_animation.gif) Learning Time edges2handbags pix2pix Avg. per epoch: 10,228.08 sec; Total 15 epochs: 153,443.62 sec Development Environment Ubuntu 14.04 LTS NVIDIA GTX 1080 ti cuda 8.0 Python 2.7.6 pytorch 0.1.12 matplotlib 1.3.1 imageio 2.2.0 scipy 0.19.1 Reference 1 Isola, Phillip, et al. Image to image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004 (2016). (Full paper:",Image-to-Image Translation,Image-to-Image Translation 2907,Computer Vision,Computer Vision,Computer Vision,"pix2pix tensorflow Based on pix2pix by Isola et al. Article about this implemention Interactive Demo Tensorflow implementation of pix2pix. Learns a mapping from input images to output images, like these examples from the original paper: This port is based directly on the torch implementation, and not on an existing Tensorflow implementation. It is meant to be a faithful implementation of the original work and so does not add anything. The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing. 
Setup Prerequisites Tensorflow 1.4.1 Recommended Linux with Tensorflow GPU edition + cuDNN Getting Started sh clone this repo git clone cd pix2pix tensorflow download the CMP Facades dataset (generated from python tools/download dataset.py facades train the model (this may take 1 8 hours depending on GPU, on CPU you will be waiting for a bit) python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets. If you have Docker installed, you can use the provided Docker image to run pix2pix without installing the correct version of Tensorflow: sh train the model python tools/dockrun.py python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA test the model python tools/dockrun.py python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Datasets and Trained Models The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side like: For example: Some datasets have been made available by the authors of the pix2pix paper. To download those datasets, use the included script tools/download dataset.py . There are also links to pre trained models alongside each dataset, note that these pre trained models require the current version of pix2pix.py: dataset example python tools/download dataset.py facades 400 images from CMP Facades dataset . (31MB) Pre trained: BtoA python tools/download dataset.py cityscapes 2975 images from the Cityscapes training set . (113M) Pre trained: AtoB BtoA python tools/download dataset.py maps 1096 training images scraped from Google Maps (246M) Pre trained: AtoB BtoA python tools/download dataset.py edges2shoes 50k training images from UT Zappos50K dataset . Edges are computed by HED edge detector + post processing. (2.2GB) Pre trained: AtoB python tools/download dataset.py edges2handbags 137K Amazon Handbag images from iGAN project . Edges are computed by HED edge detector + post processing. (8.6GB) Pre trained: AtoB The facades dataset is the smallest and easiest to get started with. Creating your own dataset Example: creating images with blank centers for inpainting sh Resize source images python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized Create images with blank centers python tools/process.py \ input_dir photos/resized \ operation blank \ output_dir photos/blank Combine resized images with blanked images python tools/process.py \ input_dir photos/resized \ b_dir photos/blank \ operation combine \ output_dir photos/combined Split into train/val set python tools/split.py \ dir photos/combined The folder photos/combined will now have train and val subfolders that you can use for training and testing. Creating image pairs from existing images If you have two directories a and b , with corresponding images (same name, same dimensions, different data) you can combine them with process.py : sh python tools/process.py \ input_dir a \ b_dir b \ operation combine \ output_dir c This puts the images in a side by side combined image that pix2pix.py expects. Colorization For colorization, your images should ideally all be the same aspect ratio. 
You can resize and crop them with the resize command: sh python tools/process.py \ input_dir photos/original \ operation resize \ output_dir photos/resized No other processing is required, the colorization mode (see Training section below) uses single images instead of image pairs. Training Image Pairs For normal training with image pairs, you need to specify which directory contains the training images, and which direction to train on. The direction options are AtoB or BtoA sh python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA Colorization pix2pix.py includes special code to handle colorization with single images instead of pairs, using that looks like this: sh python pix2pix.py \ mode train \ output_dir photos_train \ max_epochs 200 \ input_dir photos/train \ lab_colorization In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information). Tips You can look at the loss and computation graph using tensorboard: sh tensorboard logdir facades_train If you wish to write in progress pictures as the network is training, use display_freq 50 . This will update facades_train/index.html every 50 steps with the current training inputs and outputs. Testing Testing is done with mode test . You should specify the checkpoint to use with checkpoint , this should point to the output_dir that you created previously with mode train : sh python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train The testing mode will load some of the configuration options from the checkpoint provided so you do not need to specify which_direction for instance. The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets: Code Validation Validation of the code was performed on a Linux machine with a 1.3 TFLOPS Nvidia GTX 750 Ti GPU and an Azure NC6 instance with a K80 GPU. sh git clone cd pix2pix tensorflow python tools/download dataset.py facades sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode train \ output_dir facades_train \ max_epochs 200 \ input_dir facades/train \ which_direction BtoA sudo nvidia docker run \ volume $PWD:/prj \ workdir /prj \ env PYTHONUNBUFFERED x \ affinelayer/pix2pix tensorflow \ python pix2pix.py \ mode test \ output_dir facades_test \ input_dir facades/val \ checkpoint facades_train Comparison on facades dataset: Input Tensorflow Torch Target Unimplemented Features The following models have not been implemented: defineG_encoder_decoder defineG_unet_128 defineD_pixelGAN Citation If you use this code for your research, please cite the paper this code is based on: Image to Image Translation Using Conditional Adversarial Networks : @article{pix2pix2016, title {Image to Image Translation with Conditional Adversarial Networks}, author {Isola, Phillip and Zhu, Jun Yan and Zhou, Tinghui and Efros, Alexei A}, journal {arxiv}, year {2016} } Acknowledgments This is a port of pix2pix from Torch to Tensorflow. It also contains colorspace conversion code ported from Torch. Thanks to the Tensorflow team for making such a quality library! 
And special thanks to Phillip Isola for answering my questions about the pix2pix code.",Image-to-Image Translation,Image-to-Image Translation 2004,Medical,Medical,Other,"This is my attempt to implement a semantic segmentation network on darknet framework. I implement a modified version of U Net to detect road surface. Input to model: 224 x 224 x 3 images Output: 224 x 224 binary label I added a bunch of code to the darknet source files for data pre processing. I recommend using cscope to browse through the code, makes life easier! Sample data to train and test can be found in data/unet folder. Feel free to adapt the code to your needs. Notes: The input images have to be reshaped to 224x224x3 (RGB) and the output label is a binary image (0's & 1's) as I only predict one class. If you want to use a different size image or multi class prediction you need to the change the shape of input and output layer accordingly in the config file. I normalize the image channel wise using the mean for each channel, if you want use different normalization you have to the change the ipl_into_image function in the src/image.c file. All the Unet specific code that I added to darknet has been commented with Unet code so you can easily track the changes. Cheers! Sample: ! input image (data/unet/test/5.png) ! output image (data/unet/result/5.png.png)",Medical Image Segmentation,Medical 2046,Medical,Medical,Other,"Unet for image segmentation (deep convolutional neural network) Keras (Tensorflow API) implementation of Unet with data generators, both for images and segmentation maps, as well as for training cycles with validation set. For details on Unet consider the publication by Ronneberger et al.:",Medical Image Segmentation,Medical 2049,Medical,Medical,Other,Peptide Graph Encoding Neural Network Oliver Nakano Baker Sid Rath Jonathon Francis Landau This project builds upon work by Duvenaud et. al.,Drug Discovery,Medical 2069,Medical,Medical,Other,"Retina blood vessel segmentation with a convolution neural network (U net) ! (test/test_Original_GroundTruth_Prediction3.png) This repository contains the implementation of a convolutional neural network used to segment blood vessels in retina fundus images. This is a binary classification task: the neural network predicts if each pixel in the fundus image is either a vessel or not. The neural network structure is derived from the U Net architecture, described in this paper . The performance of this neural network is tested on the DRIVE database, and it achieves the best score in terms of area under the ROC curve in comparison to the other methods published so far. Also on the STARE datasets, this method reports one of the best performances. Methods Before training, the 20 images of the DRIVE training datasets are pre processed with the following transformations: Gray scale conversion Standardization Contrast limited adaptive histogram equalization (CLAHE) Gamma adjustment The training of the neural network is performed on sub images (patches) of the pre processed full images. Each patch, of dimension 48x48, is obtained by randomly selecting its center inside the full image. Also the patches partially or completely outside the Field Of View (FOV) are selected, in this way the neural network learns how to discriminate the FOV border from blood vessels. A set of 190000 patches is obtained by randomly extracting 9500 patches in each of the 20 DRIVE training images. Although the patches overlap, i.e. 
different patches may contain same part of the original images, no further data augmentation is performed. The first 90% of the dataset is used for training (171000 patches), while the last 10% is used for validation (19000 patches). The neural network architecture is derived from the U net architecture (see the paper ). The loss function is the cross entropy and the stochastic gradient descent is employed for optimization. The activation function after each convolutional layer is the Rectifier Linear Unit (ReLU), and a dropout of 0.2 is used between two consecutive convolutional layers. Training is performed for 150 epochs, with a mini batch size of 32 patches. Using a GeForce GTX TITAN GPU the training lasts for about 20 hours. Results on DRIVE database Testing is performed with the 20 images of the DRIVE testing dataset, using the gold standard as ground truth. Only the pixels belonging to the FOV are considered. The FOV is identified with the masks included in the DRIVE database. In order to improve the performance, the vessel probability of each pixel is obtained by averaging multiple predictions. With a stride of 5 pixels in both height and width, multiple consecutive overlapping patches are extracted in each testing image. Then, for each pixel, the vessel probability is obtained by averaging probabilities over all the predicted patches covering the pixel. The results reported in the ./test folder are referred to the trained model which reported the minimum validation loss. The ./test folder includes: Model: test_model.png schematic representation of the neural network test_architecture.json description of the model in json format test_best_weights.h5 weights of the model which reported the minimum validation loss, as HDF5 file test_last_weights.h5 weights of the model at last epoch (150th), as HDF5 file test_configuration.txt configuration of the parameters of the experiment Experiment results: performances.txt summary of the test results, including the confusion matrix Precision_recall.png the precision recall plot and the corresponding Area Under the Curve (AUC) ROC.png the Receiver Operating Characteristic (ROC) curve and the corresponding AUC all_ .png the 20 images of the pre processed originals, ground truth and predictions relative to the DRIVE testing dataset sample_input_ .png sample of 40 patches of the pre processed original training images and the corresponding ground truth test_Original_GroundTruth_Prediction .png from top to bottom, the original pre processed image, the ground truth and the prediction. In the predicted image, each pixel shows the vessel predicted probability, no threshold is applied. The following table compares this method to other recent techniques, which have published their performance in terms of Area Under the ROC curve (AUC ROC) on the DRIVE dataset. Method AUC ROC on DRIVE : : Soares et al 1 .9614 Azzopardi et al. 2 .9614 Osareh et al 3 .9650 Roychowdhury et al. 4 .9670 Fraz et al. 5 .9747 Qiaoliang et al. 6 .9738 Melinscak et al. 7 .9749 Liskowski et al.^ 8 .9790 this method .9790 ^ different definition of FOV Running the experiment on DRIVE The code is written in Python, it is possible to replicate the experiment on the DRIVE database by following the guidelines below. Prerequisities The neural network is developed with the Keras library, we refer to the Keras repository for the installation. This code has been tested with Keras 1.1.0, using either Theano or TensorFlow as backend. 
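To make the training recipe described above more concrete (cross entropy loss, SGD, dropout of 0.2 between convolutional layers, mini batches of 32 patches for 150 epochs), here is a schematic Keras 1.x style snippet. The tiny encoder decoder below is a toy stand in for the actual U Net, and the optimizer settings are assumptions; it uses the channel first ('th') ordering required by the configuration note that follows.
python
from keras.models import Model
from keras.layers import Input, Convolution2D, MaxPooling2D, UpSampling2D, Dropout
from keras.optimizers import SGD

# Toy stand-in for the patch-based network: 1-channel 48x48 input patches, 'th' ordering.
inputs = Input(shape=(1, 48, 48))
x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(inputs)
x = Dropout(0.2)(x)
x = Convolution2D(32, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Convolution2D(64, 3, 3, activation='relu', border_mode='same')(x)
x = Dropout(0.2)(x)
x = UpSampling2D(size=(2, 2))(x)
outputs = Convolution2D(1, 1, 1, activation='sigmoid', border_mode='same')(x)

model = Model(input=inputs, output=outputs)
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss='binary_crossentropy')  # assumed settings
# model.fit(train_patches, train_labels, batch_size=32, nb_epoch=150, validation_split=0.1)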
In order to avoid dimensions mismatch, it is important to set image_dim_ordering : th in the /.keras/keras.json configuration file. If this file isn't there, you can create it. See the Keras documentation for more details. The following dependencies are needed: numpy > 1.11.1 PIL > 1.1.7 opencv > 2.4.10 h5py > 2.6.0 ConfigParser > 3.5.0b2 scikit learn > 0.17.1 Also, you will need the DRIVE database, which can be freely downloaded as explained in the next section. Training First of all, you need the DRIVE database. We are not allowed to provide the data here, but you can download the DRIVE database at the official website . Extract the images to a folder, and call it DRIVE , for example. This folder should have the following tree: DRIVE │ └───test ├───1st_manual └───2nd_manual └───images └───mask │ └───training ├───1st_manual └───images └───mask We refer to the DRIVE website for the description of the data. It is convenient to create HDF5 datasets of the ground truth, masks and images for both training and testing. In the root folder, just run: python prepare_datasets_DRIVE.py The HDF5 datasets for training and testing will be created in the folder ./DRIVE_datasets_training_testing/ . N.B: If you gave a different name for the DRIVE folder, you need to specify it in the prepare_datasets_DRIVE.py file. Now we can configure the experiment. All the settings can be specified in the file configuration.txt , organized in the following sections: data paths Change these paths only if you have modified the prepare_datasets_DRIVE.py file. experiment name Choose a name for the experiment, a folder with the same name will be created and will contain all the results and the trained neural networks. data attributes The network is trained on sub images (patches) of the original full images, specify here the dimension of the patches. training settings Here you can specify: N_subimgs : total number of patches randomly extracted from the original full images. This number must be a multiple of 20, since an equal number of patches is extracted in each of the 20 original training images. inside_FOV : choose if the patches must be selected only completely inside the FOV. The neural network correctly learns how to exclude the FOV border if also the patches including the mask are selected. However, a higher number of patches are required for training. N_epochs : number of training epochs. batch_size : mini batch size. nohup : the standard output during the training is redirected and saved in a log file. After all the parameters have been configured, you can train the neural network with: python run_training.py If available, a GPU will be used. The following files will be saved in the folder with the same name of the experiment: model architecture (json) picture of the model structure (png) a copy of the configuration file model weights at last epoch (HDF5) model weights at best epoch, i.e. minimum validation loss (HDF5) sample of the training patches and their corresponding ground truth (png) Evaluate the trained model The performance of the trained model is evaluated against the DRIVE testing dataset, consisting of 20 images (as many as in the training set). The parameters for the testing can be tuned again in the configuration.txt file, specifically in the testing settings section, as described below: testing settings best_last : choose the model for prediction on the testing dataset: best the model with the lowest validation loss obtained during the training; last the model at the last epoch. 
full_images_to_test : number of full images for testing, max 20. N_group_visual : choose how many images per row in the saved figures. average_mode : if true, the predicted vessel probability for each pixel is computed by averaging the predicted probability over multiple overlapping patches covering the same pixel. stride_height : relevant only if average_mode is True. The stride along the height for the overlapping patches, smaller stride gives higher number of patches. stride_width : same as stride_height. nohup : the standard output during the prediction is redirected and saved in a log file. The section experiment name must be the name of the experiment you want to test, while data paths contains the paths to the testing datasets. Now the section training settings will be ignored. Run testing by: python run_testing.py If available, a GPU will be used. The following files will be saved in the folder with same name of the experiment: The ROC curve (png) The Precision recall curve (png) Picture of all the testing pre processed images (png) Picture of all the corresponding segmentation ground truth (png) Picture of all the corresponding segmentation predictions (png) One or more pictures including (top to bottom): original pre processed image, ground truth, prediction Report on the performance All the results are referred only to the pixels belonging to the FOV, selected by the masks included in the DRIVE database Results on STARE database This neural network has been tested also on another common database, the STARE . The neural network is identical as in the experiment with the DRIVE dataset, however some modifications in the code and in the methodology were necessary due to the differences between the two datasets. The STARE consists of 20 retinal fundus images with two sets of manual segmentation provided by two different observers, with the former one considered as the ground truth. Conversely to the DRIVE dataset, there is no standard division into train and test images, therefore the experiment has been performed with the leave one out method. The training testing cycle has been repeated 20 times: at each iteration one image has been left out from the training set and then used for the test. The pre processing is the same applied for the DRIVE dataset, and 9500 random patches of 48x48 pixels each are extracted from each of the 19 images forming the training set. Also the area outside the FOV has been considered for the patch extraction. From these patches, 90% (162450 patches) are used for training and 10% (18050 patches) are used for validation. The training parameters (epochs, batch size...) are the same as in the DRIVE experiment. The test is performed each time on the single image left out from the training dataset. Similarly to the DRIVE dataset, the vessel probability of each pixel is obtained by averaging over multiple overlapping patches, obtained with a stride of 5 pixels in both width and height. Only the pixels belonging to the FOV are considered. This time the FOV is identified by applying a color threshold in the original images, since no masks are available in the STARE dataset. The following table shows the results (in terms of AUC ROC) obtained over the 20 different trainings, with the stated image used for test. 
STARE image AUC ROC : : im0239.ppm .9751 im0324.ppm .9661 im0139.ppm .9845 im0082.ppm .9929 im0240.ppm .9832 im0003.ppm .9856 im0319.ppm .9702 im0163.ppm .9952 im0077.ppm .9925 im0162.ppm .9913 im0081.ppm .9930 im0291.ppm .9635 im0005.ppm .9703 im0235.ppm .9912 im0004.ppm .9732 im0044.ppm .9883 im0001.ppm .9709 im0002.ppm .9588 im0236.ppm .9893 im0255.ppm .9819 __AVERAGE: .9805 + .0113__ The folder ./STARE_results contains all the predictions. Each image shows (from top to bottom) the pre processed original image of the STARE dataset, the ground truth and the corresponding prediction. In the predicted image, each pixel shows the vessel predicted probability, no threshold is applied. The following table compares this method to other recent techniques, which have published their performance in terms of Area Under the ROC curve (AUC ROC) on the STARE dataset. Method AUC ROC on STARE : : Soares et al 1 .9671 Azzopardi et al. 2 .9563 Roychowdhury et al. 4 .9688 Fraz et al. 5 .9768 Qiaoliang et al. 6 .9879 Liskowski et al.^ 8 .9930 this method .9805 ^ different definition of FOV Bibliography 1 Soares et al., “Retinal vessel segmentation using the 2 d Gabor wavelet and supervised classification,” Medical Imaging, IEEE Transactions on , vol. 25, no. 9, pp. 1214–1222, 2006. 2 Azzopardi et al., “Trainable cosfire filters for vessel delineation with application to retinal images,” Medical image analysis , vol. 19, no. 1, pp. 46–57, 2015. 3 Osareh et al., “Automatic blood vessel segmentation in color images of retina,” Iran. J. Sci. Technol. Trans. B: Engineering , vol. 33, no. B2, pp. 191–206, 2009. 4 Roychowdhury et al., “Blood vessel segmentation of fundus images by major vessel extraction and subimage classification,” Biomedical and Health Informatics, IEEE Journal of , vol. 19, no. 3, pp. 1118–1128, 2015. 5 Fraz et al., An Ensemble Classification Based Approach Applied to Retinal Blood Vessel Segmentation , IEEE Transactions on Biomedical Engineering , vol. 59, no. 9, pp. 2538 2548, 2012. 6 Qiaoliang et al., A Cross Modality Learning Approach for Vessel Segmentation in Retinal Images , IEEE Transactions on Medical Imaging , vol. 35, no. 1, pp. 109 118, 2016. 7 Melinscak et al., Retinal vessel segmentation using deep neural networks , In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISIGRAPP 2015) , (2015), pp. 577–582. 8 Liskowski et al., Segmenting Retinal Blood Vessels with Deep Neural Networks , IEEE Transactions on Medical Imaging , vol. PP, no. 99, pp. 1 1, 2016. Acknowledgements This work was supported by the EU Marie Curie Initial Training Network (ITN) “REtinal VAscular Modelling, Measurement And Diagnosis (REVAMMAD), Project no. 316990. 
License This project is licensed under the MIT License Copyright (c) 2016 Daniele Cortinovis, Orobix Srl (www.orobix.com).",Medical Image Segmentation,Medical 2073,Medical,Medical,Other,"Simple implementation of Unet, following the paper: basic unet model is built instead of crop and concatenate, here same size is maintained for both side of skip connection, so that simple concatenate could be handled training script for isbi 2012 neural cell image segmentation task is implemented",Medical Image Segmentation,Medical 2096,Medical,Medical,Other,"Multi Planar UNet Quick Start Installation git clone pip install e MultiPlanarUNet This package is still frequently updated and it is thus recommended to install the package with PIP with the e ('editable') flag so that the package can be updated with recent changes on GitHub without re installing: cd MultiPlanarUNet git pull Usage usage: mp script script args... Multi Planar UNet (0.1.0) Available scripts: cv_experiment cv_split init_project predict predict_3D summary train train_fusion ... Overview This package implements fully autonomous deep learning based segmentation of any 3D medical image. It uses a fixed hyperparameter set and a fixed model topology, eliminating the need for conducting hyperparameter tuning experiments. No manual involvement is needed except for supplying the training data. The system has been evaluated on a wide range of tasks spanning organ and pathology segmentation across tissue types and scanner modalities. The model obtained a top 5 position at the 2018 Medical Segmentation Decathlon despite its simplicity and computational efficiency. This software may be used as is and does not require deep learning expertise to get started. It may also serve as a strong baseline method for general purpose semantic segmentation of medical images. Method The base model is a just slightly modified 2D UNet trained under a multi planar framework. Specifically, the 2D model is fed images sampled across multiple views onto the image volume simultaneously: Multi Planar Animation (resources/multi_planar_training.gif) At test time, the model predict along each of the views and recreates a set of full segmentation volumes. These volumes are majority voted into one using a learned function that weights each class from each view individually to maximuse the performance. ! (resources/multi_planar_model.png) The method is described in detail below. Usage Project initialization, model training, evaluation, prediction etc. may be performed using the scripts located in MultiPlanarUNet.bin . The script named mp.py serves as an entry point to all other scripts, and is used as follows: bash Invoke the help menu mp help Launch the train script mp train arguments passed to 'train'... NOTE: You only need to specify the training data in the format described below. Training, evaluation and prediction will be handled automatically if using the above scripts. 
Preparing the data \ In order to train a model to solve a specific task, a set of manually annotated images must be stored in a folder under the following structure: ./data_folder/ train/ images/ image1.nii.gz image5.nii.gz labels/ image1.nii.gz image5.nii.gz val/ images/ labels/ test/ images/ labels/ aug/ > SUMMARY REPORT FOR FOLDER >> ./my_project/predictions/csv/ >> >> >> Per class: >> >> Mean dice by class +/ STD min max N >> 1 0.856 0.060 0.672 0.912 34 >> 2 0.891 0.029 0.827 0.934 34 >> 3 0.888 0.027 0.829 0.930 34 >> 4 0.802 0.164 0.261 0.943 34 >> 5 0.819 0.075 0.552 0.926 34 >> 6 0.863 0.047 0.663 0.917 34 >> >> Overall mean: 0.853 + 0.088 >> >> >> By views: >> >> 0.8477811 0.50449719 0.16355361 0.825 >> 0.70659414 0.35532932 0.6119361 0.819 >> 0.11799461 0.07137918 0.9904455 0.772 >> 0.95572575 0.28795306 0.06059151 0.827 >> 0.16704373 0.96459936 0.20406974 0.810 >> 0.72188903 0.68418977 0.10373322 0.819 >> Cross Validation Experiments Cross validation experiments may be easily performed. First, invoke the mp cv_split command to split your data_folder into a number of random splits: mp cv_split data_dir ./data_folder CV 5 Here, we prepare for a 5 CV setup. By default, the above command will create a folder at data_folder/views/5 CV/ storing in this case 5 folders split0, split1, ..., split5 each structured like the main data folder with sub folders train , val , test and aug (optionally, set with the aug_sub_dir flag). Inside these sub folders, images a symlinked to their original position to safe storage. Running a CV Experiment \ A cross validation experiment can now be performed. On systems with multiple GPUs, each fold can be assigned a given number of the total pool of GPUs in which case multiple folds will run in parallel and new once automatically start , when previous folds terminate. First, we create a new project folder. This time, we do not specify a data folder yet: mp init_project name CV_experiment We also create a file named script , giving the following folder structure: ./CV_experiment train_hparams.yaml script The train_hparams.yaml file will serve as a template that will be applied to all folds. We can set any parameters we want here, or let the framework decide on proper parameters for each fold automatically. The script file details the mp commands (and optionally various arguments) to execute on each fold. For instance, a script file may look like: mp train no_images Do not save example segmentations mp train_fusion mp predict out_dir predictions We can now execute the 5 CV experiment by running: mp cv_experiment CV_dir ./data_dir/views/5 CV \ out_dir ./splits \ num_GPUs 2 monitor_GPUs_every 600 Above, we assign 2 GPUs to each fold. On a system of 8 GPUs, 4 folds will be run in parallel. We set monitor_GPUs_every 600 to scan the system for new free GPU resources every 600 seconds (otherwise, only GPUs that we initially available will be cycled and new free ones will be ignored). The cv_experiment script will create a new project folder for each split located at out_dir ( CV_experiment/splits in this case). For each fold, each of the commands outlined in the script file will be launched one by one inside the respective project folder of the fold, so that the predictions are stored in CV_experiment/splits/split0/predictions for fold 0 etc. Afterwards, we may get a CV summary by invoking: mp summary ... 
from inside the CV_experiment/splits folder.",Medical Image Segmentation,Medical 2098,Medical,Medical,Other,"PySemSeg PySemSeg is a library for training Deep Learning Models for Semantic Segmentation in Pytorch. The goal of the library is to provide implementations of SOTA segmentation models, with pretrained versions on popular datasets, as well as an easy to use training loop for new models and datasets. Most Semantic Segmentation datasets with fine grained annotations are small, so Transfer Learning is crucial for success and is a core capability of the library. PySemSeg can use visdom or tensorboardX for training summary visualialization. Installation Using pip: .. code:: bash pip install git+ Models FCN paper _ FCN32, FCN16, FCN8 with pre trained VGG16 UNet paper _ Tiramisu (FC DenseNets) paper _ FC DenseNet 56, FC DenseNet 67, FC DensetNet 103 with efficient checkpointing DeepLab V3 paper _ Multi grid, ASPP and BatchNorm fine tuning with pre trained resnets backbone DeepLab V3+ paper _ RefineNet paper _ Upcoming ... PSPNet paper _ Upcoming ... Datasets Pascal VOC _ CamVid _ Cityscapes Upcoming ... ADE20K Upcoming ... Train a model from command line The following is an example command to train a VGGFCN8 model on the Pascal VOC 2012 dataset. In addition to the dataset and the model, a transformer class should be passed (PascalVOCTransform in this case) a callable where all input image and mask augmentations and tensor transforms are implemented. Run :code: pysemseg train h for a full list of options. .. code:: bash pysemseg train \ model VGGFCN8 \ model dir /models/vgg8_pascal_model/ \ dataset PascalVOCSegmentation \ data dir /datasets/PascalVOC/ \ batch size 4 \ test batch size 1 \ epochs 40 \ lr 0.001 \ optimizer SGD \ optimizer args '{ weight_decay : 0.0005, momentum : 0.9}' \ transformer PascalVOCTransform \ lr scheduler PolyLR \ lr scheduler_args '{ max_epochs : 40, gamma : 0.8}' or pass a YAML config .. code:: bash pysemseg train config config.yaml .. code:: YAML model: VGGFCN32 model dir: models/vgg8_pascal_model/ dataset: PascalVOCSegmentation data dir: datasets/PascalVOC/ batch size: 4 test batch size: 1 epochs: 40 lr: 0.001 optimizer: SGD optimizer args: weight_decay: 0.0005 momentum: 0.9 transformer: PascalVOCTransform no cuda: true lr scheduler: PolyLR lr scheduler args: max_epochs: 40 gamma: 0.8 Load and predict with a trained model To use a checkpoint for inference you have to call :code: load_model with a checkpoint, the model class and the transformer class used during training. .. code:: python import torch.nn.functional as F from pysemseg.transforms import CV2ImageLoader from pysemseg.utils import load_model from pysemseg.models import VGGFCN32 from pysemseg.datasets import PascalVOCTransform model load_model( './checkpoint_path', VGGFCN32, PascalVOCTransform ) image CV2ImageLoader()('./image_path') logits model(image) probabilities F.softmax(logits, dim 1) predictions torch.argmax(logits, dim 1)",Medical Image Segmentation,Medical 2122,Medical,Medical,Other,"4d cbct A deep convolutional neural network model (based on the 'U Net') to enhance the image quality of 4 D Cone Beam CT Objective In this project, inspired by the SPARE Challenge , we are investigating the performance of deep learning models to improve the quality of 4 dimensional cone beam CT images. In particular, we have implemented a deep convolutional neural network based on the 'U Net' architecture (Ronneberger et al 2015). The model presented here corresponds to our first prototype. The Model ! 
U Net The figure above shows the architecture of the original 2 D U Net that was implemented for image segmentation tasks . Our model contains the following modifications: We have replaced the Maxpooling layers by 2 D Convolutional layers. We have replaced the up convolution layers by re size (using nearest neighbours) + 2 D convolutions. This modification is intended to prevent the network from exibiting artifacts typical of deconvolutional layers. A very nice description of this problem can be found here: Our input/output corresponds to 448 x 448 cbct axial slices. The Data The data was provided by the SPARE Challenge. The SPARE challenge is led by Dr Andy Shieh and Prof Paul Keall at the ACRF Image X Institute, The University of Sydney. Collaborators who have contributed to the datasets include A/Prof Xun Jia, Miss Yesenia Gonzalez, and Mr Bin Li from the University of Texas Southwestern Medical Center, and Dr Simon Rit from the Creatis Medical Imaging Research Center. The data consisted of 4 Dimensional cone beam CT images of 12 patients acquired in 1 minute (sparse input data, suffering from high levels of noise and artifacts), and the corresponding high quality images (complete output data). These data will be released to the public by the organizers of the challenge in the future. Preliminary Results (Prototype model) ! U Net The figure above illustrates the performance of our prototype on images from the validation set. The top row displays three cone beam CT slices reconstructed from 1 minute scans (input data). The middle row shows the improvements made by our model (predictions). The bottom row shows the ground truth (high quality images). Quantitative assessment of the prototype performance In deep learning applications to enhance image data, the mean square error loss function (applied on a pixel by pixel basis) is often used. However, different groups have shown that the selection of a loss function, more relevant to the imaging task at hand, can greatly improve the overall performance of the model. For instance, Zhao et al 2015 proposed several alternatives to the mean square error loss function for de noising, super resolution, and JPEG artifacts removal. The authors proposed a loss function which is a combination of the mean absolute error and the structural similarity. Read the study here: Another very recent study by Taghanaki et al 2018 showed that a simple network with the proper loss function can outperform more complex architectures (e.g. networks with skip connections) in image segmentation tasks. Read their work here: In light of these results, we decided to investigate the following research question: 1) Using the U Net architecture, what is the optimum loss fuction for denoising and artifact removal of 4 D cone beam CT images? To this end, we evaluated the performance of our prototype model with the following loss functions: Loss A: mean squared error Loss B: mean absolute error Loss C: structural similarity Loss D: 0.75 (mean square error) + 0.25 (structural similarity) We assessed the performance of each trained version of our prototype model by evaluating multiple metrics (mean square error, mean absolute error, peak signal to noise ratio and structural similarity) on the test dataset (i.e., images of patients that were not shown during training). In particular, we computed these metrics on both the entire image of each patient and also within the patient body only. 
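For illustration, a weighted combination like Loss D above can be written as a custom TensorFlow/Keras loss. Treating 1 - SSIM as the dissimilarity term and the 0.75/0.25 weighting are assumptions about the implementation, and tf.image.ssim requires TensorFlow >= 1.8.
python
import tensorflow as tf

def mse_ssim_loss(y_true, y_pred, alpha=0.75):
    # alpha * MSE + (1 - alpha) * (1 - SSIM); images assumed scaled to [0, 1].
    # tf.image.ssim expects batches of shape [batch, height, width, channels].
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    ssim = tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
    return alpha * mse + (1.0 - alpha) * (1.0 - ssim)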
The patient body on each image was segmented using a region growing algorithm (available in the SimpleITK library for Python; the code is available in my repository). The results are shown in the four figures below. ! per ! per2 Future work Our prototype was built to improve the quality of the reconstructed images. One limitation of this approach is that the performance of the model depends on the quality of, and the artifacts present in, the input images. The quality of these inputs is in turn sensitive to the method used to reconstruct the measured projection data. To overcome this limitation, and to generalize our model as much as possible, we are investigating the following research question: 2) Can we build a deep learning model that improves the quality of the measured projection data (i.e., the sinograms)? How does the performance of such a model compare to the performance of our current prototype?
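As an implementation note, the region growing body segmentation mentioned above maps naturally onto SimpleITK's connected threshold filter. The sketch below is an assumption about how such a mask could be produced; the seed voxel and intensity bounds are placeholders that need tuning per scan, not values taken from the repository.
python
import SimpleITK as sitk

def body_mask(image_path, seed, lower=-300, upper=3000):
    # Grow a rough patient-body mask from a seed voxel located inside the body.
    image = sitk.ReadImage(image_path)
    # ConnectedThreshold flood-fills every voxel connected to the seed whose
    # intensity lies within [lower, upper].
    mask = sitk.ConnectedThreshold(image, seedList=[seed], lower=lower, upper=upper)
    return sitk.GetArrayFromImage(mask)  # numpy array of 0/1 labels

# mask = body_mask('patient_volume.nii.gz', seed=(224, 224, 40))  # example call with placeholder seed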
Apply Conditional Random Field post processing",Medical Image Segmentation,Medical 2149,Medical,Medical,Other,"Neural Message Passing for Quantum Chemistry Implementation of different models of Neural Networks on graphs as explained in the article proposed by Gilmer et al. 1 . Installation $ pip install r requirements.txt $ python main.py Installation of rdkit Running any experiment using QM9 dataset needs installing the rdkit package, which can be done following the instructions available here Data The data used in this project can be downloaded here . Bibliography 1 Gilmer et al. , Neural Message Passing for Quantum Chemistry , arXiv, 2017. 2 Duvenaud et al. , Convolutional Networks on Graphs for Learning Molecular Fingerprints , NIPS, 2015. 3 Li et al. , Gated Graph Sequence Neural Networks , ICLR, 2016. 4 Battaglia et al. , Interaction Networks for Learning about Objects , NIPS, 2016. 5 Kipf et al. , Semi Supervised Classification with Graph Convolutional Networks , ICLR, 2017 6 Defferrard et al. , Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , NIPS, 2016. 7 Kearnes et al. , Molecular Graph Convolutions: Moving Beyond Fingerprints , JCAMD, 2016. 8 Bruna et al. , Spectral Networks and Locally Connected Networks on Graphs , ICLR, 2014. Cite @Article{Gilmer2017, author {Justin Gilmer and Samuel S. Schoenholz and Patrick F. Riley and Oriol Vinyals and George E. Dahl}, title {Neural Message Passing for Quantum Chemistry}, journal {CoRR}, year {2017} } Authors Pau Riba (@priba) Webpage Anjan Dutta (@AnjanDutta) Webpage",Drug Discovery,Medical 2150,Medical,Medical,Other,"Note: This README is an early work in progress. The body still needs to be written and images need to be resized. Implementing U Net using the TensorFlow Estimator API This repository contains scripts for building, training, and generating predictions with our own implementation of the U Net architecture. The network was implemented in Python using TensorFlow's Estimator API. Our goal was to reproduce the U Net architecture described in 1 , then train the network on data provided by the Carvana Image Masking Challenge hosted on Kaggle 2 . Our implementation produced masks with average pixel accuracy of 0.9959 +/ 0.0003 over a 6 fold cross validation set. Example masks produced by a single network trained on one fold of the cross validation set can be seen below. ! Example of a mask generated by our U Net (images/example_prediction.png) Description of the original U Net architecture U Net is a deep, fully convolutional neural network architecture proposed for biomedical image segmentation. A visual representation of the network, as shown in the original publication 1 , can be found below. ! Image of U Net Architecture (images/U Net.png) Differences between our implementation and the original architecture Summaries of each file Running the program References 1. U Net: Convolutional Networks for Biomedical Image Segmentation Olaf Ronneberger, Philipp Fischer, Thomas Brox. link . arXiv:1505.04597, 2015. 2. Carvana Image Masking Challenge Carvana LLC. link . Kaggle, 2017.",Medical Image Segmentation,Medical 2161,Medical,Medical,Other,"DeepVOG DeepVOG is a framework for pupil segmentation and gaze estimation based on a fully convolutional neural network. Currently it is available for offline gaze estimation of eye tracking video clips. Getting Started These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. 
See deployment for notes on how to deploy the project on a live system. Prerequisites To run DeepVOG, you need to have a Python distribution (we recommend Anaconda ) and the following Python packages: numpy scikit video scikit image tensorflow gpu keras urwid (Not necessary if you do not use the Text based user interface) As an alternative, you can use our docker image which already includes all the dependencies. The only requirement is a platform installed with nvidia driver and nvidia docker (or nvidia runtime of docker). Installing A step by step series of examples that tell you how to get DeepVOG running. 1. Installing from package $ git clone (or you can download the files in this repo with your browser) Move to the directory of DeepVOG that you just cloned/downloaded, and type $ python setup.py install If it happens to be missing some dependencies listed above, you may install them with pip: $ pip install numpy $ pip install scikit video $ ... 2. It is highly recommended to run our program in docker. You can directly pull our docker image from dockerhub. (For tutorials on docker, see docker and nvidia docker ) $ docker run runtime nvidia it rm yyhhoi/deepvog:v1.1.1 bash or $ nvidia docker run it rm yyhhoi/deepvog:v1.1.1 bash Usage (Command line interface) The CLI allows you to fit/infer single video, or multiple of them by importing a csv table. They can be simply called by: $ python m deepvog fit /PATH/video_fit.mp4 /PATH/eyeball_model.json $ python m deepvog infer /PATH/video_infer.mp4 /PATH/eyeball_model.json /PATH/results.csv $ python m deepvog table /PATH/list_of_operations.csv DeepVOG first fits a 3D eyeball model from a video clip. Base on the eyeball model, it estimates the gaze direction on any other videos if the relative position of the eye with respect to the camera remains the same. It has no problem that you fit an eyeball model and infer the gaze directions from the same video clip. However, for clinical use, some users may want to have a more accurate estimate by having a separate fitting clip where the subjects perform a calibration paradigm. In addition, you will need to specify your camera parameters such as focal length, if your parameters differ from default values. $ python m deepvog fit /PATH/video_fit.mp4 /PATH/eyeball_model.json flen 12 vid shape 240,320 sensor 3.6,4.8 batchsize 32 gpu 0 Please refer to doc/documentation.md (doc/documentation.md) for the meaning of arguments and input/output formats. Alternatively, you can also type $ python m deepvog h for usage examples. Usage (Text based user interface) DeepVOG comes with a simple text based user interface (TUI). After installation, you can simply type in terminal: $ python m deepvog tui If it is successful, you should see the interface: From now on, you can follow the instructions within the interface and do offline analysis on your videos. Usage (As a python module) For more flexibility, you may import the module directly in python. python import deepvog Load our pre trained network model deepvog.load_DeepVOG() Initialize the class. It requires information of your camera's focal length and sensor size, which should be available in product manual. inferer deepvog.gaze_inferer(model, focal_length, video_shape, sensor_size) Fit an eyeball model from video_1.mp4 . The model will be stored as the inferer instance's attribute. 
inferer.fit( video_1.mp4 ) After fitting, infer gaze from video_1.mp4 and output the results into result_video_1.csv inferer.predict( video_1.mp4 , result_video_1.csv ) Optional You may also save the eyeball model to video_1_mode.json for subsequent gaze inference inferer.save_eyeball_model( video_1_model.json ) By loading the eyeball model, you don't need to fit the model again with inferer.fit( video_1.mp4 ) inferer.load_eyeball_model( video_1_model.json ) Publication and Citation If you plan to use this work in your research or product, please cite this repository and our publication pre print on arXiv . Authors Yiu Yuk Hoi Implementation and validation Seyed Ahmad Ahmadi Research study concept Moustafa Aboulatta Initial work Links to other related papers U Net: Convolutional Networks for Biomedical Image Segmentation V Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation A fully automatic, temporal approach to single camera, glint free 3D eye model fitting License This project is licensed under the GNU General Public License v3.0 (GNU GPLv3) License see the LICENSE (LICENSE) file for details Acknowledgments We thank our fellow researchers at the German Center for Vertigo and Balance Disorders for help in acquiring data for training and validation of pupil segmentation and gaze estimation. In particular, we would like to thank Theresa Raiser, Dr. Virginia Flanagin and Prof. Dr. Peter zu Eulenburg.",Medical Image Segmentation,Medical 2208,Medical,Medical,Other,"Complementary_Segmentation_Network (Outperforms u nets everytime :) for binary segmentation ) Pretrained optimal compnet model on 1st fold of Oasis Brain MRI dataset link (let me know if this gets corrupted ) Please note that the green sigmoid in the image should be an concatenation. We simply concatenate the pairwise addition of the intermediate branches and then send that whole concatenation (NO SIGMOID on it) to the reconstruction branch Future work will include the idea for multi class complementary segmentation Network Architecture for the MICCAI_2018 paper : CompNet: Complementary Segmentation Network for Brain MRI Extraction. To view the paper on Archive click the following email me rd31879@uga.edu for any questions !! Am happy to discuss Built With/Things Needed to implement experiments Python Python 2 Keras Deep Learning Framework used Numpy Numpy Sklearn Scipy/Sklearn/Scikit learn CUDA CUDA 8 CUDNN CUDNN 5 You have to register to get access to CUDNN OASIS Oasis dataset website 12 gb TitanX To implement this exact network Basic Idea Pre requisites This architecture can be understood after learning about the U Net {PLEASE READ U NET before reading this paper} and W Net {Optional}. Please see line 1541 in comp_net_raw.py file in src for the main essence of complementary network i.e. summing up the intermediate outputs of segmentation and complementary branches and then concatenating them for reconstruction layer. Hyper parameters to be set l2_Lambda used for regularizing/penalizing parameters of the current layer Mainly used to prevent overfitting and is incorporated in the loss function Please see keras.io for more details DropP sets the % of dropout at the end of every dense block Kernel_size is the kernel size of the convolution filters Please see readme for additional resources. Lines 73 648 is the common encoder of the segmentation and complementary branches. Layers such as xconv1a,xmerge1........ belong to the complementary upsampling branch branch of the architecture. 
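To make the core idea above concrete (summing the paired outputs of the segmentation and complementary branches and concatenating the sums for the reconstruction branch), here is a minimal, hypothetical tf.keras sketch. It is not the repository's comp_net_raw.py: the toy encoder, the layer sizes and the two illustrative intermediate heads are assumptions made only for demonstration.
python
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(128, 128, 1))

# Toy shared encoder (stands in for the dense-block encoder).
enc = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
enc = layers.Conv2D(32, 3, padding="same", activation="relu")(layers.MaxPooling2D()(enc))
dec = layers.UpSampling2D()(enc)

# Segmentation (ROI) branch and complementary (non-ROI) branch, each with a final
# sigmoid output and one illustrative intermediate sigmoid output.
seg_out = layers.Conv2D(1, 1, activation="sigmoid", name="seg")(dec)
seg_mid = layers.Conv2D(1, 1, activation="sigmoid")(dec)
comp_out = layers.Conv2D(1, 1, activation="sigmoid", name="comp")(dec)
comp_mid = layers.Conv2D(1, 1, activation="sigmoid")(dec)

# Pairwise addition of the two branches at each stage, then concatenation of the
# sums (no sigmoid on the concatenation), feeding the reconstruction head.
sums = [layers.add([seg_out, comp_out]), layers.add([seg_mid, comp_mid])]
recon_out = layers.Conv2D(1, 1, activation="sigmoid", name="recon")(layers.concatenate(sums))

def dice_coef(y_true, y_pred, smooth=1.0):
    inter = tf.reduce_sum(y_true * y_pred)
    return (2.0 * inter + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

model = Model(inp, [seg_out, comp_out, recon_out])
model.compile(optimizer="adam",
              loss={"seg": lambda yt, yp: -dice_coef(yt, yp),   # negative dice vs. the brain mask
                    "comp": lambda yt, yp: dice_coef(yt, yp),   # positive dice vs. the same mask
                    "recon": "mse"})                            # reconstruct the input image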
The convolution layers's number indicates its level and so up6 and xup6 are at the same level and are parallel to each other Layers such as xxconv1a,xxmerge1 .... belong to the reconstruction branch. For more details of the multi outputs please see my isbi repository here Basically to summarize, we have two branches one which has negative dice with ground truth brain mask and is the segmentation branch We then have another branch with positive dice with ground truth masks The THEME of comp net is to sum up the two sections, future works will provide a better way to do this and a generalized version :) We do this theme of summing at every stage of the intermediate outputs i.e. the first intermediate output of segmentation branch is summed with first intermediate output of the complementary branch. We obtain a final summary of the outputs of the segmentation branch and complementary branch and also sum these two new summaries Finally we concat all of these summations and send to the reconstruction branch reconstruction branch is a simple structure of dense multi output U Net and the ground truth is the input image and loss is MSE. Comp Net summary ROI and CO branches We take the downsampling branch of a U Net as it is, however we split the upsampling branch into two halves, one to obtain the Region of Interest and the other for Complementary aka non region of interest. Losses here are negative dice for ROI and positive dice for Non ROI region. Reconstruction Branch Next we merge these two ROI and non ROI outputs using Summation operation and then pass it into another U Net, This U Net is the reconstruction branch. The input is the summed image from previous step and the output is the original image that we start with. The loss of reconstruction branch is MSE. ! alt text Architecture of our complementary segmentation network, the optimal CompNet. The dense blocks (DB), corresponding to the gray bars, are used in each encoder and decoder. The triple (x,y,z) in each dense block indicates that it has x convolutional layers with a kernel size 3×3; each layer has y filters, except for the last one that has z filters. SO: segmentation output for the brain mask; CO: complementary segmentation output for the non brain mask; RO: reconstruction output for the input image. These three outputs produced by the Sigmoid function are the final predictions; while all other Sigmoids produce intermediate outputs, except for the green one that is the concatenation of the summation from each intermediate layers. Best viewed in color. The code in this repository provides only the stand alone code for this architecture. You may implement it as is, or convert it into modular structure if you so wish. The dataset of OASIS can obtained from the link above and the preprocessiong steps involved are mentioned in the paper. You have to provide the inputs. Building your own Comp Net from whatever U Net you have Copy the upsampling branch of your U Net Duplicate it Use same loss functions as the original U Net BUT change its sign {Warning Make sure your loss function is defined for the opposite sign and try to think intuitively what it acheives. 
Example dice is simply overlap between two objects and optimizing negative dice gives us maximum possible overlap, but positive dice lowest value is 0 since you CANNOT quantify how much seperation is there between two objects using the DICE score but simply quantify if the two overlap or not and if they overlap how much } Add the two upsampling branch outputs pairwise for each channel using keras's model.add layer Feed that into the new reconstruction U Net where the loss function is MSE with the Input image of the first U Net i.e. the original input ! alt text Sample results",Medical Image Segmentation,Medical 2209,Medical,Medical,Other,"Complementary_Segmentation_Network (Outperforms u nets everytime :) for binary segmentation ) Pretrained optimal compnet model on 1st fold of Oasis Brain MRI dataset link (let me know if this gets corrupted ) Please note that the green sigmoid in the image should be an concatenation. We simply concatenate the pairwise addition of the intermediate branches and then send that whole concatenation (NO SIGMOID on it) to the reconstruction branch Future work will include the idea for multi class complementary segmentation Network Architecture for the MICCAI_2018 paper : CompNet: Complementary Segmentation Network for Brain MRI Extraction. To view the paper on Archive click the following email me rd31879@uga.edu for any questions !! Am happy to discuss Built With/Things Needed to implement experiments Python Python 2 Keras Deep Learning Framework used Numpy Numpy Sklearn Scipy/Sklearn/Scikit learn CUDA CUDA 8 CUDNN CUDNN 5 You have to register to get access to CUDNN OASIS Oasis dataset website 12 gb TitanX To implement this exact network Basic Idea Pre requisites This architecture can be understood after learning about the U Net {PLEASE READ U NET before reading this paper} and W Net {Optional}. Please see line 1541 in comp_net_raw.py file in src for the main essence of complementary network i.e. summing up the intermediate outputs of segmentation and complementary branches and then concatenating them for reconstruction layer. Hyper parameters to be set l2_Lambda used for regularizing/penalizing parameters of the current layer Mainly used to prevent overfitting and is incorporated in the loss function Please see keras.io for more details DropP sets the % of dropout at the end of every dense block Kernel_size is the kernel size of the convolution filters Please see readme for additional resources. Lines 73 648 is the common encoder of the segmentation and complementary branches. Layers such as xconv1a,xmerge1........ belong to the complementary upsampling branch branch of the architecture. The convolution layers's number indicates its level and so up6 and xup6 are at the same level and are parallel to each other Layers such as xxconv1a,xxmerge1 .... belong to the reconstruction branch. For more details of the multi outputs please see my isbi repository here Basically to summarize, we have two branches one which has negative dice with ground truth brain mask and is the segmentation branch We then have another branch with positive dice with ground truth masks The THEME of comp net is to sum up the two sections, future works will provide a better way to do this and a generalized version :) We do this theme of summing at every stage of the intermediate outputs i.e. the first intermediate output of segmentation branch is summed with first intermediate output of the complementary branch. 
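A tiny numpy check of the negative/positive dice behaviour described above (purely illustrative; the 8x8 toy mask is not from the dataset):
python
import numpy as np

def dice(a, b, eps=1e-7):
    inter = np.sum(a * b)
    return (2.0 * inter + eps) / (np.sum(a) + np.sum(b) + eps)

gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0               # toy ground-truth mask
print(dice(gt, gt))              # ~1.0: what minimising -dice pushes the segmentation branch towards
print(dice(gt, 1.0 - gt))        # ~0.0: what minimising +dice pushes the complementary branch towards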
We obtain a final summary of the outputs of the segmentation branch and complementary branch and also sum these two new summaries Finally we concat all of these summations and send to the reconstruction branch reconstruction branch is a simple structure of dense multi output U Net and the ground truth is the input image and loss is MSE. Comp Net summary ROI and CO branches We take the downsampling branch of a U Net as it is, however we split the upsampling branch into two halves, one to obtain the Region of Interest and the other for Complementary aka non region of interest. Losses here are negative dice for ROI and positive dice for Non ROI region. Reconstruction Branch Next we merge these two ROI and non ROI outputs using Summation operation and then pass it into another U Net, This U Net is the reconstruction branch. The input is the summed image from previous step and the output is the original image that we start with. The loss of reconstruction branch is MSE. ! alt text Architecture of our complementary segmentation network, the optimal CompNet. The dense blocks (DB), corresponding to the gray bars, are used in each encoder and decoder. The triple (x,y,z) in each dense block indicates that it has x convolutional layers with a kernel size 3×3; each layer has y filters, except for the last one that has z filters. SO: segmentation output for the brain mask; CO: complementary segmentation output for the non brain mask; RO: reconstruction output for the input image. These three outputs produced by the Sigmoid function are the final predictions; while all other Sigmoids produce intermediate outputs, except for the green one that is the concatenation of the summation from each intermediate layers. Best viewed in color. The code in this repository provides only the stand alone code for this architecture. You may implement it as is, or convert it into modular structure if you so wish. The dataset of OASIS can obtained from the link above and the preprocessiong steps involved are mentioned in the paper. You have to provide the inputs. Building your own Comp Net from whatever U Net you have Copy the upsampling branch of your U Net Duplicate it Use same loss functions as the original U Net BUT change its sign {Warning Make sure your loss function is defined for the opposite sign and try to think intuitively what it acheives. Example dice is simply overlap between two objects and optimizing negative dice gives us maximum possible overlap, but positive dice lowest value is 0 since you CANNOT quantify how much seperation is there between two objects using the DICE score but simply quantify if the two overlap or not and if they overlap how much } Add the two upsampling branch outputs pairwise for each channel using keras's model.add layer Feed that into the new reconstruction U Net where the loss function is MSE with the Input image of the first U Net i.e. the original input ! alt text Sample results",Medical Image Segmentation,Medical 2210,Medical,Medical,Other,"Complementary_Segmentation_Network (Outperforms u nets everytime :) for binary segmentation ) Pretrained optimal compnet model on 1st fold of Oasis Brain MRI dataset link (let me know if this gets corrupted ) Please note that the green sigmoid in the image should be an concatenation. 
We simply concatenate the pairwise addition of the intermediate branches and then send that whole concatenation (NO SIGMOID on it) to the reconstruction branch Future work will include the idea for multi class complementary segmentation Network Architecture for the MICCAI_2018 paper : CompNet: Complementary Segmentation Network for Brain MRI Extraction. To view the paper on Archive click the following email me rd31879@uga.edu for any questions !! Am happy to discuss Built With/Things Needed to implement experiments Python Python 2 Keras Deep Learning Framework used Numpy Numpy Sklearn Scipy/Sklearn/Scikit learn CUDA CUDA 8 CUDNN CUDNN 5 You have to register to get access to CUDNN OASIS Oasis dataset website 12 gb TitanX To implement this exact network Basic Idea Pre requisites This architecture can be understood after learning about the U Net {PLEASE READ U NET before reading this paper} and W Net {Optional}. Please see line 1541 in comp_net_raw.py file in src for the main essence of complementary network i.e. summing up the intermediate outputs of segmentation and complementary branches and then concatenating them for reconstruction layer. Hyper parameters to be set l2_Lambda used for regularizing/penalizing parameters of the current layer Mainly used to prevent overfitting and is incorporated in the loss function Please see keras.io for more details DropP sets the % of dropout at the end of every dense block Kernel_size is the kernel size of the convolution filters Please see readme for additional resources. Lines 73 648 is the common encoder of the segmentation and complementary branches. Layers such as xconv1a,xmerge1........ belong to the complementary upsampling branch branch of the architecture. The convolution layers's number indicates its level and so up6 and xup6 are at the same level and are parallel to each other Layers such as xxconv1a,xxmerge1 .... belong to the reconstruction branch. For more details of the multi outputs please see my isbi repository here Basically to summarize, we have two branches one which has negative dice with ground truth brain mask and is the segmentation branch We then have another branch with positive dice with ground truth masks The THEME of comp net is to sum up the two sections, future works will provide a better way to do this and a generalized version :) We do this theme of summing at every stage of the intermediate outputs i.e. the first intermediate output of segmentation branch is summed with first intermediate output of the complementary branch. We obtain a final summary of the outputs of the segmentation branch and complementary branch and also sum these two new summaries Finally we concat all of these summations and send to the reconstruction branch reconstruction branch is a simple structure of dense multi output U Net and the ground truth is the input image and loss is MSE. Comp Net summary ROI and CO branches We take the downsampling branch of a U Net as it is, however we split the upsampling branch into two halves, one to obtain the Region of Interest and the other for Complementary aka non region of interest. Losses here are negative dice for ROI and positive dice for Non ROI region. Reconstruction Branch Next we merge these two ROI and non ROI outputs using Summation operation and then pass it into another U Net, This U Net is the reconstruction branch. The input is the summed image from previous step and the output is the original image that we start with. The loss of reconstruction branch is MSE. ! 
alt text Architecture of our complementary segmentation network, the optimal CompNet. The dense blocks (DB), corresponding to the gray bars, are used in each encoder and decoder. The triple (x,y,z) in each dense block indicates that it has x convolutional layers with a kernel size 3×3; each layer has y filters, except for the last one that has z filters. SO: segmentation output for the brain mask; CO: complementary segmentation output for the non brain mask; RO: reconstruction output for the input image. These three outputs produced by the Sigmoid function are the final predictions; while all other Sigmoids produce intermediate outputs, except for the green one that is the concatenation of the summation from each intermediate layers. Best viewed in color. The code in this repository provides only the stand alone code for this architecture. You may implement it as is, or convert it into modular structure if you so wish. The dataset of OASIS can obtained from the link above and the preprocessiong steps involved are mentioned in the paper. You have to provide the inputs. Building your own Comp Net from whatever U Net you have Copy the upsampling branch of your U Net Duplicate it Use same loss functions as the original U Net BUT change its sign {Warning Make sure your loss function is defined for the opposite sign and try to think intuitively what it acheives. Example dice is simply overlap between two objects and optimizing negative dice gives us maximum possible overlap, but positive dice lowest value is 0 since you CANNOT quantify how much seperation is there between two objects using the DICE score but simply quantify if the two overlap or not and if they overlap how much } Add the two upsampling branch outputs pairwise for each channel using keras's model.add layer Feed that into the new reconstruction U Net where the loss function is MSE with the Input image of the first U Net i.e. the original input ! alt text Sample results",Medical Image Segmentation,Medical 2217,Medical,Medical,Other,"The Semantic Segmentation Project This project is an attempt to collate popular neural net architectures that are used for semantic segmentation, and provide them for easy training and use for the beginner in semantic segmentation. So far the following architectures are implemented: 1. Segnet 2. U Net 3. FCN Additions are welcome!",Medical Image Segmentation,Medical 2259,Medical,Medical,Other,"Unet_keras A simple keras implementation of Unet Original Paper U Net: Convolutional Networks for Biomedical Image Segmentation Shamlessly taking the training data from this repo . Personally I do not want a new wheel but those old repos about Unet are not runnable. (Maybe due to the upgraded API of keras, whatever...) If you do not like the heavy overhead of Keras, I do provide a naive trainer written in Tensorflow. This implementation is based on Keras 2.24 and Tensorflow 1.13.",Medical Image Segmentation,Medical 2276,Medical,Medical,Other,"TumorDetectionDeepLearning Using deep learning to detect tumors The encoder decoder temporal convolutional network model is derived from: Some hyperparameters are different and a flattening layer was added towards the end of the model. It is not an exact replica of the model described in the paper. Trains slowly on a GTX 1060, having taken nearly 34 minutes to train 10 epochs with a batch size of 128. Performs well on flattened MNIST data, obtaining a 98.82% training accuracy and 96.81% testing accuracy. 
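For illustration only, here is a minimal dilated Conv1D classifier for sequential (flattened) MNIST in the spirit of the temporal convolutional models described above. It is not the repository's architecture; the filter counts, dilation schedule and single training epoch are arbitrary choices.
python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tiny_tcn(seq_len=784, n_classes=10):
    m = models.Sequential()
    m.add(layers.Input(shape=(seq_len, 1)))        # each flattened 28x28 image as a 784-step sequence
    for d in (1, 2, 4, 8):                         # exponentially growing dilations widen the receptive field
        m.add(layers.Conv1D(32, 3, dilation_rate=d, padding="causal", activation="relu"))
    m.add(layers.GlobalAveragePooling1D())
    m.add(layers.Dense(n_classes, activation="softmax"))
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return m

(x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
x_tr = x_tr.reshape(-1, 784, 1).astype("float32") / 255.0
x_te = x_te.reshape(-1, 784, 1).astype("float32") / 255.0

model = build_tiny_tcn()
model.fit(x_tr, y_tr, batch_size=128, epochs=1, validation_data=(x_te, y_te))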
The dilated temporan convolutional network model is also derived from: This model is also not an exact replica of the one described in the paper, but it has proven to be quite lighter and much faster to train than the encoder decoder temporal convolutional network model on the sequential MNIST data. Took 10 and a half minutes to train 15 epochs with a batch size of 128 on a GTX 1060. Performs better than the encoder decoder T.C.N model, having obtained a 99.68% training accuracy and 97.02% testing accuracy. The U Net model is derived from: It is a close replica of the one described in the paper, with added cropping and zero padding layers to account for an odd dimension in the input shape. As the results show, further data preprocessing is required to obtain optimal training accuracy and prediction results. Otherwise, the model itself works as intended, showing promising results in the task of image segmentation.",Medical Image Segmentation,Medical 2296,Medical,Medical,Other,"StrangerThings + Some Strange Projects involving Machine Learning Concepts and Papers + Structured Approach to put all my work in one Repo resources + + The Master Algorithm Pedro Domingos LEAPN + Here is a link to that book: Link: + Another book referenced in the video + Blockchain , NLP , Driverless AI + CNN + Latex + Analytics Vidya: + TYRO Labs + Graphlab Topics that are important to be clear about + tf idf vectorizer + Dimensionality reduction to images + PCA Principal Component Analysis + + + + SVD Singular Value Decomposition + + LDA Linear Discriminative analysis + CCA Canonical Correlation analysis + ICA Independent Component analysis + NMF non negative matrix factorization + K means clustering + Bayes Theorem + n grams + Batch normalization + Regression + Linear + Logistic + Multivariate + SVM Support Vector Machines + Where to place the Decision boundary reduce the MSE + Margin + Naive Bayes Classifier + Random Forest People to Follow: + KaggleRuns + Kaggle Competitions and Kernel Runs Word2vec + + + + + + + Bias Variance + + using lightgbm and xgboost + + + Project Report + Hypothesis Testing + clustering and Overfitting + + DataScience Bowl Sources to Read for Nuclei detection + Mask R CNN for object detection and instance segmentation on Keras and TensorFlow + + Kaggle DS Bowl Baseline + + Fast AI + + U net architecture + + Dog breed classification + YOLO architecture + Legacy U net architecture The Model is based on , seems the unet architecture came up in 2015 and was considered best at that time for Biomedical image segmentation + Using Pytorch for doing this + + Amazing Notebooks + + + Some resources about loading data into Amazon s3 and using Amazon Sagemaker + + + Using Amazon Sagemaker: + tqdm : + image reading : + Some other resources + + + + + Gluon/mxnet + Pytorch + Torch + BCE + AWS: wget >(target K.log(output) + (1.0 target) K.log(1.0 output)) + Boto3: Amazon SDK for Python : + Amazon ML : + Comparisions : + Amazon AMIs for Deep learning + Fast.ai forum + Deep Residual networks + Pytorch using DL + Scikit Image(morphology) + Tensorflow examples + + + Project guidelines + + Boto3 Other options for ML and running jupyter notebooks + + floydhub commands floyd init my_jupyter_project floyd run data sananand007/datasets/datasciencebowl2018/2 gpu mode jupyter GANs + Generative Adversarial Networks Introduction to GANs Mini Project IMDB Keras IMDB project using keras Keras Testing Keras module Project 1 First Neural Network Design Project Project 2 Dog Breed Classifier Project Mini Project Sentiment 
Analysis Project by Trask Mini Project Student Admissions Keras Project + Study of Recurrent neural Networks Recurrent Neural Networks Long Short term Memory Back Propagation through Time (BPTT) + Folded Networks to understand better + Practicing Tensor Flow Training a RNN Network + Mini Project Sentiment Analysis using RNN/LSTMs + Mini Project Implementing a SkipGram Model using Tensor Flow + Mini Project Training a RNN/LSTM to predict the next word + Project 3 Generate TV Scripts Use LSTM/RNNs to train a Model and Generate a TV script using Simpson's Dataset Next step will be to improve the Training Loss and use the whole Dataset SmartCab + How to train a Naive smartcab training agent + Used Q learning with multiple exploration factors to check where the Algorithm is converging + Results from the Project Attempt Epsilon Alpha Tolerance Safety Reliability n_test 1 epsilon epsilon(0.98) 0.5 0.000001 A+ B 40 2 epsilon epsilon(0.98) 0.5 0.0001 A+ A 40 3 epsilon epsilon(0.98) 0.5 0.001 A+ B 20 4 epsilon e^{ (0.05t)} 0.5 0.01 A+ B 20 5 epsilon e^{ (0.005t)} 0.5 0.01 A+ B 20 I choose the 2nd one from the top table RL + Re enforcement Learning + Book Reinforcement Learning: An Introduction Udacity Capstone Writing the Project Proposal + Getting to know some topics from + Topics DataScience Bowl Identify a Nuclei in a dataset of Images + Do some exploratory data analysis , learn a bit about the dataset + Timeline merger deadline: April 11th, 2018 + Reading and thoughts : U Net: Convolutional Networks for Biomedical Image Segmentation : Check other architectures that are comparable to Unet and why they perform better or lower First step : Solve MNIST using tf Second step : Do Analysis on the data Write ups as you go + How to write a Project Proposal : rubric : Project Proposal Submission In this capstone project proposal, prior to completing the following Capstone Project, you you will leverage what you’ve learned throughout the Nanodegree program to author a proposal for solving a problem of your choice by applying machine learning algorithms and techniques. A project proposal encompasses seven key points: The project's domain background — the field of research where the project is derived; A problem statement — a problem being investigated for which a solution will be defined; The datasets and inputs — data or inputs being used for the problem; A solution statement — a the solution proposed for the problem given; A benchmark model — some simple or historical model or result to compare the defined solution to; A set of evaluation metrics — functional representations for how the solution can be measured; An outline of the project design — how the solution will be developed and results obtained. Think about a technical field or domain that you are passionate about, such as robotics, virtual reality, finance, natural language processing, or even artificial intelligence (the possibilities are endless!). Then, choose an existing problem within that domain that you are interested in which you could solve by applying machine learning algorithms and techniques. Be sure that you have collected all of the resources needed (such as datasets, inputs, and research) to complete this project, and make the appropriate citations wherever necessary in your proposal. In addition, you may find a technical domain (along with the problem and dataset) as competitions on platforms such as Kaggle, or Devpost. 
This can be helpful for discovering a particular problem you may be interested in solving as an alternative to the suggested problem areas above. In many cases, some of the requirements for the capstone proposal are already defined for you when choosing from these platforms. Evaluation Your project will be reviewed by a Udacity reviewer against the Capstone Project Proposal rubric. Be sure to review this rubric thoroughly and self evaluate your project before submission. All criteria found in the rubric must be meeting specifications for you to pass. Submission Files At minimum, your submission will be required to have the following files listed below. If your submission method of choice is uploading an archive (.zip), please take into consideration the total file size. You will need to include: A project proposal, in PDF format only, with the name proposal.pdf, addressing each of the seven key points of a proposal. The recommended page length for a proposal is approximately two to three pages. Any additional supporting material such as datasets, images, or input files that are necessary for your project and proposal. If these files are too large and you are uploading your submission, instead provide appropriate means of acquiring the necessary files in an included README.md file. Once you have collected these files and reviewed the project rubric, proceed to the project submission page.",Medical Image Segmentation,Medical 2356,Medical,Medical,Other,"Briefing This is an image semantic segmentation demo using Keras. To simplify the code, I chose the horse dataset, as the two classes are quite balanced (background and horse). Horse Dataset The horse dataset is downloaded from Nets 1. FCN, translated from the original caffe code 2. Unet, 3. DeepLab V3+ (ongoing), Loss Inside custom_loss.py there are losses not only for the segmentation task but also for binary and categorical classification. Some well known losses are implemented, such as focal loss. The custom_loss_eagermode.py file is only for loss function testing purposes; testing in eager mode is more efficient. Class imbalance However, class imbalance is always a big problem in everyday segmentation tasks. Tried on the Pascal dataset, but the results are bad; still exploring. 1. Weighted cross entropy: what are reasonable loss weights for the classes? (the inverse class frequency? ongoing) 2. Dice loss / GDL (ongoing) 3. Tversky loss (ongoing)",Medical Image Segmentation,Medical 2359,Medical,Medical,Other,"pytorch unet segmentation Members : PyeongEun Kim , JuHyung Lee , MiJeong Lee Supervisors : Utku Ozbulak , Wesley De Neve Description This project aims to implement biomedical image segmentation with the use of the U Net model. The image below briefly explains the output we want: The dataset we used is the Transmission Electron Microscopy (ssTEM) data set of the Drosophila first instar larva ventral nerve cord (VNC), which is downloaded from the ISBI Challenge: Segmentation of neural structures in EM stacks. The dataset contains 30 images (.png) of size 512x512 for each of train, train labels and test.
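As a minimal, hypothetical sketch of how the 30 image/label pairs described above could be loaded (a compact version of the loader skeleton shown in the Dataset section below; the directory names and the 127 binarization threshold are assumptions, not the project's actual layout):
python
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ISBIDataset(Dataset):
    def __init__(self, image_dir, label_dir):
        self.images = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith(".png"))
        self.labels = sorted(os.path.join(label_dir, f) for f in os.listdir(label_dir) if f.endswith(".png"))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = np.array(Image.open(self.images[idx]), dtype=np.float32) / 255.0
        msk = (np.array(Image.open(self.labels[idx])) > 127).astype(np.float32)
        return torch.from_numpy(img[None]), torch.from_numpy(msk[None])   # (1, 512, 512) tensors

# e.g. train_set = ISBIDataset("data/train-images", "data/train-labels")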
Table of Content Dataset ( dataset) Preprocessing ( preprocessing) Model ( model) Loss function ( lossfunction) Post processing ( postprocessing) Results ( results) Dependency ( dependency) References ( references) Dataset ruby class SEMDataTrain(Dataset): def __init__(self, image_path, mask_path, in_size 572, out_size 388): Args: image_path (str): the path where the image is located mask_path (str): the path where the mask is located option (str): decide which dataset to import All file names Lists of image path and list of labels Calculate len Calculate mean and stdev def __getitem__(self, index): Get specific data corresponding to the index Args: index (int): index of the data Returns: Tensor: specific data on index which is converted to Tensor GET IMAGE Augmentation on image Flip Gaussian_noise Uniform_noise Brightness Elastic distort {0: distort, 1:no distort} Crop the image Pad the image Sanity Check for Cropped image Normalize the image Add additional dimension Convert numpy array to tensor Augmentation on mask Flip same way with image Elastic distort same way with image Crop the same part that was cropped on image Sanity Check Normalize the mask to 0 and 1 Add additional dimension Convert numpy array to tensor return (img_as_tensor, msk_as_tensor) def __len__(self): Returns: length (int): length of the data Preprocessing We preprocessed the images for data augmentation. Following preprocessing are : Flip Gaussian noise Uniform noise Brightness Elastic deformation Crop Pad Image Augmentation Original Image Image Flip Vertical Horizontal Both Gaussian noise Standard Deviation: 10 Standard Deviation: 50 Standard Deviation: 100 Uniform noise Intensity: 10 Intensity: 50 Intensity: 100 Brightness Intensity: 10 Intensity: 20 Intensity: 30 Elastic deformation Random Deformation: 1 Random Deformation: 2 Random Deformation: 3 Crop and Pad Crop Left Bottom Left Top Right Bottom Right Top Padding process is compulsory after the cropping process as the image has to fit the input size of the U Net model. In terms of the padding method, symmetric padding was done in which the pad is the reflection of the vector mirrored along the edge of the array. We selected the symmetric padding over several other padding options because it reduces the loss the most. To help with observation, a ! ffff00 'yellow border' is added around the original image: outside the border indicates symmetric padding whereas inside indicates the original image. Pad Left Bottom Left Top Right bottom Right Top Model Architecture We have same structure as U Net Model architecture but we made a small modification to make the model smaller. ! image Loss function We used a loss function where pixel wise softmax is combined with cross entropy. Softmax ! image .png) Cross entropy ! image .png) Post processing In attempt of reducing the loss, we did a post processing on the prediction results. We applied the concept of watershed segmentation in order to point out the certain foreground regions and remove regions in the prediction image which seem to be noises. ! postprocessing The numbered images in the figure above indicates the stpes we took in the post processing. To name those steps in slightly more detail: 1. Convertion into grayscale 2. Conversion into binary image 3. Morphological transformation: Closing 4. Determination of the certain background 5. Calculation of the distance 6. Determination of the certain foreground 7. Determination of the unknown region 8. Application of watershed 9. 
Determination of the final result Conversion into grayscale The first step is there just in case the input image has more than 1 color channel (e.g. an RGB image has 3 channels). Conversion into binary image Convert the grayscale image into a binary image by processing it with a threshold value: pixels equal to or lower than 127 will be pushed down to 0 and greater will be pushed up to 255. Such a process is compulsory, as the later transformation processes take in binary images. Morphological transformation: Closing. We used the morphologyEX() function in the cv2 module, which removes black noise (background) within white regions (foreground). Determination of the certain background We used the dilate() function in the cv2 module, which emphasizes/increases the white region (foreground). By doing so, we connect detached white regions together, for example connecting detached cell membranes, to make sure of the background region. Calculation of the distance This step labels the foreground with a color code: ! ff0000 red color indicates farthest from the background while ! 003bff blue color indicates closest to the background. Determination of the foreground Now that we have an idea of how far the foreground is from the background, we apply a threshold value to decide which part could surely be the foreground. The threshold value is the maximum distance (calculated in the previous step) multiplied by a hyper parameter that we have to tune manually. The greater the hyper parameter value, the greater the threshold value, and therefore the smaller the area of certain foreground. Determination of the unknown region From the previous steps, we determined the sure foreground and background regions. The rest is classified as the 'unknown' region. Label the foreground: markers We applied the connectedComponents() function from the cv2 module on the foreground to label the foreground regions with colors to distinguish different foreground objects. We named the result a 'marker'. Application of watershed and Determination of the final result After applying the watershed() function from the cv2 module on the marker, we obtained an array of -1, 1, and other positive labels: -1 marks the border region that distinguishes foreground and background, and 1 marks the background region. To see the result, we created a clean white page of the same size as the input image, then we copied all the values from the watershed result to the white page except the label 1, which means that we excluded the background. Results

| Optimizer | Learning Rate | Lowest Loss | Epoch | Highest Accuracy | Epoch |
| --- | --- | --- | --- | --- | --- |
| SGD | 0.001 | 0.196972 | 1445 | 0.921032 | 1855 |
| SGD | 0.005 | 0.205802 | 1815 | 0.918425 | 1795 |
| SGD | 0.01 | 0.193328 | 450 | 0.922908 | 450 |
| RMS_prop | 0.0001 | 0.203431 | 185 | 0.924543 | 230 |
| RMS_prop | 0.0002 | 0.193456 | 270 | 0.926245 | 500 |
| RMS_prop | 0.001 | 0.268246 | 1655 | 0.882229 | 1915 |
| Adam | 0.0001 | 0.194180 | 140 | 0.924470 | 300 |
| Adam | 0.0005 | 0.185212 | 135 | 0.925519 | 135 |
| Adam | 0.001 | 0.222277 | 165 | 0.912364 | 180 |

We chose the best learning rate for each optimizer based on how fast the model converges to the lowest error. In other words, the learning rate should make the model reach the optimal solution in the fewest epochs. However, the interesting fact was that the epochs of lowest loss and highest accuracy did not coincide. This might be due to the nature of the loss function (the loss is on a log scale, so an extreme deviation can occur). For example, if the softmax probability of one pixel is 0.001, its term -log(0.001) is about 6.9 (natural log), a comparatively huge value that a single pixel contributes to the loss. For consistency, we chose to focus on accuracy as our criterion of the correctness of the model.
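A condensed OpenCV sketch of the post processing steps described above; the binarization threshold, kernel size and foreground ratio are placeholders rather than the tuned values, and the final white page step is simplified:
python
import cv2
import numpy as np

def watershed_postprocess(pred, fg_ratio=0.5):
    # pred: uint8 prediction image (grayscale or BGR)
    gray = pred if pred.ndim == 2 else cv2.cvtColor(pred, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)              # steps 1-2: grayscale + binary
    kernel = np.ones((3, 3), np.uint8)
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)  # step 3: closing
    sure_bg = cv2.dilate(closed, kernel, iterations=3)                        # step 4: certain background
    dist = cv2.distanceTransform(closed, cv2.DIST_L2, 5)                      # step 5: distance
    _, sure_fg = cv2.threshold(dist, fg_ratio * dist.max(), 255, cv2.THRESH_BINARY)  # step 6: certain foreground
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)                                  # step 7: unknown region
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1              # background label becomes 1, objects 2..n
    markers[unknown == 255] = 0        # unknown region is 0 for watershed
    markers = cv2.watershed(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR), markers)  # step 8: watershed
    result = np.full(gray.shape, 255, np.uint8)   # step 9: clean white page
    result[markers > 1] = 0                       # keep detected regions, drop background label 1
    return result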
Accuracy and Loss Graph SGD (lr 0.01, momentum 0.99) RMS prop (lr 0.0002) Adam (lr 0.0005) We used three different optimizers (SGD, RMS prop, and Adam). In the case of SGD the momentum is set manually (0.99), whereas for the other optimizers (RMS prop and Adam) it is calculated automatically. Model Downloads Model trained with SGD can be downloaded via dropbox : Model trained with RMS prop can be downloaded via dropbox : Model trained with Adam can be downloaded via dropbox : Example Input Image Results comparison original image mask RMS prop optimizer (Accuracy 92.48 %) SGD optimizer (Accuracy 91.52 %) Adam optimizer (Accuracy 92.55 %) Dependency The following modules are used in the project: python > 3.6 numpy > 1.14.5 torch > 0.4.0 PIL > 5.2.0 scipy > 1.1.0 matplotlib > 2.2.2 References : 1 O. Ronneberger, P. Fischer, and T. Brox. U Net: Convolutional Networks for Biomedical Image Segmentation, 2 P.Y. Simard, D. Steinkraus, J.C. Platt. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis,",Medical Image Segmentation,Medical 2369,Medical,Medical,Other,cv_course_project Relevant resources 1. U Net: Convolutional Networks for Biomedical Image Segmentation: 2.,Medical Image Segmentation,Medical 2390,Medical,Medical,Other,"RedeemTheBoar This repository contains the code we used for the 2018 Data Science Bowl competition on kaggle . The competition required the implementation of a model that is capable of identifying a range of cell nuclei across varied conditions. keras_implementation is the main part of the code. It contains the image preprocessing and an implementation of a U Net (as defined in ) using Keras. The U Net architecture consists of a contracting path and an expansive path, each made of convolutional blocks. As the training data was limited in size and skewed in content (images for which it was easier to mask the nuclei dominated the sample), data augmentation was an important part of this challenge. input_pipeline contains the data augmentation snippet keras_implementation calls during training. Below is an example of the data, and the output of our code. The top two panels are what is provided in the training data. The top left panel is the image, and the top right panel is the ground truth masks of the cell nuclei. The bottom two panels are the predictions of the neural network. The metric of accuracy for the competition was intersection over union (IoU), and this figure is an example of a high IoU score case. ! Figure 1. Team members: Sinan Deger ( @sinandeger ), Donald Lee Brown ( @dleebrown ), and Nesar Ramachandra ( @nesar ).",Medical Image Segmentation,Medical 2433,Medical,Medical,Other,"nbee Implementation of Deep Dynamic Networks for Retinal Vessel Segmentation A pytorch based framework for medical image processing with Convolutional Neural Networks, along with an example of unet for DRIVE dataset segmentation 1 . The DRIVE dataset is composed of 40 retinal fundus images. Required dependencies We need python3, numpy, pandas, pytorch, torchvision, matplotlib and PILLOW packages pip install r ature/assets/requirements.txt ! Flow (assets/flow_ature.png) Project Structure ature/nbee nbee framework core. ature/utils Utilities for dealing with F1 score, image cropping, slicing, visual precision recall, auto split train validation test set and many more. ature/viz Easy pytorch visualization. ature/testarch Full end to end working u net (Olaf Ronneberger et al.) and MINI UNET as per the Deep Dynamic paper for more robust retinal image segmentation.
ature/data DRIVE dataset.. Dataset check Original image and respective ground truth image. Ground truth is a binary image with each vessel pixel(white) 255 and background(black) 0. ! Sample DRIVE image (assets/merged_drive.png) U net architecture link ! Unet (assets/unet.png) Usage Example main.py python import testarch.unet as unet import testarch.unet.runs as r_unet import testarch.miniunet as mini_unet import testarch.miniunet.runs as r_miniunet import torchvision.transforms as tmf transforms tmf.Compose( tmf.ToPILImage(), tmf.ToTensor() ) if __name__ __main__ : unet.run( r_unet.DRIVE , transforms) mini_unet.run( r_miniunet.DRIVE , transforms) Where testarch.unet.runs file consist a predefined configuration DRIVE with all necessary parameters. python import os sep os.sep DRIVE { 'Params': { 'num_channels': 1, 'num_classes': 2, 'batch_size': 4, 'epochs': 250, 'learning_rate': 0.001, 'patch_shape': (388, 388), 'patch_offset': (150, 150), 'expand_patch_by': (184, 184), 'use_gpu': True, 'distribute': True, 'shuffle': True, 'log_frequency': 5, 'validation_frequency': 1, 'mode': 'train', 'parallel_trained': False, }, 'Dirs': { 'image': 'data' + sep + 'DRIVE' + sep + 'images', 'mask': 'data' + sep + 'DRIVE' + sep + 'mask', 'truth': 'data' + sep + 'DRIVE' + sep + 'manual', 'logs': 'logs' + sep + 'DRIVE' + sep + 'UNET', 'splits_json': 'data' + sep + 'DRIVE' + sep + 'splits' }, 'Funcs': { 'truth_getter': lambda file_name: file_name.split('_') 0 + '_manual1.gif', 'mask_getter': lambda file_name: file_name.split('_') 0 + '_mask.gif', 'dparm': lambda x: np.random.choice(np.arange(1, 101, 1), 2) } } Similarly, testarch.miniunet.runs file consist a predefined configuration DRIVE with all necessary parameters. NOTE: Make sure it picks up probability maps from the logs of previous run. python import os sep os.sep DRIVE { 'Params': { 'num_channels': 2, 'num_classes': 2, 'batch_size': 4, 'epochs': 100, 'learning_rate': 0.001, 'patch_shape': (100, 100), 'expand_patch_by': (40, 40) 'use_gpu': True, 'distribute': True, 'shuffle': True, 'log_frequency': 20, 'validation_frequency': 1, 'mode': 'train', 'parallel_trained': False }, 'Dirs': { 'image': 'data' + sep + 'DRIVE' + sep + 'images', 'image_unet': 'logs' + sep + 'DRIVE' + sep + 'UNET', 'mask': 'data' + sep + 'DRIVE' + sep + 'mask', 'truth': 'data' + sep + 'DRIVE' + sep + 'manual', 'logs': 'logs' + sep + 'DRIVE' + sep + 'MINI UNET', 'splits_json': 'data' + sep + 'DRIVE' + sep + 'splits' }, 'Funcs': { 'truth_getter': lambda file_name: file_name.split('_') 0 + '_manual1.gif', 'mask_getter': lambda file_name: file_name.split('_') 0 + '_mask.gif' } } num_channels : Input channels to the CNN. We are only feeding the green channel to unet. num_classes : Output classes from CNN. We have vessel, background. patch_shape, expand_patch_by : Unet takes 388 388 patch but also looks at 184 pixel on each dimension equally to make it 572 572. We mirror image if we run to image edges when expanding. So 572 572 goes in 388 388 2 comes out. patch_offset : Overlap between two input patches. We get more data doing this. distribute : Uses all gpu in parallel if set to True. WARN torch.cuda.set_device(1) Mustn't be done if set to True. shuffle : Shuffle train data after every epoch. log_frequency : Just print log after this number of batches with average scores. No rocket science :). validation_frequency : Do validation after this number of epochs. We also persist the best performing model. mode : train/test. parallel_trained : If a resumed model was parallel trained or not. 
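The splits_json directory referenced in the configuration above expects JSON files listing file names under the keys 'train', 'validation' and 'test' (described further below). A hedged sketch of writing one such split file; the 70/15/15 ratio and the example output path are arbitrary choices, not the repository's defaults:
python
import json
import os
import random

def write_split(image_dir, out_file, seed=42):
    files = sorted(os.listdir(image_dir))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    split = {"train": files[:n_train],
             "validation": files[n_train:n_train + n_val],
             "test": files[n_train + n_val:]}
    with open(out_file, "w") as fp:
        json.dump(split, fp, indent=2)

# e.g. write_split("data/DRIVE/images", "data/DRIVE/splits/UNET-DRIVE.json")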
logs : Dir for all logs splits_json : A directory that consist of json files with list of files with keys 'train', 'test' 'validation'. takes a folder with all images and does that automatically. This is handy when we want to do k fold cross validation. We jsut have to generate such k json files and put in splits_json folder. truth_getter, mask_getter : A custom function that maps input_image to its ground_truth and mask respectively. Sample log text workstation$ python main.py Total Params: 31042434 SPLIT FOUND: data/DRIVE/splits/UNET DRIVE.json Loaded Patches: 135 Patches: 9 Patches: 9 Patches: 9 Patches: 9 Patches: 9 Training... Epochs 1/40 Batch 5/34 loss:0.72354 pre:0.326 rec:0.866 f1:0.473 acc:0.833 Epochs 1/40 Batch 10/34 loss:0.34364 pre:0.584 rec:0.638 f1:0.610 acc:0.912 Epochs 1/40 Batch 15/34 loss:0.22827 pre:0.804 rec:0.565 f1:0.664 acc:0.939 Epochs 1/40 Batch 20/34 loss:0.19549 pre:0.818 rec:0.629 f1:0.711 acc:0.947 Epochs 1/40 Batch 25/34 loss:0.17726 pre:0.713 rec:0.741 f1:0.727 acc:0.954 Epochs 1/40 Batch 30/34 loss:0.16564 pre:0.868 rec:0.691 f1:0.770 acc:0.946 Running validation.. 21_training.tif PRF1A 0.66146, 0.37939, 0.4822, 0.93911 39_training.tif PRF1A 0.79561, 0.28355, 0.41809, 0.93219 37_training.tif PRF1A 0.78338, 0.47221, 0.58924, 0.94245 35_training.tif PRF1A 0.83836, 0.45788, 0.59228, 0.94534 38_training.tif PRF1A 0.64682, 0.26709, 0.37807, 0.92416 Score improved: 0.0 to 0.49741 BEST CHECKPOINT SAVED Epochs 2/40 Batch 5/34 loss:0.41760 pre:0.983 rec:0.243 f1:0.389 acc:0.916 Epochs 2/40 Batch 10/34 loss:0.27762 pre:0.999 rec:0.025 f1:0.049 acc:0.916 Epochs 2/40 Batch 15/34 loss:0.25742 pre:0.982 rec:0.049 f1:0.093 acc:0.886 Epochs 2/40 Batch 20/34 loss:0.23239 pre:0.774 rec:0.421 f1:0.545 acc:0.928 Epochs 2/40 Batch 25/34 loss:0.23667 pre:0.756 rec:0.506 f1:0.607 acc:0.930 Epochs 2/40 Batch 30/34 loss:0.19529 pre:0.936 rec:0.343 f1:0.502 acc:0.923 Running validation.. 21_training.tif PRF1A 0.95381, 0.45304, 0.6143, 0.95749 39_training.tif PRF1A 0.84353, 0.48988, 0.6198, 0.94837 37_training.tif PRF1A 0.8621, 0.60001, 0.70757, 0.95665 35_training.tif PRF1A 0.86854, 0.64861, 0.74263, 0.96102 38_training.tif PRF1A 0.93073, 0.28781, 0.43966, 0.93669 Score improved: 0.49741 to 0.63598 BEST CHECKPOINT SAVED ... Results The network is trained for 40 epochs with 15 training images, 5 validation images and 20 test images. ! Training_Loss (assets/loss.png) ! Training_Scores (assets/training_f1_acc.png) Figure above is the training cross entropy loss, F1, and accuracy. ! Precision Recall color Map (assets/train_pr_map.png) Figure above is the precision recall map for training and validation respectively with color being the training iterations. ! Validation_scores (assets/val_f1_acc.png) Figure above is the validation F1 and Accuracy. ! Test scores and result (assets/test.png) Figure on left is the test result on the test set after training and validation. Right one the is the segmentation result on one of the test images. Thank you! ❤ References 1. J. Staal, M. Abramoff, M. Niemeijer, M. Viergever, and B. van Ginneken, “Ridge based vessel segmentation in color images of the retina,” IEEE Transactions on Medical Imaging 23, 501–509 (2004) 2. O. Ronneberger, P. Fischer, and T. Brox, “U net: Convolutional networks for biomedical image segmentation,” inMICCAI,(2015) 3. 
Dynamic Deep Networks for Retinal Vessel Segmentation,",Medical Image Segmentation,Medical 2507,Medical,Medical,Other,"Tensorflow Bayesian U Net aka BUNet This is the source code for the MICCAI 2018 Paper, Exploring Uncertainty Measures in Deep Networks for Multiple Sclerosis Lesion Detection and Segmentation ( Nair et al. _), of which I am the first author. The network architecture is a heavily modified U Net ( Ronneberger et al. _), developed in Tensorflow. The network is augmented to provide the following 4 different uncertainty measures as an output. 1. Mutual Information ( Gal et al. _) 2. Entropy ( Gal et al. _) 3. MC Sample Variance ( Leibig et al. _) 4. Predicted Variance ( Kendall and Gal _) Details about the network architecture, and the equations for the uncertainty measures can be found in the paper here: The dataset used for this project comes from a large, proprietary, multi site, multi scanner, clinical MS dataset. As such, to use this code you will have to modify the dataprovider to be specific to your dataset. Training : 1. pip install r requirements.txt 2. python bunet_launcher.py o ./path_to_output/ c bunet/configs/train_bunet.json Author: Tanya Nair",Medical Image Segmentation,Medical 2556,Medical,Medical,Other,"TernausNetV2: Fully Convolutional Network for Instance Segmentation teaser We present network definition and weights for our second place solution in CVPR 2018 DeepGlobe Building Extraction Challenge _. .. contents:: Team members Vladimir Iglovikov _, Selim Seferbekov _, Alexandr Buslaev _, Alexey Shvets _ Citation If you find this work useful for your publications, please consider citing:: @InProceedings{Iglovikov_2018_CVPR_Workshops, author {Iglovikov, Vladimir and Seferbekov, Selim and Buslaev, Alexander and Shvets, Alexey}, title {TernausNetV2: Fully Convolutional Network for Instance Segmentation}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month {June}, year {2018} } Overview Automatic building detection in urban areas is an important task that creates new opportunities for large scale urban planning and population monitoring. In a CVPR 2018 Deepglobe Building Extraction Challenge _ participants were asked to create algorithms that would be able to perform binary instance segmentation of the building footprints from satellite imagery. Our team finished second and in this work we share the description of our approach, network weights _ and code that is sufficient for inference. Data The training data for the building detection subchallenge originate from the SpaceNet dataset _. The dataset uses satellite imagery with 30 cm resolution collected from DigitalGlobe’s WorldView 3 satellite. Each image has 650x650 pixels size and covers 195x195 m2 of the earth surface. Moreover, each region consists of high resolution RGB, panchromatic, and 8 channel low resolution multi spectral images. The satellite data comes from 4 different cities: Vegas, Paris, Shanghai, and Khartoum with different coverage, of (3831, 1148, 4582, 1012) images in the train and (1282, 381, 1528, 336) images in the test sets correspondingly. Method The originial TernausNet _ was extened in a few ways: 1. The encoder was replaced with WideResnet 38 that has In Place Activated BatchNorm _. 2. The input to the network was extended to work with 11 input channels. Three for RGB and eight for multispectral data. 
In order to make our network to perform instance segmentation, we utilized the idea that was proposed and successfully executed by Alexandr Buslaev _, Selim Seferbekov _ and Victor Durnov in their winning solutions of the Urban 3d _ and Data Science Bowl 2018 _ challenges. 3. Output of the network was modified to predict both the binary mask in which we predict building / non building classes on the pixel level and binary mask in which we predict areas of an image where different objects touch or very close to each other. These predicted masks are combined and used as an input to the watershed transform. network Results Result on the public and private leaderboard with respect to the metric that was used by the organizers of the CVPR 2018 DeepGlobe Building Extraction Challenge _. .. table:: Results per city City: Public Leaderboard Private Leaderboard Vegas 0.891 0.892 Paris 0.781 0.756 Shanghai 0.680 0.687 Khartoum 0.603 0.608 Average 0.739 0.736 Dependencies Python 3.6 PyTorch 0.4 numpy 1.14.0 opencv python 3.3.0.10 Demo Example Network weights _ You can easily start using our network and weights, following the demonstration example demo.ipynb _ .. _ demo.ipynb : .. _ Selim Seferbekov : .. _ Alexey Shvets : .. _ Vladimir Iglovikov : .. _ Alexandr Buslaev : .. _ CVPR 2018 DeepGlobe Building Extraction Challenge : .. _ TernausNet : .. _ U Net : .. _ Urban 3d : .. _ Data Science Bowl 2018 : .. _ WideResnet 38 that has In Place Activated BatchNorm : .. _ SpaceNet dataset : .. _ weights : .. network image:: .. teaser image::",Medical Image Segmentation,Medical 2557,Medical,Medical,Other,"MICCAI 2017 Endoscopic Vision Challenge Angiodysplasia Detection and Localization Here we present our wining solution and its further development for MICCAI 2017 Endoscopic Vision Challenge Angiodysplasia Detection and Localization _. It addresses binary segmentation problem, where every pixel in image is labeled as an angiodysplasia lesions or background. Then, we analyze connected component of each predicted mask. Based on the analysis we developed a classifier that predict angiodysplasia lesions (binary variable) and a detector for their localization (center of a component). .. contents:: Team members Alexey Shvets _, Vladimir Iglovikov _, Alexander Rakhlin _, Alexandr A. Kalinin _ Citation If you find this work useful for your publications, please consider citing:: @article{shvets2018angiodysplasia, title {Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks}, author {Shvets, Alexey and Iglovikov, Vladimir and Rakhlin, Alexander and Kalinin, Alexandr A.}, journal {arXiv preprint arXiv:arXiv:1804.08024}, year {2018} } Overview Angiodysplasias are degenerative lesions of previously healthy blood vessels, in which the bowel wall have microvascular abnormalities. These lesions are the most common source of small bowel bleeding in patients older than 50 years, and cause approximately 8% of all gastrointestinal bleeding episodes. Gold standard examination for angiodysplasia detection and localization in the small bowel is performed using Wireless Capsule Endoscopy (WCE). Last generation of this pill like device is able to acquire more than 60 000 images with a resolution of approximately 520 520 pixels. 
According to the latest state of the art, only 69% of angiodysplasias are detected by gastroenterologist experts during the reading of WCE videos, and blood indicator software (provided by WCE provider like Given Imaging), in the presence of angiodysplasias, presents sensitivity and specificity values of only 41% and 67%, respectively. .. figure:: Data The dataset consists of 1200 color images obtained with WCE. The images are in 24 bit PNG format, with 576 times 576 pixel resolution. The dataset is split into two equal parts, 600 images for training and 600 for evaluation. Each subset is composed of 300 images with apparent AD and 300 without any pathology. The training subset is annotated by human expert and contains 300 binary masks in JPEG format of the same 576 times 576 pixel resolution. White pixels in the masks correspond to lesion localization. .. figure:: :scale: 30 % First row corresponds to images without pathology, the second row to images with several AD lesions in every image, and the last row contains masks that correspond to the pathology images from the second row. .. figure:: :scale: 45 % Most images contain 1 lesion. Distribution of AD lesion areas reaches maximum of 12,000 pixels and has median 1,648 pixels. Method We evaluate 4 different deep architectures for segmentation: U Net _ (Ronneberger et al., 2015; Iglovikov et al., 2017a), 2 modifications of TernausNet _ (Iglovikov and Shvets, 2018), and AlbuNet34 _, a modifications of LinkNet _ (Chaurasia and Culurciello, 2017; Shvets et al., 2018). As an improvement over standard U Net _, we use similar networks with pre trained encoders. TernausNet _ (Iglovikov and Shvets, 2018) is a U Net like architecture that uses relatively simple pre trained VGG11 or VGG16 (Simonyan and Zisserman, 2014) networks as an encoder. VGG11 consists of seven convolutional layers, each followed by a ReLU activation function, and ve max polling operations, each reducing feature map by 2. All convolutional layers have 3 times 3 kernels. TernausNet16 has a similar structure and uses VGG16 network as an encoder .. figure:: :scale: 72 % .. figure:: :scale: 72 % Training We use Jaccard index (Intersection Over Union) as the evaluation metric. It can be interpreted as a similarity measure between a finite number of sets. For two sets A and B, it can be defined as following: .. raw:: html Since an image consists of pixels, the expression can be adapted for discrete objects in the following way: .. figure:: :align: center where y and y_hat are a binary value (label) and a predicted probability for the pixel i , respectively. Since image segmentation task can also be considered as a pixel classification problem, we additionally use common classification loss functions, denoted as H. For a binary segmentation problem H is a binary cross entropy, while for a multi class segmentation problem H is a categorical cross entropy. .. figure:: :align: center As an output of a model, we obtain an image, in which each pixel value corresponds to a probability of belonging to the area of interest or a class. The size of the output image matches the input image size. For binary segmentation, we use 0.3 as a threshold value (chosen using validation dataset) to binarize pixel probabilities. All pixel values below the speci ed threshold are set to 0, while all values above the threshold are set to 255 to produce final prediction mask. Following the segmentation step, we perform postprocessing in order to nd the coordinates of angiodysplasia lesions in the image. 
In the postprocessing step we use the OpenCV implementation of the connected component labeling function connectedComponentsWithStats . This function returns the number of connected components, their sizes (areas), and the centroid coordinates of the corresponding connected components. In our detector we use another threshold to neglect all clusters with a size smaller than 300 pixels. Therefore, in order to establish the presence of the lesions, the number of found components should be higher than 0, otherwise the image corresponds to a normal condition. Then, for localization of angiodysplasia lesions we return the centroid coordinates of all connected components. Results The quantitative comparison of our models' performance is presented in Table 1. For the segmentation task the best result is achieved by AlbuNet34 _ providing IoU 0.754 and Dice 0.850. When compared by the inference time, AlbuNet34 _ is also the fastest model due to the light encoder. In the segmentation task this network takes around 20ms .. figure:: :scale: 60 % Prediction of our detector on the validation image. The left picture is the original image, the central one is the ground truth mask, and the right one is the predicted mask. Green dots correspond to centroid coordinates that define localization of the angiodysplasia. .. table:: Table 1. Segmentation results per task. Intersection over Union, Dice coefficient and inference time, ms. Model IOU, % Dice, % Inference time, ms U Net 73.18 83.06 21 TernausNet 11 74.94 84.43 51 TernausNet 16 73.83 83.05 60 AlbuNet34 75.35 84.98 30 Pre trained weights for all models of all segmentation tasks can be found on google drive _ Dependencies Python 3.6 PyTorch 0.3.1 TorchVision 0.1.9 numpy 1.14.0 opencv python 3.3.0.10 tqdm 4.19.4 These dependencies can be installed by running:: pip install r requirements.txt How to run The dataset is organized in the following way:: :: ├── data │ ├── test │ └── train │ ├── angyodysplasia │ │ ├── images │ │ └── masks │ └── normal │ ├── images │ └── masks │ ....................... The training dataset contains 2 sets of images, one with angiodysplasia and the second without it. For training we used only the images with angiodysplasia, which were split into 5 folds. 1. Training The main file that is used to train all models is train.py . Running python train.py help will return the set of all possible input parameters. To train all models we used the following bash script (batch size was chosen depending on how many samples fit into the GPU RAM, limit was adjusted accordingly to keep the same number of updates for every network):: !/bin/bash for i in 0 1 2 3 do python train.py device ids 0,1,2,3 limit 10000 batch size 12 fold $i workers 12 lr 0.0001 n epochs 10 jaccard weight 0.3 model UNet11 python train.py device ids 0,1,2,3 limit 10000 batch size 12 fold $i workers 12 lr 0.00001 n epochs 15 jaccard weight 0.3 model UNet11 done 2. Mask generation. The main file to generate masks is generate_masks.py . Running python generate_masks.py help will return the set of all possible input parameters. Example:: python generate_masks.py output_path predictions/UNet16 model_type UNet16 model_path data/models/UNet16 fold 1 batch size 4 3. Evaluation. The evaluation is different for a binary and multi class segmentation: a In the case of binary segmentation it calculates jaccard (dice) per image / per video and then the predictions are averaged.
b In the case of multi class segmentation it calculates jaccard (dice) for every class independently then avaraged them for each image and then for every video:: python evaluate.py target_path predictions/UNet16 train_path data/train/angyodysplasia/masks 4. Further Improvements. Our results can be improved further by few percentages using simple rules such as additional augmentation of train images and train the model for longer time. In addition, the cyclic learning rate or cosine annealing could be also applied. To do it one can use our pre trained weights as initialization. To improve test prediction TTA technique could be used as well as averaging prediction from all folds. Demo Example You can start working with our models using the demonstration example: Demo.ipynb _ .. _ Demo.ipynb : Demo.ipynb .. _ Alexander Rakhlin : .. _ Alexey Shvets : .. _ Vladimir Iglovikov : .. _ Alexandr A. Kalinin : .. _ MICCAI 2017 Endoscopic Vision SubChallenge Angiodysplasia Detection and Localization : .. _ TernausNet : .. _ U Net : .. _ AlbuNet34 : .. _ LinkNet : .. _ google drive : .. br raw:: html .. plusmn raw:: html ± .. times raw:: html × .. micro raw:: html µm .. y image:: .. y_hat image:: .. i image::",Medical Image Segmentation,Medical 2614,Medical,Medical,Other,"J Net: Multiresolution Neural Network for Semantic Segmentation Multiresolution neural network for segmentic segmentation inspired by the U net 1 . Since it consists of the expansive path only, it resembles the letter J (hence the name). The network is composed of several segments (one for each resolution level) such that the first one operates on the lowest resolution and the final one on the original image resolution. Each segment is a convolutional neural network (CNN) followed either by a deconvolution layer 2 , which upsamples the output of the segment by factor two, or a final layer which outputs the segmentation. The input of the first segment is the image downsampled to the lowest resolution and the input of the other segments is the (upsampled) output of the previous segment concatenated with the image downsampled to the corresponding resolution level. ! Example of the J net architecture (images/jnet.png) The figure shows an example of a J net architecture. It consists of three segments, each being a CNN with 3×3 convolution filters and leaky ReLU activations. In order to maintain the spatial dimensions of the input throughout the segment the convolutions are preceded with a padding layer, which extends the tensor by reflecting the boundary pixels. The convolutions are furthermore followed by the batch normalization. The lowest resolution segment (128×128 px) has 16 convolution layers with 64 channels, the following segment (256×256 px) has 2 layers with 72 channels and the final full resolution segment (512×512 px) has 2 layers with 80 channels. The last layer has two output channels one for binary segmentation (sigmoid activations) and the other predicts for each pixel its (truncated) distance to the nearest cell boundary (ReLU activation). In total (including the deconvolution and final layers) the network has 23 layers. Dependencies The code is known to work with Python 3.5, 3.6 numpy 1.13, 1.14 PyTorch 0.2.0, 0.3.1 Other versions may work too (and probably will) but they were not tested. Example This example demonstrates the J net on segmentation of images from the DIC C2DH HeLa from the Cell Tracking Challenge 3 . The structure of the network was the same as described above. 
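To make the segment structure described above more concrete, here is a minimal PyTorch sketch of a single segment (our own illustration under the stated architecture, not code from this repository; the class name and the leaky ReLU slope of 0.1 are assumptions):

import torch
import torch.nn as nn

class Segment(nn.Module):
    """One J-Net segment: reflection-padded 3x3 convs with BN and leaky ReLU,
    followed by a deconvolution that upsamples the output by factor two."""
    def __init__(self, in_channels, channels, num_layers, upsample=True):
        super(Segment, self).__init__()
        layers = []
        for i in range(num_layers):
            layers += [
                nn.ReflectionPad2d(1),                       # keep spatial dimensions
                nn.Conv2d(in_channels if i == 0 else channels, channels, kernel_size=3),
                nn.BatchNorm2d(channels),
                nn.LeakyReLU(0.1, inplace=True),             # slope 0.1 is an assumption
            ]
        self.body = nn.Sequential(*layers)
        # Deconvolution layer that doubles the resolution for the next segment.
        self.up = nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2) if upsample else None

    def forward(self, x, image_at_this_resolution=None):
        # Later segments receive the upsampled previous output concatenated
        # with the input image downsampled to the matching resolution.
        if image_at_this_resolution is not None:
            x = torch.cat([x, image_at_this_resolution], dim=1)
        x = self.body(x)
        return self.up(x) if self.up is not None else x

The final full resolution segment would be built with upsample False and followed by the two channel output layer (sigmoid for the segmentation channel, ReLU for the truncated boundary distance channel).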
Since there are only 17 annotated images, data augmentation was used extensively when training the network (random flips and elastic transforms). The truncation threshold for the boundary distance was set to 6 pixels. The optimizer was Adam, the initial learning rate was 0.00003 (decreased my multiplicative factor 0.75 if the training loss did not decrease for 30 epochs) and the training was stopped after 1850 epochs. The batch size was 8 and to make the size of the training set was artificially increased by replicating each training image 16 times (note that due to random augmentation the network never see the same image multiple times). The loss was sum of the BCE loss of the segmentation output and the MSE loss of the boundary distance layer. The full command used to start the learning was as follows: python3 main.py cuda images_idx '{ 01 : 002 , 005 , 021 , 031 , 033 , 034 , 039 , 054 , 02 : 006 , 007 , 014 , 027 , 034 , 038 , 042 , 061 , 067 }' load_dataset_to_ram 1 num_workers 1 dataset_len_multiplier 16 batch_size 8 resolution_levels 2, 1,0 aug_rotation_flip aug_elastic_params (50,5,5),( 1, 1,1) structure 16,64,3 , 2,8,3 , 2,8,3 dt_bound ${DT_BOUND} validation_percentage 0.17 learning_rate 0.00003 mode train dataset_root /path/to/DIC C2DH HeLa_training output_dir results/DIC C2DH HeLa6 This command needs about 14 GB of GPU memory and one learning epoch takes about 30 seconds. Most of the time is spent on elastic transforms since they have to be done on the original resolution. Segmentation results for all images in the challenge sequences can be obtained by python3 main.py cuda resolution_levels ' 2, 1,0 ' dt_bound 6 images_idx '{ 01 : , 02 : }' mode vis dataset_root /path/to/DIC C2DH HeLa_test model_file results/DIC C2DH HeLa6/model_best_train_train output_dir results/DIC C2DH HeLa6 One image takes about 0.12 seconds. Results Segmentation results on a training image from the DIC C2DH HeLa dataset (image 038 from sequence 02, same as in the U net paper). From left to right: original image, segmentation, truncated distance to the cell boundary. The upper row are the images generated by the network and the lower row is the ground truth: ! Segmentation results on a training image (image 038 from sequence 02, same as in the U net paper). From left to right: original image, segmentation, truncated distance to the cell boundary. The upper row are the images generated by the network and the lower row is the ground truth. (images/train.png) Segmentation results on an image from a challenge sequence (image 024 from sequence 01, no image from the challenge sequence was used during the training). From left to right: original image, segmentation, truncated distance to the cell boundary: ! Segmentation results on an image from a challenge sequence (image 024 from sequence 01). From left to right: original image, segmentation, truncated distance to the cell boundary. (images/test.png) Toy Example The previous example needs a GPU and substantial amount of memory. If you do not have proper HW, it is more convenient to experiment with smaller network, less images, etc. Directory toyexample contains scripts for training, evaluation and visualization and furthermore a subset of the DIC C2DH HeLa dataset. The network has two segments with three and two layers respectively, 16 and 24 channels, batch size is decreased to 2 and the training finishes after 100 epochs. All scripts run on CPU and do not need more than 1.5 GB RAM. Do not expect good results though, DIC C2DH HeLa is quite difficult dataset. 
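A rough sketch of the loss and learning rate schedule used in this example (BCE on the segmentation channel plus MSE on the boundary distance channel, Adam with the rate multiplied by 0.75 on training loss plateaus); this is an illustration, not the repository's training loop, and the data loader format is assumed:

import torch
import torch.nn as nn

def make_training(model):
    """Illustrative setup for the loss and LR schedule described above."""
    bce = nn.BCELoss()   # segmentation channel (sigmoid activations)
    mse = nn.MSELoss()   # truncated boundary-distance channel (ReLU activations)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
    # Multiply the learning rate by 0.75 when the training loss stops decreasing for 30 epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                           factor=0.75, patience=30)

    def train_epoch(loader):
        total = 0.0
        for images, seg_target, dist_target in loader:
            optimizer.zero_grad()
            out = model(images)                        # (B, 2, H, W): segmentation, distance
            loss = bce(out[:, 0], seg_target) + mse(out[:, 1], dist_target)
            loss.backward()
            optimizer.step()
            total += loss.item()
        scheduler.step(total)                          # react to this epoch's training loss
        return total

    return train_epoch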
Input arguments Required arguments are typed in bold face. Main arguments batch_size\ Size of the batch. Default value is 1. dataset_len_multiplier\ Determines how many times is every training image replicated. Image replication allows to use bigger batch size and makes the training more stable. Default value is 1 (no replication) dt_bound\ Truncation threshold for the boundary distance output. Default value is 9. dataset_root \ Directory with the Cell Tracking Challenge dataset (it contains subdirectories 01, 02 and optionally 01_GT and 02_GT). images_idx \ Dictionary of lists of three digit ids of the input images from the Cell Tracking Challenge. The keys of the dictionary are the sequence numbers (including the leading zero). For example '{ 01 : 002 , 005 , 021 '} means images 002, 005 and 021 from sequence 01 and '{ 01 : 002 , 005 , 021 , 02 : 006 , 007 }' means images 002, 005 and 021 from sequence 01 and images 006 and 007 from sequence 02. The images ids must be specified explicitely in train and eval mode. In vis mode the list of ids might be empty, e.g. '{ 01 : }', in which case the script takes all the images from the sequence(s). learning_rate\ Learning rate. Default value is 0.0001. load_dataset_to_ram\ If 1, the script preloads training images to RAM instead of loading them from HDD on demand. This saves some time when the training set is small and the images are replicated. Default value is 0. mode\ Mode of the script. Possible values are train , eval and vis . The train mode is used for parameter learning, the eval mode is used for calculating the loss and accuracies for input images and the vis mode is used for visualization (similar to eval mode, but does not need ground truth for the input images). Default value is train . model_file\ File name of the model loaded in the eval or vis mode. Default value is ''. num_epochs\ Number of training epochs. Default value is 5000. output_dir \ Output directory, where the script saves the learned models (train mode) and segmentation results (vis, eval). If the directory does not exist, the script tries to create it. resolution_levels \ List of resolution levels. Every level is a negative integer (or 0), such that the corresponding input is downscaled by factor 2 lvl . For example 0 is the original resolution, for 1 the image is downscaled by factor 2, 2 is downscaled by factor 4 etc. The resolution levels must be consecutive numbers in ascending order. Last level can be smaller than zero (this is useful mainly for debugging). For example 3, 2, 1 means, that the input to the initial segment is downscaled by factor 8 and the output of the network is half the size of the input images. save_model_frequency\ Save model every save_model_frequency th epoch. 1 means never. Default value is 200. structure\ Structure of the network in the format numof_layers1,numof_channels1,rf_size1 , numof_layers2,additional_numof_channels2,rf_size2 , numof_layers3,additional_numof_channels3,rf_size3 ,... , where numof_layers is the number of convolution layers, numof_channels is the number of channels and rf_size is the size of the convolution filters in pixels. Number of channels is additive, i.e., number of channels in the second segment is given by numof_channels1+additional_numof_channels2. This argument is required in the train mode. validation_percentage\ Percentage (number between 0 and 1) of the available training images used for validation during training. Default value is 0 (no validation is done). 
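The additive channel convention of the structure argument can be illustrated with the example architecture from above (16 layers at 64 channels, then two segments adding 8 channels each); the snippet below is only an illustration of the bookkeeping, not code from the repository:

# (num_layers, added_channels, filter_size) per segment, as in the example above
structure = [(16, 64, 3), (2, 8, 3), (2, 8, 3)]

channels = 0
for level, (num_layers, added_channels, rf_size) in enumerate(structure):
    channels += added_channels   # additive: 64, then 72, then 80 channels
    print("segment %d: %d conv layers, %d channels, %dx%d filters"
          % (level, num_layers, channels, rf_size, rf_size))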
Data augmentation aug_elastic_params\ List of parameters for elastic transforms: alpha1, sigma1, weight1 , alpha2, sigma2, weight2 ,... , where alpha and sigma are parameters of the elastic transform and weight is the unnormalized probability of this combination. Parameter alpha is related to the scale of the transform and sigma to its smoothness (see for more details). For example 50, 5, 7 , 4, 1, 2 means, that the image will be with probability 7/9 distorted with parameters alpha 50, sigma 5, and with probability 2/9 with parameters alpha 4, sigma 1. If either alpha or sigma is smaller than 0, this parameter combination means no distortion is done. This can be used to include the original non distorted images into the training set. aug_intensity_params\ List of parameters for intensity transform of the foreground pixels: shift_lbound, shift_ubound, mult_lbound, mult_ubound . The intensity _i_ of every foreground pixel is changed to _i_+(_m_ _i_)+_s_ foreground_mean, where _m_ is a random number from interval mult_lbound, mult_ubound and _s_ is a random number from interval shift_lbound, shift_ubound . If mult_lbound>mult_ubound or shift_lbound>shift_ubound, the corresponding part of the intensity transform is not used. aug_rotation\ Determines, whether to use random rotations. The images are rolled by random number of pixels along both axes, rotated and cropped to the original size. Undefined pixels are interpolated as if the input image was padded by reflecting the boundary pixels. The advantage of this procedure is that pixels in the image corners are equally likely to be in the rotated image as the pixels that were originally close to its center. On the down side the rolling introduces artificial edges. aug_rotation_flip\ Determines, whether to use random flips and 90° rotations. GPU related cuda\ Use GPU if available. num_workers\ Number of workers used for loading and augmenting the training images. If 1, data augmentation is done on GPU (if available). Default value is 0 (PyTorch default). Other batchnorm_momentum\ Momentum parameter for BatchNorm2d layers. Default value is 0.1 (PyTorch default). non_decreasing_output_file\ Output file for debugging early stopping experiments. Default value is (no debugging). References 1 Ronneberger, O., Fischer, P., Brox, T.: U net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2015. pp. 234–241. Springer International Publishing, Cham (2015). Available at arXiv:1505.04597 . 2 Long J., Shelhamer E., Trevor Darrell T.: Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3431 3440. Available at arXiv:1411.4038 . 3",Medical Image Segmentation,Medical 2622,Medical,Medical,Other,"Project Overview Dataset used: MRBrainS18 148 is used as validation About dataset _Classes_ label : : : : Cortical gray matter 1 Basal ganglia 2 White matter 3 White matter lesions 4 Cerebrospinal fluid in the extracerebral space 5 Ventricles 6 Cerebellum 7 Brain stem 8 Preprocessing steps   Registered and Bias Field Correction was already done in the dataset Skull stripping was done only for T1 weighted MRI using DeepBrain library which creates a mask for skull removal. 
Furthermore, the contrast of the T1 weighted MRI was improved using the Histogram Equalization technique : : : : : : Regularized Biased Field Corrected MRI Removed Skull Histogram equalization Approach Cortical gray matter, White matter, Cerebrospinal fluid in the extracerebral space Cortical gray matter, White matter, and Cerebrospinal fluid in the extracerebral space can be easily segmented by applying thresholding to the T1 weighted MRI; further, a small U Net was used to denoise the thresholded image. Total parameters: 60,553 Rest For the rest of the classes, training was done on a custom model inspired by the U Net architecture. The model has 3 encoders stacked together in the bottleneck layer and then a single decoder. There are skip connections from encoder to decoder to enhance segmentation. Total parameters: 151,717 Key difference between U Net and Architecture used U Net Architecture used : : : : Only one encoder and one decoder Three encoders and one decoder Deep architecture with about 10 Million parameters Shallow with about 600 thousand parameters Doesn't have dilated convolution layers Has dilated convolution layers Loss Function The Dice coefficient is used as the loss function in the final training, though Jaccard distance and cross entropy were also tried. Learning Curves Results (On Validation Data) Label 1 2 3 4 5 6 7 8 : : : : : : : : : : : : : : : : : : Dice coefficient 0.702 0.758 0.770 0.746 0.704 0.882 0.887 0.855 References Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations ( ) U Net: Convolutional Networks for Biomedical Image Segmentation ( ) MRBrainS18 ( ) > This project was made as part of the Smart India Hackathon 2018 Software Edition , a 36 hour hackathon organised by the Government of India. The problem statement was given by the Department of Atomic Energy, India",Medical Image Segmentation,Medical 2628,Medical,Medical,Other,Salt Identification Challenge Note: please do not directly use code that is publicly available online. Kaggle: TGS Salt Identification Challenge Description Use image recognition to predict whether the subsurface region in an image is salt. Data The dataset can be downloaded from the Kaggle platform: Dataset download Suggested models This is a semantic segmentation problem; if you do not know how to build your model, you can try the following models: UNET: PSPNET: DEEPLAB: Minimum requirement to pass The minimum requirement for this project is the top 10% of the Kaggle Private Leaderboard. Reference link: TGS Private Leaderboard Evaluation Your project will be reviewed by a Udacity project reviewer according to the Machine Learning capstone project requirements . Please make sure you have read these requirements in full and have checked your project against them before submitting. A submitted project must satisfy every item of the requirements to pass. Submission A PDF report file The code (the jupyter notebook and its exported html file) A README document (Markdown is recommended) listing the libraries used; the machine hardware; the machine operating system; the training time; and similar information,Medical Image Segmentation,Medical 2629,Medical,Medical,Other,"pytorch unet segmentation Members : PyeongEun Kim , JuHyung Lee , MiJeong Lee Supervisors : Utku Ozbulak , Wesley De Neve Description This project aims to implement biomedical image segmentation with the use of the U Net model. The image below briefly explains the output we want: The dataset we used is the Transmission Electron Microscopy (ssTEM) data set of the Drosophila first instar larva ventral nerve cord (VNC), which is downloaded from the ISBI Challenge: Segmentation of neural structures in EM stacks The dataset contains 30 images (.png) of size 512x512 for each of train, train labels and test.
Table of Content Dataset ( dataset) Preprocessing ( preprocessing) Model ( model) Loss function ( lossfunction) Post processing ( postprocessing) Results ( results) Dependency ( dependency) References ( references) Dataset ruby class SEMDataTrain(Dataset): def __init__(self, image_path, mask_path, in_size 572, out_size 388): Args: image_path (str): the path where the image is located mask_path (str): the path where the mask is located option (str): decide which dataset to import All file names Lists of image path and list of labels Calculate len Calculate mean and stdev def __getitem__(self, index): Get specific data corresponding to the index Args: index (int): index of the data Returns: Tensor: specific data on index which is converted to Tensor GET IMAGE Augmentation on image Flip Gaussian_noise Uniform_noise Brightness Elastic distort {0: distort, 1:no distort} Crop the image Pad the image Sanity Check for Cropped image Normalize the image Add additional dimension Convert numpy array to tensor Augmentation on mask Flip same way with image Elastic distort same way with image Crop the same part that was cropped on image Sanity Check Normalize the mask to 0 and 1 Add additional dimension Convert numpy array to tensor return (img_as_tensor, msk_as_tensor) def __len__(self): Returns: length (int): length of the data Preprocessing We preprocessed the images for data augmentation. Following preprocessing are : Flip Gaussian noise Uniform noise Brightness Elastic deformation Crop Pad Image Augmentation Original Image Image Flip Vertical Horizontal Both Gaussian noise Standard Deviation: 10 Standard Deviation: 50 Standard Deviation: 100 Uniform noise Intensity: 10 Intensity: 50 Intensity: 100 Brightness Intensity: 10 Intensity: 20 Intensity: 30 Elastic deformation Random Deformation: 1 Random Deformation: 2 Random Deformation: 3 Crop and Pad Crop Left Bottom Left Top Right Bottom Right Top Padding process is compulsory after the cropping process as the image has to fit the input size of the U Net model. In terms of the padding method, symmetric padding was done in which the pad is the reflection of the vector mirrored along the edge of the array. We selected the symmetric padding over several other padding options because it reduces the loss the most. To help with observation, a ! ffff00 'yellow border' is added around the original image: outside the border indicates symmetric padding whereas inside indicates the original image. Pad Left Bottom Left Top Right bottom Right Top Model Architecture We have same structure as U Net Model architecture but we made a small modification to make the model smaller. ! image Loss function We used a loss function where pixel wise softmax is combined with cross entropy. Softmax ! image .png) Cross entropy ! image .png) Post processing In attempt of reducing the loss, we did a post processing on the prediction results. We applied the concept of watershed segmentation in order to point out the certain foreground regions and remove regions in the prediction image which seem to be noises. ! postprocessing The numbered images in the figure above indicates the stpes we took in the post processing. To name those steps in slightly more detail: 1. Convertion into grayscale 2. Conversion into binary image 3. Morphological transformation: Closing 4. Determination of the certain background 5. Calculation of the distance 6. Determination of the certain foreground 7. Determination of the unknown region 8. Application of watershed 9. 
Determination of the final result Conversion into grayscale The first step is there just in case the input image has more than 1 color channel (e.g. an RGB image has 3 channels) Conversion into binary image Convert the gray scale image into a binary image by processing the image with a threshold value: pixels equal to or lower than 127 will be pushed down to 0 and greater ones will be pushed up to 255. Such a process is compulsory as the later transformation processes take in binary images. Morphological transformation: Closing. We used the morphologyEx() function in the cv2 module which removes black noises (background) within white regions (foreground). Determination of the certain background We used the dilate() function in the cv2 module which emphasizes/increases the white region (foreground). By doing so, we connect detached white regions together, for example, connecting detached cell membranes together to ensure the background region. Calculation of the distance This step labels the foreground with a color code: ! ff0000 red color indicates farthest from the background while ! 003bff blue color indicates closest to the background. Determination of the foreground Now that we have an idea of how far the foreground is from the background, we apply a threshold value to decide which part could surely be the foreground. The threshold value is the maximum distance (calculated from the previous step) multiplied by a hyper parameter that we have to manually tune. The greater the hyper parameter value, the greater the threshold value, and therefore we will get a smaller area of certain foreground. Determination of the unknown region From previous steps, we determined sure foreground and background regions. The rest will be classified as 'unknown' regions. Label the foreground: markers We applied the connectedComponents() function from the cv2 module on the foreground to label the foreground regions with color to distinguish different foreground objects. We named it a 'marker'. Application of watershed and Determination of the final result After applying the watershed() function from the cv2 module on the marker, we obtained an array of -1, 1, and many other values. -1 Border region that distinguishes foreground and background 1 Background region To see the result, we created a clean white page of the same size as the input image. Then we copied all the values from the watershed result to the white page except 1, which means that we excluded the background. Results Optimizer Learning Rate Lowest Loss Epoch Highest Accuracy Epoch SGD 0.001 0.196972 1445 0.921032 1855 0.005 0.205802 1815 0.918425 1795 0.01 0.193328 450 0.922908 450 RMS_prop 0.0001 0.203431 185 0.924543 230 0.0002 0.193456 270 0.926245 500 0.001 0.268246 1655 0.882229 1915 Adam 0.0001 0.194180 140 0.924470 300 0.0005 0.185212 135 0.925519 135 0.001 0.222277 165 0.912364 180 We chose the best learning rate that fits the optimizer based on how fast the model converges to the lowest error. In other words, the learning rate should make the model reach the optimal solution in the fewest epochs. However, the interesting fact was that the epochs of lowest loss and highest accuracy were not corresponding. This might be due to the nature of the loss function (the loss function is on a log scale, thus an extreme deviation might occur). For example, if the softmax probability of one pixel is 0.001, then -log(0.001) is about 6.9, which is a huge value that contributes to the loss compared with roughly 0.001 for a well classified pixel. For consistency, we chose to focus on accuracy as our criterion of model correctness.
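The watershed based post processing steps described above can be sketched with OpenCV roughly as follows (a minimal sketch under our reading of steps 1 to 9; the kernel size, dilation iterations, foreground distance ratio, and the rendering of the final mask are assumptions, not the repository's exact settings):

import cv2
import numpy as np

def watershed_postprocess(pred, fg_ratio=0.7):
    """Clean a predicted mask using the watershed steps described above."""
    gray = pred if pred.ndim == 2 else cv2.cvtColor(pred, cv2.COLOR_BGR2GRAY)  # step 1
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)               # step 2
    kernel = np.ones((3, 3), np.uint8)
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)                 # step 3: closing
    sure_bg = cv2.dilate(closed, kernel, iterations=3)                         # step 4: sure background
    dist = cv2.distanceTransform(closed, cv2.DIST_L2, 5)                       # step 5: distance map
    _, sure_fg = cv2.threshold(dist, fg_ratio * dist.max(), 255, 0)            # step 6: sure foreground
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)                                   # step 7: unknown region
    _, markers = cv2.connectedComponents(sure_fg)                              # label sure foreground
    markers = markers + 1              # background becomes 1, objects 2..N
    markers[unknown == 255] = 0        # unknown pixels get label 0
    markers = cv2.watershed(cv2.cvtColor(closed, cv2.COLOR_GRAY2BGR), markers) # step 8
    # Step 9: start from a clean white page and drop the background label (1);
    # here foreground regions stay white and the excluded background is black.
    result = np.full_like(gray, 255)
    result[markers == 1] = 0
    return result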
Accuracy and Loss Graph SGD (lr 0.01,momentum 0.99) RMS prop (lr 0.0002) Adam (lr 0.0005) We used three different optimizers (SGD, RMS prop, and Adam). In the case of SGD the momentum is manually set (0.99), whereas in the case of the other optimizers (RMS prop and Adam) it is calculated automatically. Model Downloads Model trained with SGD can be downloaded via dropbox : Model trained with RMS prop can be downloaded via dropbox : Model trained with Adam can be downloaded via dropbox : Example Input Image Results comparison original image mask RMS prop optimizer (Accuracy 92.48 %) SGD optimizer (Accuracy 91.52 %) Adam optimizer (Accuracy 92.55 %) Dependency The following modules are used in the project: python > 3.6 numpy > 1.14.5 torch > 0.4.0 PIL > 5.2.0 scipy > 1.1.0 matplotlib > 2.2.2 References : 1 O. Ronneberger, P. Fischer, and T. Brox. U Net: Convolutional Networks for Biomedical Image Segmentation, 2 P.Y. Simard, D. Steinkraus, J.C. Platt. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis,",Medical Image Segmentation,Medical 2683,Medical,Medical,Other,Unet U net model from implemented with additional features: residual blocks dilated convolutions group normalization Requires PyTorch 0.4,Medical Image Segmentation,Medical 2725,Medical,Medical,Other,"Keras with mat for U net Implementation of the Keras U Net for .mat files of MATLAB. Image pattern extraction using U Net Requirements (Windows) Python 3.5.2 ver. Details are shown below. Keras 2.0.4 tensorflow gpu 1.6.0 scipy 1.0.0 numpy 1.14.2 matplotlib 2.1.1 Augmentation image augmentation with random 90 degree rotation, up down flip, and left right flip. ! augmentation ( ./images/augmentation.png) Model U Net based model for pattern extraction ! model ( ./images/model.png) U Net > U Net: Convolutional Networks for Biomedical Image Segmentation Reference code > Results Deep Learning results A few slices of Input image, Label image, Result image comparison ! result ( ./images/result.png) (a) Input image (with field inhomogeneity artifact in MRI, out of phase angle image) (b) Label image (with field inhomogeneity artifact removal SUPER method in MRI) > SUPER method (c) Deep Learning result image Water Fat separation result ! wf_result (./images/wf_result.png) (a) Before artifact removal (b) After artifact removal with SUPER method (c) After artifact removal with trained SUPER method Deep Learning Feature visualization ! feature (./images/feature.png) (a) Conv2D layer1 (b) Conv2D layer2 (c) Conv2D layer3 (d) Conv2D layer4 (e) Conv2D layer5 (f) Deconvolution layer1 (g) Deconvolution layer2 (h) Deconvolution layer3 (i) Deconvolution layer4",Medical Image Segmentation,Medical 2797,Medical,Medical,Other,"Data Science Bowl 2018 This project is a tool for detecting cells in medical images. It is based on the following Kaggle challenge . Project structure TODO: Add structure Install the project For this, run the following command: conda install file environement.txt Then, run: python setup.py develop Workflow 1. Get and process the data 2. Train the model 3. Postprocess the predictions Get the data Once you have installed the project dependencies, you will have access to the Kaggle official CLI. Run kaggle help to confirm this. To get the data, run: kaggle competitions download c data science bowl 2018 You will need to accept the competition conditions and create an API key first.
Running TensorBoard To run TensorBoard (a great visualization tool), tensorboard logdir /path/to/tb_logs Tips It is better to use skimage instead of numpy for reading and processing images. It is even better to use Keras built in image processing capabilities. Log the various image sizes when debugging your data processing pipeline. Use only one image when debugging your data pipeline (so as to avoid loading all the data multiple times). Sources and useful links U net Kaggle kernel: U net paper: Upsampling basics:",Medical Image Segmentation,Medical 2842,Medical,Medical,Other,"Gated Graph Sequence Neural Networks This is the code for our ICLR'16 paper: Yujia Li, Daniel Tarlow, Marc Brockschmidt, Richard Zemel. Gated Graph Sequence Neural Networks . International Conference on Learning Representations, 2016. Please cite the above paper if you use our code. The code is released under the MIT license (LICENSE). Testing Run th test.lua to test all the modules in the ggnn and rnn libraries. Reproducing the bAbI tasks and graph algorithms experiment results To run the bAbI experiments, and experiments on the two extra sequence tasks: 1. Go into babi/data , run bash get_10_fold_data.sh to get 10 folds of bAbI data for 5 tasks (4, 15, 16, 18, 19) and do some preprocessing. 2. Go into babi/data/extra_seq_tasks , run bash generate_10_fold_data.sh to get 10 folds of data for the two extra sequence tasks. 3. Go back to babi/ and use run_experiments.py to run the GGNN/GGS NN experiments, e.g. python run_experiments.py babi18 runs GGNN on bAbI task 18 for all 10 folds of data. 4. Use run_rnn_baselines.py to run RNN/LSTM baseline experiments, e.g. python run_rnn_baselines.py babi18 lstm runs LSTM on bAbI task 18 for all 10 folds of data. Notes Make sure ./?.lua and ./?/init.lua are on your lua path. For example by export LUA_PATH ./?.lua;./?/init.lua;$LUA_PATH . The experiment results may differ slightly from what we reported in the paper, as the datasets are randomly generated and will be different from run to run.",Drug Discovery,Medical 2844,Medical,Medical,Other,"kaggle tgs salt competition This is a simple implementation of U Net for a 2 class object segmentation task in python via keras (TF backends). Kaggle competition site : The original paper of U Net is here: U Net: Convolutional Networks for Biomedical Image Segmentation (Olaf Ronneberger, Philipp Fischer, Thomas Brox, May 2015)",Medical Image Segmentation,Medical 2886,Medical,Medical,Other,"CNN segmentation for Lung cancer OARs A deep convolutional neural network (CNN) based automatic segmentation technique was applied to multiple organs at risk (OARs) in CT images of lung cancer. These are the uploads of our article Preliminary comparison of the automatic segmentation of multiple organs at risk in CT images of lung cancer between deep convolutional neural network based and atlas based techniques . The uploads include the neural network architecture and the detailed architecture diagrams. This is a generic 3D volume U Net convolutional network implementation as proposed by Ronneberger et al. The loss function is the dice similarity coefficient (DSC) with variable weight.
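A weighted soft Dice loss of this kind can be sketched in Keras as follows (an illustrative sketch only, not the authors' implementation; the function name, the smoothing constant, and the one hot volume layout are assumptions):

from keras import backend as K

def weighted_dice_loss(class_weights, smooth=1e-5):
    """Soft Dice loss with a variable weight per OAR class (illustrative sketch)."""
    def loss(y_true, y_pred):
        # y_true, y_pred: (batch, depth, height, width, n_classes) one-hot / softmax volumes
        axes = [1, 2, 3]                                   # sum over the 3D volume
        intersection = K.sum(y_true * y_pred, axis=axes)
        union = K.sum(y_true, axis=axes) + K.sum(y_pred, axis=axes)
        dice = (2.0 * intersection + smooth) / (union + smooth)   # per-class DSC
        w = K.constant(class_weights)
        return 1.0 - K.sum(w * dice, axis=-1) / K.sum(w)   # weighted mean DSC turned into a loss
    return loss

Such a function would then be passed to model.compile as the loss, with one weight per OAR class.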
Requirements Python 3.5+ tensorflow gpu 1.3+ keras 2.1+ numpy 1.12.0+,Medical Image Segmentation,Medical 1996,Natural Language Processing,Natural Language Processing,Natural Language Processing,sentiment analysis pytorch Resources:,Sentiment Analysis,Sentiment Analysis 2009,Natural Language Processing,Natural Language Processing,Natural Language Processing,"WOMBAT See our COLING 2018 demo paper for additional details. Please cite the paper if you use WOMBAT. Note: Due to a name clash with another python package, the actual WOMBAT package structure is slightly different than that used in the COLING paper examples! The examples used in this web site are up to date. > ,. .._ . .' . ; ; ; ' ; ) / ' . ; / ; . ; ,.' : . : ) ; \' : ./ ) \ ;/ ; \ , ./ ; ).; /\/ \/ ); : \ ; : _ _ ; ) . \;\ /;/ ; / ! : : ,/ ; ( . : _ : ,/ ; \\\ ^ : ; This is WOMBAT, the WOrd eMBedding dATa base API (Version 2.1) ( ) //// Wombat artwork by akg Introduction WOMBAT , the WO rd e MB edding d AT abase, is a light weight Python tool for more transparent, efficient, and robust access to potentially large numbers of word embedding collections (WECs). It supports NLP researchers and practitioners in developing compact, efficient, and reusable code. Key features of WOMBAT are transparent identification of WECs by means of a clean syntax and human readable features, efficient lazy, on demand retrieval of word vectors, and increased robustness by systematic integration of executable preprocessing code. WOMBAT implements some Best Practices for research reproducibility and complements existing approaches towards WEC standardization and sharing. WOMBAT provides a single point of access to existing WECs. Each plain text WEC file has to be imported into WOMBAT once, receiving in the process a set of ATT:VAL identifiers consisting of five system attributes (algo, dims, dataset, unit, fold) plus arbitrarily many user defined ones. Introduction ( introduction) > Installation ( installation) Importing Pre Trained Embeddings to WOMBAT: GloVe ( importing pre trained embeddings to wombat glove) Integrating automatic preprocessing ( integrating automatic preprocessing) Simple preprocessing ( simple preprocessing no mwes) Advanced preprocessing with MWE) ( advanced preprocessing with mwes) Use Cases ( use cases) Pairwise Distance ( pairwise distance) Installation WOMBAT does not have a lot of special requirements. The basic functionality only requires sqlite3, numpy, and tqdm, the analyse module requires psutil, matplotlib, and scikit learn in addition. Note that sqlite3 is commonly available as a default package, e.g. with conda. In addition, the standard_preprocessor (see below) requires NLTK 3.2.5. A working environment can be set up like this: shell $ conda create name wombat python 3.6 numpy tqdm psutil matplotlib scikit learn nltk 3.2.5 $ source activate wombat $ git clone $ cd WOMBAT $ pip install . Note: Depending on your environment, you might have to install NLTK 3.2.5 with shell conda install c conda forge nltk 3.2.5 Importing Pre Trained Embeddings to WOMBAT: GloVe One of the main uses of WOMBAT is as a wrapper for accessing existing, off the shelf word embeddings like e.g. GloVe. (The other use involves access to self trained embeddings, including preprocessing and handling of multi word expressions, cf. below ( integrating automatic preprocessing)) The following code is sufficient to import a sub set of the GloVe embeddings. 
python from wombat_api.core import connector as wb_conn wbpath data/wombat data/ importpath data/embeddings/glove.6B/ wbc wb_conn(path wbpath, create_if_missing True) for d in '50', '100', '200', '300' : for n in 'none', 'abtt' : wbc.import_from_file(importpath+ glove.6B. +d+ d.txt , algo:glove;dataset:6b;dims: +d+ ;fold:1;unit:token;norm: +n, normalize n, prepro_picklefile ) Using norm:abtt ( All but the top ) creates a normalized version as described in this paper. Parameter D is set to D max(int(dim/100), 1) . To execute this code, run shell $ python tools/import_to_wombat.py from the WOMBAT directory. The required GloVe embeddings are not part of WOMBAT and can be obtained from Stanford here . Extract them into data/embeddings . The WOMBAT master and embeddings data bases will be created in data/wombat data . The above import assigns the following minimally required system ATT:VAL pairs to the embeddings. Attribute Meaning algo Descriptive label for the algorithm used for training these embeddding vectors. dataset Descriptive label for the data set used for training these embedding vectors. dims Dimensionality of these embedding vectors. Required for description and for creating right sized empty vectors for OOV words. fold Indicates whether the embedding vectors are case sensitive (fold 0) or not (fold 1). If fold 1, input words are lowercased before lookup. unit Unit of representation used in the embedding vectors. Works as a descriptive label with pre trained embeddings for which no custom preprocessing has been integrated into WOMBAT. If custom preprocessing exists, the value of this attribute is passed to the process() method. The current preprocessor modules (cf. below) support the values stem and token . In addition, the following user defined ATT:VAL pair is assigned. Attribute Meaning norm Descriptive label for the normalization applied at input time. none or one of l1 , l2 , or abtt . After import, the embedding vectors are immediately available for efficient lookup of already preprocessed words. The following code accesses two of the eight GloVe WECs and looks up <unit, vector> tuples for two sequences of words. For performance reasons, input order is ignored. python from wombat_api.core import connector as wb_conn wbpath data/wombat data/ wbc wb_conn(path wbpath, create_if_missing False) wec_ids algo:glove;dataset:6b;dims:50;fold:1;unit:token;norm:{none,abtt} vecs wbc.get_vectors(wec_ids, {}, for_input 'this','is','a', 'test' , 'yet', 'another', 'test' , in_order False) One wec_result for each wec specified in wec_identifier. norm:{none,abtt} notation is expanded at execution time. for wec_index in range(len(vecs)): Index 0 element is the wec_id print( \nWEC: %s %vecs wec_index 0 ) Index 1 element is the list of all results for this wec Result list contains tuples of ( raw , prepro , (w,v) tuples ) for (raw, prepro, tuples) in vecs wec_index 1 : print( Raw: '%s' %str(raw)) print( Prepro: %s %str(prepro)) for (w,v) in tuples: print( Unit: %s\nVector: %s\n %(w,str(v))) To execute this code, run shell $ python tools/test_get_vectors.py from the WOMBAT directory. The result is a nested python list with one result set for each supplied WEC identifier. 
WEC: algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token Raw: '' Prepro: 'this', 'is', 'a', 'test' Unit: a Vector: 0.21705 0.46515 0.46757001 0.10082 1.01349998 0.74844998 0.53104001 0.26256001 0.16812 0.13181999 0.24909 0.44185001 0.21739 0.51003999 0.13448 0.43141001 0.03123 0.20674001 0.78138 0.20148 0.097401 0.16088 0.61835998 0.18504 0.12461 2.25259995 0.22321001 0.5043 0.32257 0.15312999 3.96359992 0.71364999 0.67012 0.28388 0.21738 0.14432999 0.25926 0.23434 0.42739999 0.44451001 0.13812999 0.36973 0.64288998 0.024142 0.039315 0.26036999 0.12017 0.043782 0.41012999 0.1796 Unit: is Vector: 6.18499994e 01 6.42539978e 01 4.65519994e 01 3.75699997e 01 7.48380005e 01 5.37389994e 01 2.22390005e 03 6.05769992e 01 2.64079988e 01 1.17030002e 01 4.37220007e 01 2.00920001e 01 5.78589998e 02 3.45889986e 01 2.16639996e 01 5.85730016e 01 5.39189994e 01 6.94899976e 01 1.56179994e 01 5.58300018e 02 6.05149984e 01 2.89970011e 01 2.55939998e 02 5.55930018e 01 2.53560007e 01 1.96120000e+00 5.13809979e 01 6.90959990e 01 6.62460029e 02 5.42239994e 02 3.78710008e+00 7.74030030e 01 1.26890004e 01 5.14649987e 01 6.67050034e 02 3.29329997e 01 1.34829998e 01 1.90490007e 01 1.38119996e 01 2.15030000e 01 1.65730007e 02 3.12000006e 01 3.31889987e 01 2.60010008e 02 3.82030010e 01 1.94030002e 01 1.24660000e 01 2.75570005e 01 3.08990002e 01 4.84970003e 01 Unit: test Vector: 0.13175 0.25516999 0.067915 0.26192999 0.26155001 0.23569 0.13077 0.011801 1.76590002 0.20781 0.26198 0.16428 0.84641999 0.020094 0.070176 0.39778 0.15278 0.20213 1.61839998 0.54326999 0.17856 0.53894001 0.49868 0.10171 0.66264999 1.70510006 0.057193 0.32405001 0.66834998 0.26653999 2.84200001 0.26844001 0.59536999 0.50040001 1.51989996 0.039641 1.66589999 0.99757999 0.55970001 0.70493001 0.0309 0.28301999 0.13564 0.64289999 0.41490999 1.23619998 0.76586998 0.97798002 0.58507001 0.30175999 Unit: this Vector: 5.30740023e 01 4.01169986e 01 4.07849997e 01 1.54440001e 01 4.77820009e 01 2.07540005e 01 2.69510001e 01 3.40229988e 01 1.08790003e 01 1.05630003e 01 1.02890000e 01 1.08489998e 01 4.96809989e 01 2.51280010e 01 8.40250015e 01 3.89490008e 01 3.22840005e 01 2.27970004e 01 4.43419993e 01 3.16489995e 01 1.24059997e 01 2.81699985e 01 1.94670007e 01 5.55129983e 02 5.67049980e 01 1.74189997e+00 9.11450028e 01 2.70359993e 01 4.19270009e 01 2.02789996e 02 4.04050016e+00 2.49430001e 01 2.04160005e 01 6.27619982e 01 5.47830015e 02 2.68830001e 01 1.84440002e 01 1.82040006e 01 2.35359997e 01 1.61550000e 01 2.76549995e 01 3.55059989e 02 3.82110000e 01 7.51340005e 04 2.48219997e 01 2.81639993e 01 1.28189996e 01 2.87620008e 01 1.44400001e 01 2.36110002e 01 Raw: '' Prepro: 'yet', 'another', 'test' Unit: another Vector: 0.50759 0.26321 0.19638 0.18407001 0.90792 0.45267001 0.54491001 0.41815999 0.039569 0.061854 0.24574 0.38501999 0.39649001 0.32165 0.59610999 0.39969999 0.015734 0.074218 0.83148003 0.019284 0.21331 0.12873 0.25409999 0.079348 0.12588 2.12940001 0.29091999 0.044597 0.27353999 0.037492 3.45799994 0.34641999 0.32802999 0.17566 0.22466999 0.08987 0.24528 0.070129 0.2165 0.44312999 0.02516 0.40817001 0.33533001 0.0067758 0.11499 0.15701 0.085219 0.018568 0.26124999 0.015387 Unit: test Vector: 0.13175 0.25516999 0.067915 0.26192999 0.26155001 0.23569 0.13077 0.011801 1.76590002 0.20781 0.26198 0.16428 0.84641999 0.020094 0.070176 0.39778 0.15278 0.20213 1.61839998 0.54326999 0.17856 0.53894001 0.49868 0.10171 0.66264999 1.70510006 0.057193 0.32405001 0.66834998 0.26653999 2.84200001 0.26844001 0.59536999 0.50040001 1.51989996 0.039641 
1.66589999 0.99757999 0.55970001 0.70493001 0.0309 0.28301999 0.13564 0.64289999 0.41490999 1.23619998 0.76586998 0.97798002 0.58507001 0.30175999 Unit: yet Vector: 0.69349998 0.13891999 0.10862 0.18671 0.56310999 0.070388 0.52788001 0.35681 0.21765 0.44887999 0.14023 0.020312 0.44203001 0.072964 0.85846001 0.41819 0.19097 0.33511999 0.012309 0.53561002 0.44547999 0.38117 0.2255 0.26947999 0.56835002 1.71700001 0.76059997 0.43305999 0.41890001 0.091699 3.2262001 0.18561 0.014535 0.69815999 0.21151 0.28681999 0.12492 0.49278 0.57783997 0.75677001 0.47876 0.083749 0.013377 0.19862001 0.14819001 0.21787 0.30472001 0.54255003 0.20916 0.14964999 WEC: algo:glove;dataset:6b;dims:50;fold:1;norm:abtt;unit:token Raw: '' Prepro: 'this', 'is', 'a', 'test' Unit: a Vector: 0.38456726 0.39097878 0.1628997 0.35068694 0.99550414 0.44776174 0.50116265 0.31360865 0.35520661 0.12043196 0.06741576 0.22319981 0.3842575 0.31569615 0.12704191 0.6358701 0.36765504 0.2414223 0.2757951 0.06014517 0.47552517 0.17220016 0.76332432 0.32266825 0.3489612 1.037853 0.32191628 0.15478981 0.11307254 0.47718403 1.48160338 1.41211295 0.17363971 0.33394873 0.05526268 0.04968219 0.40862644 0.32090271 0.75181049 0.07840931 0.39596623 0.88622624 0.85963786 0.91397953 0.53625643 0.70439553 0.31108141 0.22278789 0.51454931 1.25660634 Unit: is Vector: 0.09126818 0.60529983 0.19061366 0.60591251 0.75278735 0.27584556 0.00489476 0.10457748 0.42818767 0.12769794 0.5956223 0.79856926 0.23736086 0.52514869 0.23125611 0.40881187 0.9044193 0.28455088 0.76149231 0.16461219 0.9323107 0.26970825 0.14817345 0.42578259 0.66423047 0.9320755 0.04194349 0.37159386 0.32375848 0.23331042 1.64041948 1.39662826 0.2985028 0.49035078 0.17418115 0.42143601 0.27057451 0.27170798 0.43615541 0.24219584 0.20077799 0.79368269 0.51842153 0.87728345 0.13601783 0.19085133 0.53250313 0.44660494 0.4021166 1.45063889 Unit: test Vector: 0.38716662 0.2882818 0.20366421 0.48994547 0.25463828 0.02147874 0.11951575 0.48101032 1.92743909 0.03607689 0.41778082 0.42583492 1.02733421 0.15747839 0.08725743 0.22394061 0.51424712 0.60825217 0.71632195 0.43812668 0.50002372 0.56020129 0.37860283 0.2310212 1.06628919 0.69672549 0.52087027 0.64004648 1.05325282 0.54999208 0.73280215 0.34567764 0.17792362 0.47898144 1.28256369 0.05218088 1.80012178 1.07820046 0.26461291 0.25504762 0.18192536 0.19477114 0.31879386 0.19867522 0.92652762 0.85793 0.36064354 0.80783612 0.67693424 0.65146315 Unit: this Vector: 0.05769843 0.33354184 0.10845295 0.40082479 0.46379334 0.08621311 0.24618721 0.22265518 0.07422543 0.14528893 0.07466117 0.76159853 0.66591597 0.4429512 0.83671933 0.18990952 0.71576232 0.669433 0.58903944 0.18092248 0.49315494 0.26879567 0.0536716 0.08078988 1.02947712 0.56003988 0.3793031 0.07380959 0.00828686 0.33786285 1.6179111 0.93445206 0.27972579 0.58211684 0.3217994 0.36302748 0.33139306 0.26765579 0.08437763 0.34973046 0.02588651 0.54583436 0.59350443 0.92348766 0.31716001 0.15190703 0.29891419 0.11002633 0.24681857 1.29339063 Raw: '' Prepro: 'yet', 'another', 'test' Unit: another Vector: 0.025814 0.22290546 0.47375607 0.41591337 0.91046846 0.1878776 0.5489589 0.92557371 0.20558342 0.18349826 0.08540669 0.21822196 0.57494354 0.1411396 0.60889614 0.5789035 0.35228795 0.33926874 0.09776771 0.0921993 0.54469943 0.1482498 0.37853688 0.05142017 0.54176974 1.0848732 0.18702543 0.27727041 0.12025136 0.25307268 1.2834959 0.97531319 0.10326141 0.20209748 0.01885122 0.00244692 0.38215482 0.15179047 0.51672393 0.01954687 0.24587034 0.89274144 0.52436882 0.85171229 0.63781095 
0.54679894 0.49500448 0.15312833 0.3553136 0.99029428 Unit: test Vector: 0.38716662 0.2882818 0.20366421 0.48994547 0.25463828 0.02147874 0.11951575 0.48101032 1.92743909 0.03607689 0.41778082 0.42583492 1.02733421 0.15747839 0.08725743 0.22394061 0.51424712 0.60825217 0.71632195 0.43812668 0.50002372 0.56020129 0.37860283 0.2310212 1.06628919 0.69672549 0.52087027 0.64004648 1.05325282 0.54999208 0.73280215 0.34567764 0.17792362 0.47898144 1.28256369 0.05218088 1.80012178 1.07820046 0.26461291 0.25504762 0.18192536 0.19477114 0.31879386 0.19867522 0.92652762 0.85793 0.36064354 0.80783612 0.67693424 0.65146315 Unit: yet Vector: 0.19308138 0.16284789 0.15555759 0.03641786 0.57559294 0.17704657 0.5483343 0.83097643 0.06182532 0.20686415 0.00978364 0.59366596 0.62608606 0.10085706 0.88102579 0.25119966 0.54406774 0.73183894 0.87969595 0.4385618 0.75427032 0.40465489 0.11098945 0.39693087 0.95634723 0.75478542 0.31514072 0.12455961 0.04534632 0.3660695 1.20038748 0.78086185 0.38523355 0.6831497 0.01792914 0.3780098 0.25575435 0.57207143 0.28931174 0.32322413 0.27600241 0.38538483 0.18901677 0.6213603 0.34912282 0.14569211 0.7041254 0.37438834 0.12010401 1.07518613 WOMBAT also supports the selection of embedding vectors for words matching a particular string pattern . The following code looks up embedding vectors matching the supplied pattern. The pattern uses the GLOB syntax described here . In a nut shell, it allows the use of placeholders like ?, , , ^ , and ranges. python import sys from wombat_api.core import connector as wb_conn from wombat_api.analyse import plot_tsne pattern,exclude_pattern,wbpath,wec_ids , , , plot False for i in range(len(sys.argv)): if sys.argv i p : pattern sys.argv i+1 elif sys.argv i xp : exclude_pattern sys.argv i+1 elif sys.argv i wbpath : wbpath sys.argv i+1 elif sys.argv i wecs : wec_ids sys.argv i+1 elif sys.argv i plot : plot True wbc wb_conn(path wbpath, create_if_missing False) vecs wbc.get_matching_vectors(wec_ids, pattern pattern, exclude_pattern exclude_pattern) if plot: plot_tsne(vecs, iters 1000, fontsize 5, size (10,10), arrange_by wec_ids, silent False) else: One wec_result for each wec specified in wec_identifier for wec_index in range(len(vecs)): Index 0 element is the wec_id print( \nWEC: %s %vecs wec_index 0 ) Index 1 element is the list of all results for this wec Result list contains tuples of ( raw , prepro , (w,v) tuples ) for (raw, prepro, tuples) in vecs wec_index 1 : print( Raw: '%s' %str(raw)) print( Prepro: %s %str(prepro)) for (w,v) in tuples: print( Unit: %s\nVector: %s\n %(w,str(v))) Executing this code with shell $ python tools/test_get_matching_vectors.py wbpath data/wombat data/ wecs algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token p comput xp _ from the WOMBAT directory returns from the GloVe embeddings a list of tuples for all words matching the substring comput , but excluding those with an underscore. 
WEC: P: computer ;XP: _ ;@algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token Raw: '' Prepro: Unit: computer Vector: 0.079084 0.81503999 1.79009998 0.91653001 0.10797 0.55628002 0.84426999 1.49510002 0.13417999 0.63626999 0.35146001 0.25813001 0.55028999 0.51055998 0.37408999 0.12092 1.61660004 0.83653003 0.14202 0.52348 0.73452997 0.12207 0.49079001 0.32532999 0.45306 1.58500004 0.63848001 1.00530005 0.10454 0.42984 3.18099999 0.62186998 0.16819 1.01390004 0.064058 0.57844001 0.45559999 0.73782998 0.37202999 0.57722002 0.66441 0.055129 0.037891 1.32749999 0.30991 0.50696999 1.23570001 0.1274 0.11434 0.20709001 Unit: computers Vector: 0.56105 1.19659996 2.4124999 0.35547999 0.046729 0.73904002 0.70042002 1.65859997 0.030509 0.63224 0.40307 0.30063 0.13483 0.20847 0.38823 0.50260001 1.83519995 0.83701003 0.6455 0.72898 0.69954002 0.21853 0.063499 0.34255999 0.65038002 1.11230004 0.41428 1.12329996 0.62655002 0.60872 2.81030011 0.19251999 0.19487 0.71785003 0.21378 0.75274003 0.27748001 0.81586999 0.24152 0.040814 0.40838999 0.0029812 0.35493001 1.46300006 0.17201 0.80510002 0.49981999 0.15800001 0.26460999 0.38896999 Unit: computing Vector: 0.075077 0.10027 1.18130004 0.95204997 0.041338 0.79659998 0.03967 1.66919994 0.34807 0.42230001 0.26225001 0.07144 0.052628 0.041547 0.67650998 0.0065369 0.49070001 1.26110005 0.64635003 0.5262 0.21816 0.52133 0.44356999 0.15283 0.55921 0.15716 0.68899 1.22010005 0.040493 0.65311998 2.38890004 0.50182003 0.26547 1.20449996 0.43509001 0.36212999 0.99496001 1.25100005 0.45027 0.019758 0.76959002 0.48109999 0.90126997 1.56589997 0.29357001 0.32879999 1.13759995 0.15703 0.20730001 0.50344002 Unit: computerized Vector: 0.22301 1.31719995 0.75747001 0.38552001 0.50441998 0.55441999 0.39649999 1.13160002 1.22570002 0.22702 0.30836999 0.18944 0.49366 0.90425003 0.45399001 0.042686 1.2723 0.062451 0.13463999 0.50247002 0.39923999 0.36028001 0.81274998 0.037325 0.046816 0.33647001 1.0474 0.37382001 0.34393999 0.50757003 1.57729995 0.076262 0.3581 0.76959997 0.19645999 1.02550006 0.36827001 0.38780999 0.12588 0.13531999 0.31990999 0.03272 0.01128 1.47019994 0.69431001 0.071377 1.22099996 0.81044 0.40667999 0.098573 Unit: computational Vector: 4.31499988e 01 3.67849991e 02 9.68580022e 02 4.22829986e 01 3.88289988e 01 6.89260006e 01 1.01639998e+00 1.73469996e+00 1.34930000e 01 5.69400005e 02 8.11169982e 01 2.79329985e 01 6.17060006e 01 3.97960007e 01 4.00079995e 01 2.86139995e 01 2.48089999e 01 1.27509999e+00 2.92879999e 01 7.10950017e 01 8.70049968e 02 8.45350027e 01 2.09790006e 01 2.22760007e 01 8.37759972e 01 9.81409997e 02 7.16199994e 01 8.74830008e 01 2.18679994e 01 8.55109990e 01 1.46029997e+00 7.84169972e 01 3.67179990e 01 1.71550000e+00 9.42170024e 02 8.05830002e 01 1.20410001e+00 1.88180006e+00 1.08070004e+00 1.10560000e+00 4.94690001e 01 3.08530003e 01 1.84230000e 01 1.47109997e+00 5.90629995e 01 3.49229991e 01 2.28239989e+00 1.30540001e+00 1.02009997e 03 1.60899997e 01 Unit: computation Vector: 0.44551 0.20328 0.16670001 0.29977 0.24637 0.44426 1.08599997 1.11899996 0.39616001 0.75651002 0.27359 0.020149 0.10735 0.12139 0.22418 0.25176001 0.028599 0.31507999 0.25172001 0.24843 0.22615001 0.93827999 0.38602 0.089497 0.98723 0.39436001 0.34908 0.99075001 0.34147 0.021747 1.43799996 0.83107001 0.48113999 0.83788002 0.13285001 0.065932 0.10166 1.00689995 0.10475 0.90570003 0.052845 0.68559003 0.81279999 1.72060001 1.00870001 0.61612999 1.9217 0.52373999 0.0051134 0.23796999 Unit: computed Vector: 0.92198998 0.42993999 1.18130004 0.60396999 0.58127999 
0.12542 1.14040005 1.41620004 0.091121 0.57312 1.1875 0.33028999 0.17159 0.20772 0.23935001 0.91812998 0.30410999 0.57440001 0.51454002 0.28658 0.054586 1.50179994 1.06110001 0.10836 0.016461 0.57080001 0.79029 0.015223 0.54136997 0.24146999 0.77051997 0.14156 0.038233 0.84209001 0.10314 0.41255999 0.94155002 1.25880003 0.38464999 0.82897002 0.32045999 0.27164999 0.77164 1.43519998 1.39279997 1.17069995 1.56280005 0.73864001 0.75353003 0.19359 Unit: compute Vector: 0.63358003 0.37999001 1.15170002 0.10287 0.56019002 0.33078 0.78088999 0.52937001 0.36013001 0.049813 0.41021001 0.51063001 0.023768 0.73566997 0.087008 0.44508001 0.23927 0.13426 0.53015 0.84297001 0.36684999 1.60409999 0.60742003 0.4862 0.59741002 0.73307002 1.10570002 0.44442001 0.81307 0.44319999 1.11520004 0.14816999 0.53328001 0.031922 0.01878 0.13345 0.0033607 0.33338001 0.41016999 0.45853001 0.56351 0.59254998 0.79004002 1.08350003 1.11530006 0.64942002 1.47350001 0.21834999 0.36024001 0.37728 Unit: supercomputer Vector: 0.054309 0.74190003 0.98615003 1.48800004 0.31690001 0.79742998 0.33346999 1.24890006 0.48521 0.47497001 0.57542002 0.14462 0.047178 0.71052998 0.55022001 0.51172 0.45679 1.06949997 0.86000001 0.62437999 0.67954999 1.68169999 1.35780001 0.86707997 0.23199999 0.44558001 0.016437 0.13151 0.30254 0.75502998 0.24353001 0.51615 0.23749 0.47378001 0.86453003 0.33899 0.52517998 1.24790001 0.023642 0.34333 0.023264 0.71818 0.10802 0.89945 0.62333 0.32117 1.028 0.053564 0.27849001 0.15685 Unit: supercomputers Vector: 0.13271999 1.63479996 1.54130006 1.0187 0.36779001 0.98526001 0.18335 1.27250004 0.43555999 0.35550001 0.38440999 0.059009 0.093939 0.61080998 0.026098 0.25139001 0.12072 0.90805 0.68120003 1.03770006 0.11673 1.93009996 0.45818001 0.47898 0.35043001 0.38150999 0.14930999 0.82398999 0.43788001 0.30847001 0.11093 0.41409999 0.58244002 0.18618 0.065696 0.18224999 0.62984002 1.5941 0.81909001 0.30436 0.057413 0.014005 0.84983999 1.28690004 0.38229001 0.43239999 0.74114001 0.36223999 0.61400002 0.27274001 Unit: computations Vector: 0.92869002 1.02049994 0.19661 0.14015999 0.11591 0.34413001 1.30859995 0.23383 0.15123001 0.77437001 0.11961 0.14681999 0.035171 0.23051 0.021644 0.26311001 0.11231 0.16500001 0.011065 0.82683998 0.66431999 0.88352001 0.069709 0.19406 0.60465002 0.89796001 0.93678999 0.94221997 0.026637 0.65461999 0.96908998 0.23707999 0.47549 0.36783999 0.30926999 0.47736999 0.75032002 0.92299998 0.14572001 0.87426001 0.17066 0.3971 0.38001999 1.71399999 0.73566997 0.97488999 1.31379998 0.83398998 0.38859999 0.32051 Unit: computerised Vector: 0.12611 1.65090001 0.23131999 0.42032 0.85224003 0.64967 0.10709 0.82485002 0.82120001 0.013014 0.23706 0.085659 0.52227002 0.78956997 0.73622 0.17614999 0.94698 0.18522 0.032076 0.035771 0.20302001 0.56418997 0.73012 0.063655 0.079343 0.53434002 0.23952 0.024863 0.023046 0.072238 0.20665 0.21754 0.27156001 0.26984 0.24496 0.74730998 0.58513999 0.16144 0.31505999 0.11659 0.096848 0.47889999 0.5596 1.82539999 1.1983 0.10177 0.71583003 0.88134998 0.63433999 0.43048999 ..... 
Unit: computec Vector: 0.26438001 0.031859 0.37781999 1.19770002 0.037241 0.28432 0.48710001 0.71013999 0.097773 1.08249998 0.91813999 0.11769 1.06219995 0.95842999 0.72715002 0.75755 1.24370003 0.19340999 0.74687999 0.28589001 1.046 0.21258999 0.61084998 0.24936999 0.45050001 0.79170001 0.46599001 0.22724999 0.72018999 0.24209 1.78380001 0.52792001 0.23574001 0.35584 1.83280003 1.35420001 1.56149995 0.41892999 0.42469001 0.65151 0.22994 0.96930999 0.25121 0.035985 1.04270005 0.34784001 0.34584001 0.28391001 0.26899999 0.16615 Unit: computerise Vector: 0.13243 1.00460005 0.69104999 0.46228001 0.95081002 0.83868998 0.50146002 0.96180999 0.66720003 0.0078055 0.41389999 0.1487 0.94172001 0.27941 0.68633997 0.71447998 0.74552 0.26036999 1.26040006 0.12515 0.43461999 0.22176 0.1957 0.25902 0.4844 0.81441998 0.24135999 0.50159001 0.13429999 0.31376001 1.12609994 0.70595002 0.18280999 0.14963999 0.12553 0.17343999 0.53565001 0.47918999 0.73098999 0.082523 0.13792001 0.97311002 0.23997 0.35769999 0.49739999 0.19893999 0.29245001 0.35404 0.33359 0.29841 Unit: ncomputing Vector: 0.13777 0.89407998 0.36000001 0.23384 0.16268 0.25003001 0.38916999 0.040075 0.5772 0.38306999 0.17998999 0.11491 0.47702 0.16103999 0.56414002 0.41909999 0.1071 0.56476998 0.86243999 0.14602 0.019593 0.29097 0.25075001 0.075766 0.14061999 0.73618001 0.24442001 0.25635001 0.33256 0.32995999 1.73239994 0.65521997 0.42548999 0.27728999 0.016066 0.077929 0.44281 0.19193999 0.24304 0.42770001 0.15459 0.18421 0.60525 0.031987 0.054108 0.024123 0.39344999 0.38275999 0.40790999 0.47226 Unit: computrace Vector: 0.032573 0.20901 0.52177 0.58008999 0.29374 0.68484998 0.39283001 0.24631999 0.91284001 1.19729996 0.067714 0.14139 0.20815 0.44073999 0.075302 0.030624 0.15228 0.12558 0.86303997 0.24861 0.41420001 0.33192 0.70894998 0.43792 1.24559999 1.09360003 0.12145 0.14472 0.64788997 0.037487 0.92712998 0.21217 0.113 0.61799002 0.3064 0.19243 0.045926 0.10823 0.13944 0.33397001 0.10098 0.45471999 0.42684001 0.048138 0.027003 0.40382001 1.00129998 0.26407 0.51999003 0.084454 Unit: computacenter Vector: 0.086849 0.17321 1.00810003 0.21253 0.5334 0.13697 0.56629997 0.68970001 0.47001001 0.65403998 0.30138999 0.64124 0.77232999 0.4826 0.44688001 0.12972 0.034202 0.54593003 0.41102001 0.45901 0.16802999 0.65959001 0.80486 0.30281001 0.07883 0.39427999 0.18619999 0.06051 0.44953999 1.17190003 1.57009995 0.18610001 0.63310999 0.50357002 0.20285 0.48023 0.1048 0.41510001 0.505 0.89828998 0.14026999 0.075739 0.23270001 0.2129 0.094783 0.04949 0.60021001 0.24270999 0.34661001 0.23172 WOMBAT also supports random sampling from WEC vocabularies. Sample size can be specified as an absolute size or as percentage. Integrating automatic preprocessing Simple preprocessing (no MWEs) In order to process raw input, WOMBAT supports the integration of arbitrary preprocessing python code right into the word embedding database. Then, if WOMBAT is accessed with the attribute raw True , this code is automatically executed in the background. WOMBAT provides the class wombat_api.preprocessors.preprocessor_stub.py to be used as a base for customized preprocessing code. python import pickle Stop word replacement SW_SYMBOL sw class preprocessor(object): def __init__(self, name __name__, phrasefile , verbose False): if verbose: print( Initializing preprocessor %s %name) This method is called from WOMBAT. 'line' is the raw string to be processed, 'unit' is the processing unit to be used (e.g. token, stem). 
    def process(self, line, unit, fold=True, sw_symbol=SW_SYMBOL, conflate=False, no_phrases=False, verbose=False):
        # Lowercase if fold==True
        if fold:
            line = line.lower()
        # This does the most rudimentary preprocessing only
        return line.split(" ")

    def pickle(self, picklefile):
        pickle.dump(self, open(picklefile, "wb"), protocol=pickle.HIGHEST_PROTOCOL)

However, WOMBAT also provides the ready-to-use standard preprocessor wombat_api.preprocessors.standard_preprocessor.py (based on NLTK 3.2.5). In order to link it (or any other preprocessing code based on the above stub) to one or more WECs in WOMBAT, a pickled instance has to be created first and then linked to one or more WECs. The following code is available in tools/assign_preprocessor_to_glove.py

```python
from wombat_api.preprocessors.standard_preprocessor import preprocessor
from wombat_api.core import connector as wb_conn

prepro = preprocessor(name="wombat_standard_preprocessor", phrasefile="")
prepro.pickle("temp/wombat_standard_preprocessor.pkl")

wbpath = "data/wombat-data/"
wbc = wb_conn(path=wbpath, create_if_missing=False)
wbc.assign_preprocessor("algo:glove;dataset:6b;dims:{50,100,200,300};fold:1;unit:token;norm:{none,abtt}",
                        "temp/wombat_standard_preprocessor.pkl")
# Calling this method with an empty string as pickle file name removes the preprocessor.
# wbc.assign_preprocessor("algo:glove;dataset:6b;dims:{50,100,200,300};fold:1;unit:token;norm:{none,abtt}", "")
```

After that, raw, unprocessed input data can be streamed directly into WOMBAT's vector retrieval methods.

```python
import numpy as np
from wombat_api.core import connector as wb_conn

wbpath = "data/wombat-data/"
wbc = wb_conn(path=wbpath, create_if_missing=False)

wec_ids = "algo:glove;dataset:6b;dims:50;fold:1;unit:token;norm:none"
rawfile = "data/text/STS.input.track5.en-en.txt"

vecs = wbc.get_vectors(wec_ids, {},
                       for_input=[np.loadtxt(rawfile, dtype=str, delimiter='\t', usecols=[0])],
                       raw=True, in_order=True, ignore_oov=True)

# One wec_result for each wec specified in wec_identifier
for wec_index in range(len(vecs)):
    # Index 0 element is the wec_id
    print("\nWEC: %s" % vecs[wec_index][0])
    # Index 1 element is the list of all results for this wec.
    # Result list contains tuples of (raw, prepro, [(w, v) tuples])
    for (raw, prepro, tuples) in vecs[wec_index][1]:
        print("Raw: '%s'" % str(raw))
        print("Prepro: %s" % str(prepro))
        for (w, v) in tuples:
            print("Unit: %s\nVector: %s\n" % (w, str(v)))
```

ignore_oov=True suppresses empty default vectors in the output for OOV words (incl. the *sw* stop-word symbol produced by the preprocessor). If the original input ordering need not be preserved (e.g. because the vectors of a sentence are averaged anyway), use in_order=False in order to speed up the retrieval. Executing this code with

```shell
$ python tools/test_get_vectors_from_raw.py
```

from the WOMBAT directory returns (abbreviated)

WEC: algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token
Raw: 'A person is on a baseball team.'
Prepro: ' sw ', 'person', ' sw ', ' sw ', ' sw ', 'baseball', 'team' Unit: person Vector: 0.61734003 0.40035 0.067786 0.34263 2.06469989 0.60843998 0.32558 0.38690001 0.36906001 0.16553 0.0065053 0.075674 0.57099003 0.17314 1.01419997 0.49581 0.38152 0.49254999 0.16737001 0.33948001 0.44405001 0.77543002 0.20935 0.60070002 0.86649001 1.89230001 0.37900999 0.28044 0.64213997 0.23548999 2.93580008 0.086004 0.14327 0.50160998 0.25290999 0.065446 0.60768002 0.13984001 0.018135 0.34876999 0.039985 0.07943 0.39318001 1.05620003 0.23624 0.41940001 0.35332 0.15233999 0.62158 0.79256999 Unit: baseball Vector: 1.93270004 1.04209995 0.78514999 0.91033 0.22711 0.62158 1.64929998 0.07686 0.58679998 0.058831 0.35628 0.68915999 0.50598001 0.70472997 1.26639998 0.40031001 0.020687 0.80862999 0.90565997 0.074054 0.87674999 0.62910002 0.12684999 0.11524 0.55685002 1.68260002 0.26291001 0.22632 0.713 1.08280003 2.12310004 0.49869001 0.066711 0.48225999 0.17896999 0.47699001 0.16384 0.16537 0.11506 0.15962 0.94926 0.42833 0.59456998 1.35660005 0.27506 0.19918001 0.36008 0.55667001 0.70314997 0.17157 Unit: team Vector: 0.62800997 0.12254 0.39140001 0.87936997 0.28571999 0.41953 1.42649996 0.80462998 0.27045 0.82498997 1.02769995 0.18546 1.76049995 0.18551999 0.56818998 0.38554999 0.61609 0.51209003 1.51530004 0.45688999 1.19289994 0.33886001 0.18038 0.10788 0.35567001 1.57009995 0.02989 0.38742 0.60838002 0.59188998 2.99110007 1.20220006 0.52598 0.76941001 0.63006002 0.63827997 0.30772999 1.01230001 0.0050781 1.03260005 0.29736 0.77503997 0.27015001 0.18161 0.04211 0.32168999 0.018298 0.85202003 0.038442 0.050767 Raw: 'Our current vehicles will be in museums when everyone has their own aircraft.' Prepro: ' sw ', 'current', 'vehicles', ' sw ', ' sw ', ' sw ', 'museums', ' sw ', 'everyone', ' sw ', ' sw ', ' sw ', 'aircraft' Unit: current Vector: 9.75340009e 02 7.97389984e 01 4.52930003e 01 8.86869989e 03 5.11780009e 02 1.81779992e 02 1.17909998e 01 6.97929978e 01 1.59400001e 01 3.38860005e 01 2.13860005e 01 1.19450003e 01 3.30779999e 01 7.08459988e 02 5.38580000e 01 5.27660012e 01 9.79890004e 02 3.43899988e 02 6.65669963e 02 2.71719992e 01 1.15869999e 01 7.70420015e 01 2.33769998e 01 8.57570022e 02 2.75379986e 01 1.26929998e+00 1.56700000e 01 4.58920002e 02 3.45319986e 01 1.30330002e+00 3.62069988e+00 9.13279969e 03 1.26800001e 01 6.15760028e 01 6.60099983e 02 2.54509985e 01 1.35349995e 03 5.12209982e 02 2.21770003e 01 4.43280011e 01 5.41520000e 01 1.96909994e 01 3.30339998e 01 3.70520004e 03 8.57439995e 01 1.67030007e 01 4.14049998e 02 5.95790029e 01 9.78059992e 02 1.86419994e 01 Unit: vehicles Vector: 0.75981998 0.76559001 2.09439993 0.37478 0.34946999 0.18489 1.11520004 1.01549995 0.24493 0.71603 0.60359001 1.04719996 0.28301999 0.36221999 0.29956001 0.043537 0.31847 1.47529995 0.49761999 2.1802001 0.52872998 0.34920001 0.78740001 0.058825 0.11986 0.59237999 0.19368 0.42545 1.21319997 0.19446 2.66330004 0.30814999 0.1981 0.28797999 1.17560005 0.68199998 0.4655 0.3504 1.00339997 0.83025002 0.2051 0.24585 1.10619998 0.8197 0.26460999 0.73376 0.53285003 0.035146 0.25134 0.60158002 Unit: museums Vector: 9.85180020e 01 1.13440001e+00 6.29760027e 01 3.34529996e 01 3.53210010e 02 1.28009999e+00 1.04939997e+00 6.92629993e 01 1.51199996e 02 6.12629987e 02 1.91709995e 01 1.35699997e 03 5.42540014e 01 1.70609996e 01 5.36289990e 01 3.47109996e 02 8.75020027e 01 4.11379989e 03 4.10959981e 02 7.34909996e 02 1.28649998e+00 2.06609994e 01 8.32859993e 01 3.66389990e 01 6.33740008e 01 2.20280007e 01 1.35179996e+00 3.86290014e 
01 5.34630001e 01 1.21969998e+00 1.55239999e+00 6.94739997e 01 1.02810001e+00 1.52869999e+00 5.21550000e 01 8.31290007e 01 8.52039978e 02 8.92379999e 01 4.59740013e 01 5.44290006e 01 1.50869995e 01 6.45650029e 01 1.70070004e+00 6.50240004e 01 1.69949993e 01 9.48629975e 01 1.07200003e+00 7.92410001e 02 5.76539993e 01 7.30650008e 01 Unit: everyone Vector: 4.72460017e 02 4.25340012e 02 1.11500002e 01 5.33339977e 01 1.14870000e+00 4.18350011e 01 4.16669995e 01 4.66320008e 01 3.93959992e 02 2.13530004e 01 1.67190000e 01 2.35850006e 01 3.46029997e 01 3.85849997e 02 1.06449997e+00 4.68389988e 01 4.45210010e 01 3.39459985e 01 2.97329992e 01 9.35410023e 01 2.72670001e 01 9.17469978e 01 2.66399998e 02 4.96710002e 01 1.24520004e+00 1.83879995e+00 5.42389989e 01 4.77459997e 01 9.36029971e 01 9.21980023e 01 2.71600008e+00 1.13660002e+00 2.25899994e 01 3.84640008e 01 6.01819992e 01 2.26870000e 01 1.16690002e 01 3.29930000e 02 2.30489999e 01 4.95480001e 01 2.52389997e 01 6.36380017e 02 8.74719992e 02 5.59130013e 01 7.14589987e 05 2.49380007e 01 2.10319996e 01 2.35870004e 01 1.01240002e 01 7.58400023e 01 Unit: aircraft Vector: 1.77139997 0.75713998 1.02170002 0.26717001 0.36311001 0.29269001 0.79655999 0.49746001 0.41422001 1.06019998 1.22150004 0.41672 0.40248999 0.70012999 1.06949997 0.19489001 1.08860004 1.24090004 2.15050006 1.1609 0.10969 0.17290001 0.82805997 0.97654003 0.14616001 1.26409996 0.13635001 0.041624 1.09389997 0.71160001 2.47399998 0.16225 0.26348001 0.15532 1.19949996 0.0076471 0.76388001 0.071138 1.38689995 0.88787001 0.36175001 0.33419001 1.65120006 0.52294999 0.30656999 0.17399 0.55383003 0.46204001 0.59634 0.41802001 Raw: 'A woman supervisor is instructing the male workers.' Prepro: ' sw ', 'woman', 'supervisor', ' sw ', 'instructing', ' sw ', 'male', 'workers' Unit: woman Vector: 1.81529999e 01 6.48270011e 01 5.82099974e 01 4.94509995e 01 1.54149997e+00 1.34500003e+00 4.33050007e 01 5.80590010e 01 3.55560005e 01 2.51839995e 01 2.02539995e 01 7.16430008e 01 3.06100011e 01 5.61269999e 01 8.39280009e 01 3.80849987e 01 9.08749998e 01 4.33259994e 01 1.44360000e 02 2.37250000e 01 5.37989974e 01 1.77730000e+00 6.64329976e 02 6.97950006e 01 6.92910016e 01 2.67389989e+00 7.68050015e 01 3.39289993e 01 1.96950004e 01 3.52450013e 01 2.29200006e+00 2.74109989e 01 3.01690012e 01 8.52859986e 04 1.69229999e 01 9.14330035e 02 2.36099996e 02 3.62359993e 02 3.44880015e 01 8.39470029e 01 2.51740009e 01 4.21229988e 01 4.86160010e 01 2.23249998e 02 5.57600021e 01 8.52230012e 01 2.30729997e 01 1.31379998e+00 4.87639993e 01 1.04670003e 01 Unit: supervisor Vector: 0.43483999 0.29879001 0.33191001 0.66744 0.015454 0.15109 0.6063 0.43643999 0.50387001 1.29209995 0.19067 0.22946 0.15900999 0.11937 0.30079001 0.71973997 0.76618999 0.40612 0.45030999 0.56156999 0.46836001 0.56080002 0.24398001 0.41773999 0.060769 0.85593998 0.44560999 0.0173 0.18959001 0.47902 1.09940004 0.39855999 0.15020999 1.33490002 0.23598 0.40862 0.46061 0.041265 1.44430006 0.25913 0.28817001 0.92123002 0.29732999 0.10582 0.75729001 0.40329 0.026871 0.35651001 0.38978001 1.96019995 Unit: instructing Vector: 0.12468 0.76235002 0.036286 0.89383 0.44255 0.7999 0.014672 0.40333 0.19618 0.31009001 0.081948 0.53548002 0.3971 0.12518001 0.010218 0.50193 1.04390001 0.15561999 0.9472 0.46739 0.52798003 0.47464001 0.33513999 0.16192 0.13628 0.43952999 0.39326 0.59561998 0.43298 0.79999 0.30941999 0.40891001 0.94845003 0.58431 0.083376 0.27149999 0.41819 0.45974001 0.33594 0.34017 0.31760001 0.2308 0.20413999 0.30772999 0.14139999 0.39932001 0.10814 
0.62976003 0.074504 0.12097 Unit: male Vector: 0.23046 0.65937001 0.28411001 0.44365999 1.59220004 1.85640001 0.0054708 0.58679003 0.1506 0.021166 1.10290003 0.79501998 1.18990004 0.53535002 0.25255999 0.15882 0.31825 0.53609002 0.59439999 0.21288 0.94989002 0.91619003 0.48789999 0.77063 0.16215 1.05149996 0.70570999 0.79813999 0.79354 0.086372 2.24970007 0.68785 0.085613 0.68004 0.62212002 0.02536 0.10967 0.38747999 0.62791002 1.08710003 0.37412 0.061965 0.19225 0.89262998 0.51762998 1.47909999 0.23219 1.15890002 0.066075 0.038772 Unit: workers Vector: 0.47005999 0.64020002 0.74308002 0.70699 0.18398 0.095573 1.12329996 0.66938001 0.31698999 0.87045002 0.36017999 1.01370001 0.60290003 0.14692 0.65534002 0.63380003 0.17293 0.89907002 0.60336 1.47580004 0.35749999 0.22641 0.66198999 0.059413 0.36116001 1.24820006 0.021193 0.58884001 0.081766 0.16429999 3.48309994 0.50941998 0.38088 0.0052672 0.38922 0.086958 0.047593 0.56067002 1.07790005 0.53268999 0.81387001 0.49265999 0.92754 0.34024999 0.8642 0.59026998 1.4217 0.29286 0.31193 0.34274 cut Advanced preprocessing with MWEs Preprocessing raw textual data for embedding vector lookup becomes non trivial when the WEC training data itself was processed in a non trivial way: When the training data was stemmed , the WEC vocabulary also consists of stems, and turning raw textual data into compatible units for lookup requires ideally that the exact same stemming algorithm be applied to it. The same is true for any other word level normalization / modification that might have been applied to the WEC training data. Integrating preprocessing code into embedding vector lookup, as described above, is a first step towards acknowledging the importance of preprocessing. For pretrained WECs, like GloVe above, the preprocessing code is often not available, or preprocessing is considered trivial. In these cases, it is possible with reasonable effort to inspect the WEC vocabulary and derive preprocessing rules which more or less imitate the original preprocessing. The standard_preprocessor class used above is an example of this. Preprocessing code to be integrated into WOMBAT supports an optional phrasespotter.py module, which can be initialized with a list of phrases / multi word expressions that you want to be treated as tokens. For custom, self trained WECs, the procedure is ideally the following: Obtain a list or dictionary of phrases / multi word expressions. This can either be a preexisting, manually curated resource (e.g. based on the Computer Science Ontology ), or a list of phrases mined automatically from some text (e.g. with ToPMine ). Create a preprocessor as above, providing the name of the file containing the phrases (one per line) as value to the phrasefile parameter. python from wombat_api.preprocessors import standard_preprocessor prepro standard_preprocessor.preprocessor(name my_cs_savvy_standard_preprocessor , phrasefile data/mwes/cso mwes stemmed.txt ) prepro.pickle( temp/my_cs_savvy_standard_preprocessor.pkl ) Apply the preprocessor to the raw WEC training data before training the WECs . WOMBAT provides the script tools/apply_preprocessor.py for that purpose. We provide a plain text file of CS publication titles from the DBLP site here . Unzip it to data/text/dblp titles.txt . Parallel Integer Sorting and Simulation Amongst CRCW Models. Pattern Matching in Trees and Nets. NP complete Problems Simplified on Tree Schemas. On the Power of Chain Rules in Context Free Grammars. 
Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. A characterization of rational D0L power series. The Derivation of Systolic Implementations of Programs. Fifo Nets Without Order Deadlock. On the Complementation Rule for Multivalued Dependencies in Database Relations. Equational weighted tree transformations. Using this data set as input, the script can be called like this: shell $ python tools/apply_preprocessor.py data/text/dblp titles.txt temp/my_cs_savvy_standard_preprocessor.pkl stopwords: sws conflate unit:stem fold repeat_phrases to produce the following output: shell data/text/dblp titles.txt.conflated_sys.nophrases.stem data/text/dblp titles.txt.conflated_sys.repeat_phrases.stem data/text/dblp titles.txt.conflated_sys.nophrases.stem.idf data/text/dblp titles.txt.conflated_sys.repeat_phrases.stem.idf data/text/dblp titles.txt.conflated_sys.nophrases.stem contains the plain, stemmed version of the input files: parallel integ sort sw simul amongst crcw model pattern match sw tree sw net np complet problem simplifi sw tree schema sw power sw chain rule sw context free grammar schnell multiplik von polynomen über körpern der charakteristik 0 sw character sw ration d0l power seri sw deriv sw systol implement sw program fifo net without order deadlock sw complement rule sw multivalu depend sw databas relat equat weight tree transform data/text/dblp titles.txt.conflated_sys.repeated_phrases.stem contains the stemmed version of the input files, with identified phrases. In addition, due to the repeat_phrases switch, it contains a plain copy of each line in which at least one phrase was detected. parallel integ sort sw simul amongst crcw model pattern_match sw tree sw net pattern match sw tree sw net np complet problem simplifi sw tree schema sw power sw chain rule sw context_free_grammar sw power sw chain rule sw context free grammar schnell multiplik von polynomen über körpern der charakteristik 0 sw character sw ration d0l power seri sw deriv sw systol implement sw program fifo net without order deadlock sw complement rule sw multivalu depend sw databas relat equat weight tree transform data/text/dblp titles.txt.conflated_sys.repeated_phrases.stem.idf contains idf scores for all vocabulary items. parallel 5.9009944474123 integ 8.105335037869118 sort 8.476328191481095 sw 1.8121353984487958 simul 5.7200901939963575 amongst 11.67999918619934 crcw 13.33225709581637 model 4.221747418292076 pattern_match 9.385228981189533 tree 6.3878685829354325 net 7.425108697454633 pattern 6.269503282251706 match 6.71239224432375 np 9.158831826956924 complet 7.385855293345302 problem 5.400074426355499 simplifi 8.818311696228356 schema 8.479982721069225 power 5.880688809116575 chain 7.260870040566218 rule 6.757268427774883 context_free_grammar 10.561623408412391 context 6.646591236440547 free 6.905869776159018 grammar 7.980991554950237 Train embedding vectors on the preprocessed training data, using your favourite training algorithm and setup. Import ( importing pre trained embeddings to wombat glove) the embedding vectors into WOMBAT, and assign the preprocessor ( integrating automatic preprocessing), using the code above. Done! You are all set now to retrieve embedding vectors for arbitrary, raw input text, and fast !! Use Cases Pairwise Distance The computation of pairwise semantic distance is a standard task in NLP. One common application is computing the similarity of pre defined sentence pairs . 
WOMBAT provides the script tools/sentence_pair_similarity.py for this task, which uses the method wombat_api.analyse.plot_pairwise_distances.

```python
import numpy as np, scipy.spatial.distance
from wombat_api.core import connector as wb_conn
from wombat_api.analyse import plot_pairwise_distances

wbpath = "data/wombat-data/"
wbc = wb_conn(path=wbpath, create_if_missing=False)

# Note: You can use e.g. algo:glove;dataset:6b;dims:{50,100,200};fold:1;unit:token
# to create three different plots in one run!
wec_ids = "algo:glove;dataset:6b;dims:50;fold:1;unit:token"
rawfile = "data/text/STS.input.track5.en-en.txt"

pp_cache = {}
vecs1 = wbc.get_vectors(wec_ids, pp_cache,
                        for_input=[np.loadtxt(rawfile, dtype=str, delimiter='\t', usecols=[0], skiprows=0)],
                        raw=True)
vecs2 = wbc.get_vectors(wec_ids, pp_cache,
                        for_input=[np.loadtxt(rawfile, dtype=str, delimiter='\t', usecols=[1], skiprows=0)],
                        raw=True)

# Use ignore_identical=True to ignore pairs whose avg. vectors are identical (= max. similarity or min. distance)
pd = plot_pairwise_distances(vecs1, vecs2, arrange_by=wec_ids, pdf_name="temp/sent_sim.pdf",
                             size=(25, 10), max_pairs=20, ignore_identical=False)
```

Calling this script produces the following output: (figure: Wombat sentence similarity plot)

One might also be interested in finding maximally similar pairs of sentences in a plain list. WOMBAT provides the script tools/full_pairwise_similarity.py for this. The main difference to the above script is that it supplies None as the value for the second parameter. This causes the wombat_api.analyse.plot_pairwise_distances method to create a cartesian product of all sentences supplied as value to the first, obligatory parameter.

```python
import numpy as np, scipy.spatial.distance
from wombat_api.core import connector as wb_conn
from wombat_api.analyse import plot_pairwise_distances

wbpath = "data/wombat-data/"
wbc = wb_conn(path=wbpath, create_if_missing=False)

wec_ids = "algo:glove;dataset:6b;dims:50;fold:1;unit:token"
rawfile = "data/text/STS.input.track5.en-en.txt"

vecs1 = wbc.get_vectors(wec_ids, {},
                        for_input=[np.loadtxt(rawfile, dtype=str, delimiter='\t', usecols=[0], skiprows=0)],
                        raw=True)

# Use ignore_identical=True to ignore pairs whose avg. vectors are identical (= max. similarity or min. distance)
pd = plot_pairwise_distances(vecs1, None, arrange_by=wec_ids, pdf_name="temp/full_pw_sim.pdf",
                             size=(25, 10), max_pairs=20, ignore_identical=False)
```

Calling this script produces the following output: (figure: Wombat full list similarity plot)

Most Similar Words

WOMBAT provides the script tools/get_most_similar.py for computing the most similar words to a given list of target words. The script uses the method wombat_api.analyse.get_most_similar.

```python
import sys
import scipy.spatial.distance as dist
from wombat_api.core import connector as wb_conn
from wombat_api.analyse import get_most_similar

wbpath = sys.argv[1]
wec_ids = sys.argv[2]
targets = sys.argv[3].split(",")
try:
    to_rank = sys.argv[4].split(",")
except IndexError:
    to_rank = []

wbc = wb_conn(path=wbpath, create_if_missing=False)
sims = get_most_similar(wbc, wec_ids, targets=targets, measures=[dist.cosine], to_rank=to_rank)
for (w, wec, mes, simlist) in sims:
    print("\n%s" % (wec))
    for (t, s) in simlist:
        print("%s(%s, %s)\t%s" % (mes, w, t, s))
```

Computing the similarity of a given list of target words to all words in an embedding set is a task that does not benefit from WOMBAT's lazy loading philosophy, because it involves iterating over a lot of single items. The above code compensates for this by accepting several target words at once, while loading the words in the embedding set only once.
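To make the ranking step itself concrete, here is a small self-contained sketch (plain numpy/scipy, not WOMBAT's implementation; the vectors are random placeholders) of how a handful of candidate words can be ordered by cosine distance to a target once their vectors have been retrieved:

```python
import numpy as np
from scipy.spatial.distance import cosine

# Placeholder 50-dimensional vectors, standing in for vectors retrieved via get_vectors()
rng = np.random.RandomState(0)
target_word, target_vec = "car", rng.rand(50)
candidates = {w: rng.rand(50) for w in ("bus", "vehicle", "trolley", "transporter")}

# Smaller cosine distance means more similar, so sort in ascending order
ranked = sorted(((w, cosine(target_vec, v)) for w, v in candidates.items()), key=lambda x: x[1])
for w, d in ranked:
    print("cosine(%s, %s) = %.4f" % (target_word, w, d))
```

Ranking only a short candidate list this way avoids the full scan over the embedding vocabulary discussed above.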
Executing the script with shell $ python tools/get_most_similar.py data/wombat data/ algo:glove;dataset:6b;dims:{50,100};fold:1;norm:{none,abtt};unit:token car,bike from the WOMBAT directory returns algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token cosine(car, truck) 0.07914144136184864 cosine(car, cars) 0.11298109069525497 cosine(car, vehicle) 0.11663159684321234 cosine(car, driver) 0.15359811852812422 cosine(car, driving) 0.16158120657580843 cosine(car, bus) 0.17894889497726807 cosine(car, vehicles) 0.18250077858745317 cosine(car, parked) 0.2097811084657102 cosine(car, motorcycle) 0.2133497199448282 cosine(car, taxi) 0.21660710099093428 algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token cosine(bike, bicycle) 0.07540862422613559 cosine(bike, rides) 0.12897087378541827 cosine(bike, bikes) 0.15252882825561032 cosine(bike, ride) 0.16029085596645365 cosine(bike, cart) 0.20388619664671093 cosine(bike, bicycles) 0.22393171208065155 cosine(bike, riding) 0.2297407298062787 cosine(bike, motorcycle) 0.24199681247288152 cosine(bike, skateboard) 0.24562024322931186 cosine(bike, wheel) 0.24976224925775947 algo:glove;dataset:6b;dims:50;fold:1;norm:abtt;unit:token cosine(car, truck) 0.0806001419007456 cosine(car, driver) 0.12179994387193638 cosine(car, vehicle) 0.1385399783711604 cosine(car, cars) 0.14205120673399707 cosine(car, tractor) 0.19330317428597177 cosine(car, cab) 0.19371578595889627 cosine(car, driving) 0.1967477518121835 cosine(car, taxi) 0.19764512986360383 cosine(car, parked) 0.2024978715831982 cosine(car, forklift) 0.21243824560524704 algo:glove;dataset:6b;dims:50;fold:1;norm:abtt;unit:token cosine(bike, bicycle) 0.08398014976833035 cosine(bike, rides) 0.1430640377058503 cosine(bike, bikes) 0.16369354577451944 cosine(bike, ride) 0.17653528980791744 cosine(bike, limo) 0.1823194282582885 cosine(bike, skateboard) 0.2085667400501673 cosine(bike, cart) 0.21514646350843625 cosine(bike, bicycles) 0.23932357247389668 cosine(bike, riding) 0.25687287619295995 cosine(bike, biking) 0.26260029724823075 algo:glove;dataset:6b;dims:100;fold:1;norm:none;unit:token cosine(car, vehicle) 0.13691616910455218 cosine(car, truck) 0.1402122094746816 cosine(car, cars) 0.16283305313114194 cosine(car, driver) 0.18140894723421486 cosine(car, driving) 0.21873640792744087 cosine(car, motorcycle) 0.2446842503669403 cosine(car, vehicles) 0.25377434558164547 cosine(car, parked) 0.2540535380120613 cosine(car, bus) 0.26272929599923434 cosine(car, taxi) 0.28447302367774396 algo:glove;dataset:6b;dims:100;fold:1;norm:none;unit:token cosine(bike, bicycle) 0.10315127761665555 cosine(bike, bikes) 0.20443421876273637 cosine(bike, ride) 0.22046929133315563 cosine(bike, rides) 0.2638311426114084 cosine(bike, riding) 0.27133477109461057 cosine(bike, motorcycle) 0.27805119727347305 cosine(bike, biking) 0.2816471833865629 cosine(bike, horseback) 0.31557397925187236 cosine(bike, bicycles) 0.3187722929261676 cosine(bike, riders) 0.3254949790131334 algo:glove;dataset:6b;dims:100;fold:1;norm:abtt;unit:token cosine(car, truck) 0.15238329488374347 cosine(car, vehicle) 0.15575847257407438 cosine(car, cars) 0.19167657709380725 cosine(car, driver) 0.20033349172277293 cosine(car, parked) 0.24794750003421806 cosine(car, motorcycle) 0.2510652900482522 cosine(car, driving) 0.25658421356403294 cosine(car, suv) 0.2881546903629949 cosine(car, bus) 0.2910614135644427 cosine(car, vehicles) 0.29615907557187104 algo:glove;dataset:6b;dims:100;fold:1;norm:abtt;unit:token cosine(bike, bicycle) 0.1088470577560825 cosine(bike, bikes) 
0.21590419939848782 cosine(bike, ride) 0.23369856648438625 cosine(bike, rides) 0.27806636584727484 cosine(bike, biking) 0.2832740671069537 cosine(bike, riding) 0.28638550538216256 cosine(bike, motorcycle) 0.2913097546696938 cosine(bike, horseback) 0.324846874936749 cosine(bike, bicycles) 0.3404461149572644 cosine(bike, wagon) 0.3443322594384779 The above code takes some time, though. Things are a lot different when only a small list of words is to be ranked according to their similarity to one or more target words. Executing the above script with an additional list of words like this shell $ python tools/get_most_similar.py data/wombat data/ algo:glove;dataset:6b;dims:{50,100};fold:1;norm:{none,abtt};unit:token car,bike trolley,bus,vehicle,transporter from the WOMBAT directory returns algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token cosine(car, vehicle) 0.11663159684321234 cosine(car, bus) 0.17894889497726807 cosine(car, trolley) 0.48697765622473255 cosine(car, transporter) 0.6139896275893459 algo:glove;dataset:6b;dims:50;fold:1;norm:none;unit:token cosine(bike, vehicle) 0.3427957759292295 cosine(bike, bus) 0.34365947338677905 cosine(bike, trolley) 0.3602480028404018 cosine(bike, transporter) 0.7320497642797394 algo:glove;dataset:6b;dims:50;fold:1;norm:abtt;unit:token cosine(car, vehicle) 0.1385399783711604 cosine(car, bus) 0.2158960678290227 cosine(car, trolley) 0.46696018041448584 cosine(car, transporter) 0.5406758968293157 algo:glove;dataset:6b;dims:50;fold:1;norm:abtt;unit:token cosine(bike, trolley) 0.3678464886357319 cosine(bike, vehicle) 0.3874397902633365 cosine(bike, bus) 0.3921970555479769 cosine(bike, transporter) 0.7319556230922035 algo:glove;dataset:6b;dims:100;fold:1;norm:none;unit:token cosine(car, vehicle) 0.13691616910455218 cosine(car, bus) 0.26272929599923434 cosine(car, trolley) 0.5475087400049348 cosine(car, transporter) 0.7290820977867609 algo:glove;dataset:6b;dims:100;fold:1;norm:none;unit:token cosine(bike, trolley) 0.38364037699224673 cosine(bike, bus) 0.44165326460377197 cosine(bike, vehicle) 0.4536933011117086 cosine(bike, transporter) 0.8071001886680546 algo:glove;dataset:6b;dims:100;fold:1;norm:abtt;unit:token cosine(car, vehicle) 0.15575847257407438 cosine(car, bus) 0.2910614135644427 cosine(car, trolley) 0.5404368768171397 cosine(car, transporter) 0.6956990227076467 algo:glove;dataset:6b;dims:100;fold:1;norm:abtt;unit:token cosine(bike, trolley) 0.3900553987623596 cosine(bike, bus) 0.4667747849371262 cosine(bike, vehicle) 0.48185728456605526 cosine(bike, transporter) 0.807988795692304",Sentiment Analysis,Sentiment Analysis 2034,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SCDV Python Sparse Composite Document Vectors python Implementation Paper: Requirements In my Dockerfile, use python 3.5.2 bash penguin@37ef290e8e58:/working$ python V Python 3.5.2 :: Continuum Analytics, Inc. and following libraries console jupyter notebook tqdm matplotlib numpy scipy scikit learn pandas seaborn lightgbm joblib NLP Tools mecab python3 gensim Setup Recommended bash cp project.env .env docker compose build docker compose up d docker exec it scdv jupyter bash or access to localhost:7001 note jupyter's default password is written on Dockerfile arg. The default is dolphin . 
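As background for the feature-creation and benchmark scripts below, here is a rough, illustrative sketch of how a Sparse Composite Document Vector can be assembled from word vectors and a Gaussian Mixture Model, following the referenced SCDV paper. This is not the repository's code; the names and the simplified sparsification step are only for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scdv_vectors(docs, wv, idf, n_clusters=60, sparsity=0.04):
    """docs: list of token lists; wv: dict word -> d-dim vector; idf: dict word -> idf weight.
    Returns an array of shape (len(docs), n_clusters * d)."""
    vocab = sorted(wv)
    X = np.vstack([wv[w] for w in vocab])
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X)
    cluster_probs = dict(zip(vocab, gmm.predict_proba(X)))

    d = X.shape[1]
    doc_vecs = np.zeros((len(docs), n_clusters * d))
    for i, doc in enumerate(docs):
        words = [w for w in doc if w in wv]
        for w in words:
            # word-topic vector: cluster posteriors times the word vector, scaled by idf
            doc_vecs[i] += idf[w] * np.outer(cluster_probs[w], wv[w]).ravel()
        if words:
            doc_vecs[i] /= len(words)

    # simplified sparsification: zero out entries that are small relative to the largest magnitude
    threshold = sparsity * np.abs(doc_vecs).max()
    doc_vecs[np.abs(doc_vecs) < threshold] = 0.0
    return doc_vecs
```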
Trouble Shooting

If you encounter an error like the one below,

```bash
ERROR: for scdv-jupyter  Cannot start service jupyter: driver failed programming external connectivity on endpoint scdv-jupyter (a16e504598f6081390b47fe6809aaba1a8b52672956e65feb11d3c00773363ba): Bind for 0.0.0.0:7011 failed: port is already allocated
```

change the JUPYTER_PORT number in .env .

Create Feature

Create the SCDV features using the livedoor corpus.

```bash
$ python src/train.py -h
usage: train.py [-h] [-c COMPONENTS]

optional arguments:
  -h, --help            show this help message and exit
  -c COMPONENTS, --components COMPONENTS
                        GMM component size (i.e. latent space size.) (default: 60)
```

Benchmark

Benchmark on a multilabel classification task using the livedoor corpus.

```bash
python src/SCDV_vs_SWEM.py
```

> NOTE: run train.py beforehand

Settings
- Embedding Model: pretrained fasttext :pray:
- Embedding dim: 300
- Gaussian Mixture Clusters: 60

Features
- SCDV
  - SCDV without compression
  - Compressed SCDV by PCA (compressed dim = 100, 300, 500)
- SWEM: Simple Word Embedding Model (proposed in the SWEM paper), max pooling and average pooling
  - n-gram SWEM (n = 3, 5, 8)

Classification Model: LightGBM (same parameters for all features)

Train and Evaluation

With stratified 5-fold splits, train and predict on each fold to create out-of-fold predictions.

TODO
- LightGBM parameter tuning
- train with other models (CNN, LSTM, SVM, ...)

Result

![](report/scdv_vs_swem_result.png)

- SCDV (not compressed) works well
- computation cost: SWEM << SCDV (SCDV additionally has to fit the Gaussian Mixture)
- Accuracy: SWEM < SCDV

Similarity Search

The query document index is 500, query label: sports watch",Sentiment Analysis,Sentiment Analysis 2054,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Status: Active (under active development, breaking changes may occur)

Blocksparse

The blocksparse package contains TensorFlow Ops and corresponding GPU kernels for block-sparse matrix multiplication. Also included are related ops like edge bias, sparse weight norm and layer norm. To learn more, see the launch post on the OpenAI blog.

Prerequisites

First, you need at least one Nvidia GPU. For best performance, we recommend using a Pascal or Maxwell generation GPU; this is the full list of features by GPU type:

| GPU Family | BSMatMul-ASM | BSMatMul-CudaC | BSConv |
|------------|--------------|----------------|--------|
| Kepler     |              | X              |        |
| Maxwell    | X (fastest)  | X              | X      |
| Pascal     | X (fastest)  | X              | X      |
| Volta      | X (fastest)  |                |        |

Note that BSMatMul-CudaC only supports feature_axis=0, while BSMatMul-ASM only supports feature_axis=1.

Additionally, you need:
- A working Linux installation (we run Ubuntu 16.04) with the Nvidia drivers for your GPU.
- CUDA 8 (in /usr/local/cuda)
- Python 3.5 or newer, or 2.7 or newer
- TensorFlow 1.4.0 or newer, with GPU support (e.g. pip install tensorflow-gpu)

CUDA 9 and Volta will work if you update the build targets (-gencode arch=compute_70,code=sm_70) and also build TensorFlow from source.
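Since the available kernels depend on the GPU generation, it can help to check which device and compute capability TensorFlow actually sees before choosing between the kernels (and hence which feature_axis to use). A quick check with standard TensorFlow 1.x utilities, not part of the blocksparse API:

```python
from tensorflow.python.client import device_lib

# For GPU devices, physical_device_desc includes the compute capability,
# e.g. 5.x = Maxwell, 6.x = Pascal, 7.0 = Volta.
for dev in device_lib.list_local_devices():
    if dev.device_type == "GPU":
        print(dev.name, dev.physical_device_desc)
```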
Installation

```bash
pip install blocksparse
```

Usage

This example performs a block-sparse matrix multiplication:

```python
from blocksparse.matmul import BlocksparseMatMul
import tensorflow as tf
import numpy as np

hidden_size = 4096
block_size = 32
minibatch_size = 64

# Create a (random) sparsity pattern
sparsity = np.random.randint(2, size=(hidden_size//block_size, hidden_size//block_size))

# Initialize the sparse matrix multiplication object
bsmm = BlocksparseMatMul(sparsity, block_size=block_size)

# Input to graph
x = tf.placeholder(tf.float32, shape=[None, hidden_size])

# Initialize block-sparse weights
w = tf.get_variable("w", bsmm.w_shape, dtype=tf.float32)

# Block-sparse matrix multiplication
y = bsmm(x, w)

# Run
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
result = sess.run([y], feed_dict={x: np.ones((minibatch_size, hidden_size), dtype='float32')})
print(result)
```

For a more involved example using block-sparse ops to train a language model, see examples/ (./examples/).

Development

If you're interested in hacking on the ops and kernels, go ahead and build from source:

```bash
git clone git@github.com:openai/blocksparse.git
cd blocksparse

make compile
pip install dist/*.whl

# test it if you like
test/blocksparse_matmul_test.py
test/blocksparse_conv_test.py
```

If your CUDA is not in /usr/local/cuda or you have several versions, e.g. both /usr/local/cuda-8.0 and /usr/local/cuda-9.0, set CUDA_HOME to the base path to use when compiling make compile.

API Documentation:

blocksparse.matmul

class BlocksparseMatMul(object)

    def __init__(self, layout, block_size=32, feature_axis=1)
    # layout: a 2d array of ones and zeros specifying the block layout
    # block_size: values 32, 16, 8 supported
    # feature_axis: when block_size is less than 32, memory access becomes far more efficient with a (C,N) activation layout

    # shape helpers for generating tensors (N = minibatch)
    self.w_shape
    def i_shape(self, N)
    def o_shape(self, N)

    # return the coordinates (c,k) in the layout that correspond to a given block id
    def block_coord(self, block)

    # experimental ortho init
    def ortho_init(self)

    # in practice, identity_init + layernorm is all you need for initialization;
    # with gpu=True the init is performed by a kernel on the device
    def identity_init(self, gpu=False)

    # To implement weight normalization. In practice, layernorm works much better.
    def l2_normalize(self, W, gain=None, epsilon=1e-6, dtype=np.float32)

    def __call__(self, I, W, dw_dtype=tf.float32)
    # Execute the op. Note that the weight variable is independent from the bsmm object.
    # This allows multiple weights to be tied to the same bsmm layout.
    # dw_dtype: allows control over dw precision format.

def group_param_grads(param_grad, group_size=8, cast32=True)
# param_grad: the tensorflow parameter gradient for a given bsmm weight variable (returned from tf.gradients)
# group_size: desired group size, up to 8 supported
# This causes the tf graph to be rewritten so that weight grad matmuls from different time steps (and shared weights across time) are combined into a more efficient single matmul.

class SparseProj(object):
    def __init__(self, nhidden, nproj=None, proj_stride=None, block_size=32, gather_lut=None)
    # Experimental class to support dense-to-sparse and sparse-to-dense projections.
    # Basically the same as the tensorflow ops but faster, and supporting alternate precision formats.
    # They assume a unique 1-to-1 mapping so atomics need not be used on backward ops.
def gather(self, x) def scatter(self, x) def scatter_add(self, x, y) def scatter_mul(self, x, y) blocksparse.conv class BlocksparseConv(object): def __init__(self, BCK, TRS, DHW, MPQ None, strides (1,1,1), dilates (1,1,1), padding SAME , edge_bias False) BCK: ( block(B)/input(C)/output(K) feature dims ( (c0, c1, c2, ...), (k0, k1, k2, ...) ), block 0 c,k are indeces into C,K dims ( (c0, c1, c2, ...), (k0, k1, k2, ...) ), block 1 ( (c0, c1, c2, ...), (k0, k1, k2, ...) ), block 2 ... ) TRS: (T,R,S) or (R,S) or (S,) filter spatial size dims DHW: (D,H,W) or (H,W) or (W,) input image spatial size dims MPQ: (M,P,Q) or (P,Q) or (Q,) or None output image spatial size dims (used for ambiguous dims in strided transpose conv) strides: (1,1,1) or (1,1) or (1,) dilates: (1,1,1) or (1,1) or (1,) padding: (1,1,1) or (1,1) or (1,) or SAME or VALID edge_bias: True/False shape helpers for setting up variables or test tensors def edge_bias_shape(self) def f_shape(self, block None) def i_shape(self, N) def o_shape(self, N) execute op passing in param variables and input def __call__(self, F, I, edge_bias None): for implementing weight norm def l2_normalize(self, F, gain None, epsilon 1e 6, dtype np.float32): class BlocksparseDeconv(BlocksparseConv) def __init__(self, BCK, TRS, DHW, MPQ None, strides (1,1,1), dilates (1,1,1), padding SAME , edge_bias False) Deconvolution. Same params as above. def cwise_linear(x, a None, b None) In the NCHW tensor format, tensorflow is extremely slow at implementing simple broadcasting ops on the middle C dim. This lets you do: y a x + b y a x y x + b Where a and b are of shape (1,C,1,1) This is useful for ops like weight norm. blocksparse.ew same as tf ops but generally more efficient and allow custom precision formats def add(x, y, name None) def multiply(x, y, name None) def subtract(x, y, name None) def divide(x, y, name None) def maximum(x, y, name None) def minimum(x, y, name None) def negative(x, name None) def reciprocal(x, name None) def square(x, name None) def sqrt(x, name None) def exp(x, name None) def log(x, name None) def sigmoid(x, name None) def tanh(x, name None) def relu(x, name None) def elu (x, alpha 1.0, name None) here args can be the 4 independant gate tensors or a single merged gate tensor (which gets split in 4 internally) def fused_lstm_gates(c, args, name None) def split4(x) def concat4(x0, x1, x2, x3) A custom cast op to help explore novel precision formats def float_cast(x, dtype, dx_dtype None) a much faster (and non deterministic) dropout op also supports novel precision formats def dropout(x, keep_prob 0.8, mask None) an op to be used in tf.gradients when adding together multiple contributions of a gradient. note that only 8 inputs are supported as you'd never want a single op to consume all possible inputs before it starts executing in the graph (and hence reducing the memory footprint) def add_n8(xs, name None) blocksparse.norms def layer_norm(x, g, b, axis 1, epsilon 1e 6, relu False) Very fast layernorm to support both bsmm feature_axis activation layouts. 
Also inlcludes optional integrated relu (applied to end) basic batch norm ops for the NCHW layout def batch_norm(x, g, b, epsilon 1e 6) def batch_norm_inference(x, g, b, m, v, epsilon 1e 6)",Sentiment Analysis,Sentiment Analysis 2058,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Status: Archive (code is provided as is, no updates expected) Generating Reviews and Discovering Sentiment Code for Learning to Generate Reviews and Discovering Sentiment (Alec Radford, Rafal Jozefowicz, Ilya Sutskever). Right now the code supports using the language model as a feature extractor. from encoder import Model model Model() text 'demo!' text_features model.transform(text) A demo of using the features for sentiment classification as reported in the paper for the binary version of the Stanford Sentiment Treebank (SST) is included as sst_binary_demo.py . Additionally this demo visualizes the distribution of the sentiment unit like Figure 3 in the paper. ! Sentiment Unit Visualization (/data/sst_binary_sentiment_unit_vis.png) Additionally there is a PyTorch port made by @guillitte which demonstrates how to train a model from scratch. This repo also contains the parameters of the multiplicative LSTM model with 4,096 units we trained on the Amazon product review dataset introduced in McAuley et al. (2015) 1 . The dataset in de duplicated form contains over 82 million product reviews from May 1996 to July 2014 amounting to over 38 billion training bytes. Training took one month across four NVIDIA Pascal GPUs, with our model processing 12,500 characters per second. 1 McAuley, Julian, Pandey, Rahul, and Leskovec, Jure. Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp. 785–794. ACM, 2015.",Sentiment Analysis,Sentiment Analysis 2089,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Text Preprocessing in Neural Text Classification Jose Camacho Collados and Mohammad Taher Pilehvar The following repository includes the pre trained word embeddings and preprocessed text classification datasets for the paper On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis . Pre trained word embeddings We release the 300 dimension word embeddings used in our experiments as binary bin files. The embeddings were trained on the UMBC corpus with the following preprocessing techniques: Vanilla (simple tokenization): Download here 1.8 GB Lowercased : Download here 1.6 GB Lemmatized : Download here 1.7 GB Multiword grouped : Download here 2.1 GB Preprocessed datasets We also release the text categorization and sentiment analysis datasets already preprocessed: Text categorization : Available here Sentiment analysis : Available here Note 1 : If you use any of these datasets, please acknowledge the original sources (you can find them in the reference paper). Note 2 : For each class file in the dataset directories, each line corresponds to an instance in the corpus, be it a phrase, sentence or document (depending on the dataset). 
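For a quick sanity check of the downloaded resources, something like the following should work, assuming the released .bin files are in standard word2vec binary format (the local file and dataset paths below are placeholders):

```python
from gensim.models import KeyedVectors

# Load one of the released 300-dimensional embedding files (placeholder file name)
vectors = KeyedVectors.load_word2vec_format("umbc_vanilla.bin", binary=True)
print(vectors["house"].shape)                # expected: (300,)
print(vectors.most_similar("house", topn=5))

# The preprocessed dataset files contain one instance (phrase, sentence or document) per line
with open("datasets/topic/class_politics.txt", encoding="utf-8") as f:  # placeholder path
    print(f.readline().strip())
```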
Code The code to run our experiments is available in the following complementary repository: Reference paper If you use any of these resources, please cite the following paper : bash @InProceedings{camacho:preprocessing2018, author Camacho Collados, Jose and Pilehvar, Mohammad Taher , title On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis , booktitle Proceedings of the EMNLP Workshop on Analyzing and interpreting neural networks for NLP , year 2018 , publisher Association for Computational Linguistics , location Brussels, Belgium }",Sentiment Analysis,Sentiment Analysis 2133,Natural Language Processing,Natural Language Processing,Natural Language Processing,"! fastrtext Travis CI Build Status AppVeyor Build Status CRAN_Status_Badge CRAN_time_from_release CRAN_Download License: MIT codecov Follow R Documentation Release Notes FAQ Multilingual pretrained models R wrapper for fastText C++ code from Facebook. FastText is an open source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices. License © Contributors, 2019. Licensed under a MIT license.",Sentiment Analysis,Sentiment Analysis 2136,Natural Language Processing,Natural Language Processing,Natural Language Processing,"fastText with modified factored CBOW Table of contents Introduction ( introduction) Full documentation ( full documentation) References ( references) Introduction Full documentation Invoke a command without arguments to list available arguments and their default values: $ ./fasttext supervised Empty input or output path. The following arguments are mandatory: input training file path output output file path The following arguments are optional: verbose verbosity level 2 The following arguments for the dictionary are optional: minCount minimal number of word occurences 1 minCountLabel minimal number of label occurences 0 wordNgrams max length of word ngram 1 bucket number of buckets 2000000 minn min length of char ngram 0 maxn max length of char ngram 0 t sampling threshold 0.0001 label labels prefix __label__ factored delimiter factor delimiter prefix 0x04 The following arguments for training are optional: lr learning rate 0.1 lrUpdateRate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 neg number of negatives sampled 5 loss loss function {ns, hs, softmax} softmax thread number of threads 12 pretrainedVectors pretrained word vectors for supervised learning saveOutput whether output params should be saved 0 The following arguments for quantization are optional: cutoff number of words and ngrams to retain 0 retrain finetune embeddings if a cutoff is applied 0 qnorm quantizing the norm separately 0 qout quantizing the classifier 0 dsub size of each sub vector 2 The website of the original fastText is located here References for the fastText with factored CBOW 1 References from the original fastText Please cite 1 ( enriching word vectors with subword information) if using this code for learning word representations or 2 ( bag of tricks for efficient text classification) if using for text classification. Enriching Word Vectors with Subword Information 1 P. Bojanowski\ , E. Grave\ , A. Joulin, T. 
Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2017enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {Transactions of the Association for Computational Linguistics}, volume {5}, year {2017}, issn {2307 387X}, pages {135 146} } Bag of Tricks for Efficient Text Classification 2 A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification @InProceedings{joulin2017bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, booktitle {Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers}, month {April}, year {2017}, publisher {Association for Computational Linguistics}, pages {427 431}, } FastText.zip: Compressing text classification models 3 A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models @article{joulin2016fasttext, title {FastText.zip: Compressing text classification models}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas}, journal {arXiv preprint arXiv:1612.03651}, year {2016} } (\ These authors contributed equally.)",Sentiment Analysis,Sentiment Analysis 2159,Natural Language Processing,Natural Language Processing,Natural Language Processing,This is the emotion representation called emo2vec we published in WASSA 2018 paper Emo2Vec: Learning Generalized Emotion Representation by Multi task Training Embedding download Link:,Sentiment Analysis,Sentiment Analysis 2163,Natural Language Processing,Natural Language Processing,Natural Language Processing,"ToWE Task oriented Word Embedding This software implements the task oriented word embedding for text classification. Installation type make Example The example file of domain specific words is provided in domainwords.txt . To run: ./towe train 5groupcorpus output output.txt cbow 1 size 300 negative 25 hs 1 sample 1e 4 threads 30 binary 0 iter 20 alpha2 0.1 beta 0.1 group_count 50 list domainwords.txt Here, train is the training corpus group_count is the size of random sample for constructing the word pairs of salient word (detailed in equation 3 in the paper). alpha2 is the learning rate of the function aware component beta is the combination parameter (denoted as lambda in the paper) list is the file of domain specific words, where each line contains salient words for one category. The learnt word embedding will be written to output.txt, specified by output.",Sentiment Analysis,Sentiment Analysis 2206,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SentAnalysis Sentiment analyser using the Rotten Tomatoes movie review dataset (RNN, Pytorch) The aim of this project is to implement a machine learning model for a sentiment analysis task using the Rotten Tomatoes movie review dataset. During this project you are asked to label phrases on a scale of five values : 0. negative 1. somewhat negative 2. neutral, 3. somewhat positive, 4. positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging. Project Roadmap 1. Study and plot the training data 2. Split the data (train/dev) 3. Train and evaluate a vanilla deep recurrent neural network (RNN) 4. 
Use the Pytorch framework to train the RNN network. 5. Optimize the model and propose enhancement (Regularization, network init, new architecture) References:",Sentiment Analysis,Sentiment Analysis 2240,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Table of contents Introduction ( introduction) Requirements ( requirements) Building fastDNA ( building fastdna) Command line interface ( command line interface) Full documentation ( full documentation) Python ( python) References ( references) Continuous Embedding of DNA reads and application to metagenomics ( continuous embedding of dna reads and application to metagenomics) Enriching Word Vectors with Subword Information ( enriching word vectors with subword information) Bag of Tricks for Efficient Text Classification ( bag of tricks for efficient text classification) FastText.zip: Compressing text classification models ( fasttextzip compressing text classification models) License ( license) Introduction fastDNA ( continuous embedding of dna reads and application to metagenomics) is a library for classification of short DNA sequences. It is adapted from the fastText library. Requirements Generally, fastText builds on modern Mac OS and Linux distributions. Since it uses some C++11 features, it requires a compiler with good C++11 support. These include : (g++ 4.7.2 or newer) or (clang 3.3 or newer) Compilation is carried out using a Makefile, so you will need to have a working make . Building fastDNA $ git clone $ cd fastDNA $ make This will produce object files for all the classes as well as the main binary fastdna . For a trial run: $ cd test $ sh test.sh This should train and evaluate a small model on the toy dataset provided. DNA short read classification In order to train a dna classifier using the method described in 1 ( continuous embedding of dna reads and application to metagenomics), use: $ ./fastdna supervised input train.fasta labels labels.txt output model where train.fasta is a FASTA file containing the full reference genomes and labels.txt is a text file containing the genome labels (one label per line). This will output two files: model.bin and model.vec . Once the model was trained, you can evaluate it by computing the precision and recall at k (P@k and R@k) on a test set using: $ ./fastdna test model.bin test.fasta test_labels.txt n where test.fasta is a FASTA file containing the DNA fragments to be classified, and test_labels.txt contains the labels for each of the fragments. The argument n is optional, and is equal to 1 by default. In order to obtain the n most likely labels for a set of reads, use: $ ./fastdna predict model.bin test.fasta n or use predict prob to also get the probability for each label $ ./fastdna predict prob model.bin test.fasta n Doing so will print to the standard output the n most likely labels for each line. The argument n is optional, and equal to 1 by default. If you want to compute vector representations of DNA sequences, please use: $ ./fastdna print sentence vectors model.bin < text.fasta This assumes that the text.fasta file contains the DNA sequences that you want to get vectors for. The program will output one vector representation per line in the file. You can also quantize a supervised model to reduce its memory usage with the following command: $ ./fastdna quantize output model This will create a .ftz file with a smaller memory footprint. 
All the standard functionality, like test or predict work the same way on the quantized models: $ ./fastdna test model.ftz test.fasta test_labels.txt The quantization procedure follows the steps described in 3 ( fasttextzip compressing text classification models). Full documentation Invoke a command without arguments to list available arguments and their default values: $ ./fastdna supervised Empty input or output path. The following arguments are mandatory: input training file path output output file path The following arguments are optional: verbose verbosity level 2 The following arguments for the dictionary are optional: minn min length of char ngram 0 maxn max length of char ngram 0 label labels prefix __label__ The following arguments for training are optional: lr learning rate 0.1 lrUpdateRate change the rate of updates for the learning rate 100 dim size of word vectors 100 noise mutation rate (/100,000) 0 length length of fragments for training 200 epoch number of epochs 5 loss loss function {ns, hs, softmax} softmax thread number of threads 12 pretrainedVectors pretrained word vectors for supervised learning loadModel pretrained model for supervised learning saveOutput whether output params should be saved false freezeEmbeddings model does not update the embedding vectors false The following arguments for quantization are optional: cutoff number of words and ngrams to retain 0 retrain whether embeddings are finetuned if a cutoff is applied false qnorm whether the norm is quantized separately false qout whether the classifier is quantized false dsub size of each sub vector 2 Defaults may vary by mode. (Word representation modes skipgram and cbow use a default minCount of 5.) Python The python scripts require numpy and scikit learn for evaluating predictions. References Continuous Embedding of DNA reads, and application to metagenomics 1 R. Menegaux, J. Vert, Continuous Embedding of DNA reads, and application to metagenomics @article{menegaux2018continuous, title {Continuous Embedding of DNA reads and application to metagenomics}, author {Menegaux, Romain and Vert, Jean Philippe}, journal {bioRxiv preprint 335943}, year {2018} } Enriching Word Vectors with Subword Information 2 P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2016enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.04606}, year {2016} } Bag of Tricks for Efficient Text Classification 3 A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification @article{joulin2016bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.01759}, year {2016} } FastText.zip: Compressing text classification models 4 A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models @article{joulin2016fasttext, title {FastText.zip: Compressing text classification models}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas}, journal {arXiv preprint arXiv:1612.03651}, year {2016} } Contact: romain.menegaux@mines paristech.fr (mailto:romain.menegaux@mines paristech.fr) License fastText is BSD licensed. 
An additional patent grant is also provided",Sentiment Analysis,Sentiment Analysis 2246,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Table of contents Introduction ( introduction) Resources ( resources) Models ( models) Supplementary data ( supplementary data) FAQ ( faq) Cheatsheet ( cheatsheet) Requirements ( requirements) Building fastText ( building fasttext) Getting the source code ( getting the source code) Building fastText using make (preferred) ( building fasttext using make preferred) Building fastText using cmake ( building fasttext using cmake) Building fastText for Python ( building fasttext for python) Example use cases ( example use cases) Word representation learning ( word representation learning) Obtaining word vectors for out of vocabulary words ( obtaining word vectors for out of vocabulary words) Text classification ( text classification) Full documentation ( full documentation) References ( references) Enriching Word Vectors with Subword Information ( enriching word vectors with subword information) Bag of Tricks for Efficient Text Classification ( bag of tricks for efficient text classification) FastText.zip: Compressing text classification models ( fasttextzip compressing text classification models) Join the fastText community ( join the fasttext community) License ( license) Introduction fastText is a library for efficient learning of word representations and sentence classification. Resources Models Recent state of the art English word vectors . Word vectors for 157 languages trained on Wikipedia and Crawl . Models for language identification and various supervised tasks . Supplementary data The preprocessed YFCC100M data used in 2 . FAQ You can find answers to frequently asked questions on our website . Cheatsheet We also provide a cheatsheet full of useful one liners. Requirements We are continously building and testing our library, CLI and Python bindings under various docker images using circleci . Generally, fastText builds on modern Mac OS and Linux distributions. Since it uses some C++11 features, it requires a compiler with good C++11 support. These include : (g++ 4.7.2 or newer) or (clang 3.3 or newer) Compilation is carried out using a Makefile, so you will need to have a working make . If you want to use cmake you need at least version 2.8.9. One of the oldest distributions we successfully built and tested the CLI under is Debian wheezy . For the word similarity evaluation script you will need: Python 2.6 or newer NumPy & SciPy For the python bindings (see the subdirectory python) you will need: Python version 2.7 or > 3.4 NumPy & SciPy pybind11 One of the oldest distributions we successfully built and tested the Python bindings under is Debian jessie . If these requirements make it impossible for you to use fastText, please open an issue and we will try to accommodate you. Building fastText We discuss building the latest stable version of fastText. Getting the source code You can find our latest stable release in the usual place. There is also the master branch that contains all of our most recent work, but comes along with all the usual caveats of an unstable branch. You might want to use this if you are a developer or power user. Building fastText using make (preferred) $ wget $ unzip v0.1.0.zip $ cd fastText 0.1.0 $ make This will produce object files for all the classes as well as the main binary fasttext . 
If you do not plan on using the default system wide compiler, update the two macros defined at the beginning of the Makefile (CC and INCLUDES). Building fastText using cmake For now this is not part of a release, so you will need to clone the master branch. $ git clone $ cd fastText $ mkdir build && cd build && cmake .. $ make && make install This will create the fasttext binary and also all relevant libraries (shared, static, PIC). Building fastText for Python For now this is not part of a release, so you will need to clone the master branch. $ git clone $ cd fastText $ pip install . For further information and introduction see python/README.md Example use cases This library has two main use cases: word representation learning and text classification. These were described in the two papers 1 ( enriching word vectors with subword information) and 2 ( bag of tricks for efficient text classification). Word representation learning In order to learn word vectors, as described in 1 ( enriching word vectors with subword information), do: $ ./fasttext skipgram input data.txt output model where data.txt is a training file containing UTF 8 encoded text. By default the word vectors will take into account character n grams from 3 to 6 characters. At the end of optimization the program will save two files: model.bin and model.vec . model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters. The binary file can be used later to compute word vectors or to restart the optimization. Obtaining word vectors for out of vocabulary words The previously trained model can be used to compute word vectors for out of vocabulary words. Provided you have a text file queries.txt containing words for which you want to compute vectors, use the following command: $ ./fasttext print word vectors model.bin < queries.txt This will output word vectors to the standard output, one vector per line. This can also be used with pipes: $ cat queries.txt ./fasttext print word vectors model.bin See the provided scripts for an example. For instance, running: $ ./word vector example.sh will compile the code, download data, compute word vectors and evaluate them on the rare words similarity dataset RW Thang et al. 2013 . Text classification This library can also be used to train supervised text classifiers, for instance for sentiment analysis. In order to train a text classifier using the method described in 2 ( bag of tricks for efficient text classification), use: $ ./fasttext supervised input train.txt output model $ ./fasttext supervised input train.txt output model inputModel oldModel.bin incr for incremental training on new data, not from scratch where train.txt is a text file containing a training sentence per line along with the labels. By default, we assume that labels are words that are prefixed by the string __label__ . This will output two files: model.bin and model.vec . Once the model was trained, you can evaluate it by computing the precision and recall at k (P@k and R@k) on a test set using: $ ./fasttext test model.bin test.txt k The argument k is optional, and is equal to 1 by default. In order to obtain the k most likely labels for a piece of text, use: $ ./fasttext predict model.bin test.txt k or use predict prob to also get the probability for each label $ ./fasttext predict prob model.bin test.txt k where test.txt contains a piece of text to classify per line. 
Doing so will print to the standard output the k most likely labels for each line. The argument k is optional, and equal to 1 by default. See classification example.sh for an example use case. In order to reproduce results from the paper 2 ( bag of tricks for efficient text classification), run classification results.sh , this will download all the datasets and reproduce the results from Table 1. If you want to compute vector representations of sentences or paragraphs, please use: $ ./fasttext print sentence vectors model.bin < text.txt This assumes that the text.txt file contains the paragraphs that you want to get vectors for. The program will output one vector representation per line in the file. You can also quantize a supervised model to reduce its memory usage with the following command: $ ./fasttext quantize output model This will create a .ftz file with a smaller memory footprint. All the standard functionality, like test or predict work the same way on the quantized models: $ ./fasttext test model.ftz test.txt The quantization procedure follows the steps described in 3 ( fasttextzip compressing text classification models). You can run the script quantization example.sh for an example. Full documentation Invoke a command without arguments to list available arguments and their default values: $ ./fasttext supervised Empty input or output path. The following arguments are mandatory: input training file path output output file path The following arguments are optional: inputModel trained model file path (required only for incremental training) verbose verbosity level 2 The following arguments for the dictionary are optional: minCount minimal number of word occurences 1 minCountLabel minimal number of label occurences 0 wordNgrams max length of word ngram 1 bucket number of buckets 2000000 minn min length of char ngram 0 maxn max length of char ngram 0 t sampling threshold 0.0001 label labels prefix __label__ The following arguments for training are optional: lr learning rate 0.1 lrUpdateRate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 neg number of negatives sampled 5 loss loss function {ns, hs, softmax} softmax thread number of threads 12 pretrainedVectors pretrained word vectors for supervised learning saveOutput whether output params should be saved 0 incr incremental training, default false The following arguments for quantization are optional: cutoff number of words and ngrams to retain 0 retrain finetune embeddings if a cutoff is applied 0 qnorm quantizing the norm separately 0 qout quantizing the classifier 0 dsub size of each sub vector 2 Defaults may vary by mode. (Word representation modes skipgram and cbow use a default minCount of 5.) References Please cite 1 ( enriching word vectors with subword information) if using this code for learning word representations or 2 ( bag of tricks for efficient text classification) if using for text classification. Enriching Word Vectors with Subword Information 1 P. Bojanowski\ , E. Grave\ , A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2016enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.04606}, year {2016} } Bag of Tricks for Efficient Text Classification 2 A. Joulin, E. Grave, P. Bojanowski, T. 
Mikolov, Bag of Tricks for Efficient Text Classification @article{joulin2016bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.01759}, year {2016} } FastText.zip: Compressing text classification models 3 A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models @article{joulin2016fasttext, title {FastText.zip: Compressing text classification models}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas}, journal {arXiv preprint arXiv:1612.03651}, year {2016} } (\ These authors contributed equally.) Join the fastText community Facebook page: Google group: Contact: egrave@fb.com (mailto:egrave@fb.com), bojanowski@fb.com (mailto:bojanowski@fb.com), ajoulin@fb.com (mailto:ajoulin@fb.com), tmikolov@fb.com (mailto:tmikolov@fb.com) See the CONTRIBUTING file for information about how to help out. License fastText is BSD licensed. We also provide an additional patent grant.",Sentiment Analysis,Sentiment Analysis 2289,Natural Language Processing,Natural Language Processing,Natural Language Processing,"fasttext Build Status PyPI version fasttext is a Python interface for Facebook fastText . Update The fasttext pypi is now maintained by Facebook AI Research team. Read the documentation here: fastText python binding . Requirements fasttext support Python 2.6 or newer. It requires Cython in order to build the C++ extension. Installation shell pip install fasttext Example usage This package has two main use cases: word representation learning and text classification. These were described in the two papers 1 ( enriching word vectors with subword information) and 2 ( bag of tricks for efficient text classification). Word representation learning In order to learn word vectors, as described in 1 ( enriching word vectors with subword information), we can use fasttext.skipgram and fasttext.cbow function like the following: python import fasttext Skipgram model model fasttext.skipgram('data.txt', 'model') print model.words list of words in dictionary CBOW model model fasttext.cbow('data.txt', 'model') print model.words list of words in dictionary where data.txt is a training file containing utf 8 encoded text. By default the word vectors will take into account character n grams from 3 to 6 characters. At the end of optimization the program will save two files: model.bin and model.vec . model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters. The binary file can be used later to compute word vectors or to restart the optimization. The following fasttext(1) command is equivalent shell Skipgram model ./fasttext skipgram input data.txt output model CBOW model ./fasttext cbow input data.txt output model Obtaining word vectors for out of vocabulary words The previously trained model can be used to compute word vectors for out of vocabulary words. python print model 'king' get the vector of the word 'king' the following fasttext(1) command is equivalent: shell echo king ./fasttext print vectors model.bin This will output the vector of word king to the standard output. 
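The vectors returned by the Python wrapper are plain lists of floats, so they can also be compared directly in Python. A minimal sketch (assuming model is the skipgram model trained above and that both query words are in the vocabulary; numpy and the word 'queen' are illustrative choices, not part of the package):
python
import numpy as np

def cosine(u, v):
    # cosine similarity between two word vectors
    u, v = np.array(u), np.array(v)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(model['king'], model['queen']))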
Load pre trained model We can use fasttext.load_model to load a pre trained model: python model fasttext.load_model('model.bin') print model.words list of words in dictionary print model 'king' get the vector of the word 'king' Text classification This package can also be used to train supervised text classifiers and to load pre trained classifiers from fastText. In order to train a text classifier using the method described in 2 ( bag of tricks for efficient text classification), we can use the following function: python classifier fasttext.supervised('data.train.txt', 'model') equivalent to the fasttext(1) command: shell ./fasttext supervised input data.train.txt output model where data.train.txt is a text file containing a training sentence per line along with the labels. By default, we assume that labels are words that are prefixed by the string __label__ . We can specify the label prefix with the label_prefix param: python classifier fasttext.supervised('data.train.txt', 'model', label_prefix '__label__') equivalent to the fasttext(1) command: shell ./fasttext supervised input data.train.txt output model label '__label__' This will output two files: model.bin and model.vec . Once the model is trained, we can evaluate it by computing the precision at 1 (P@1) and the recall on a test set using the classifier.test function: python result classifier.test('test.txt') print 'P@1:', result.precision print 'R@1:', result.recall print 'Number of examples:', result.nexamples This will print the same output to stdout as: shell ./fasttext test model.bin test.txt In order to obtain the most likely label for a list of texts, we can use the classifier.predict method: python texts 'example very long text 1', 'example very long text 2' labels classifier.predict(texts) print labels Or with the probability labels classifier.predict_proba(texts) print labels We can specify the k value to get the k best labels from the classifier: python labels classifier.predict(texts, k 3) print labels Or with the probability labels classifier.predict_proba(texts, k 3) print labels This interface is equivalent to the fasttext(1) predict command. The same model with the same input set will produce the same predictions.
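Putting the pieces above together, a short end to end sketch (the file names and the 0.7 probability cut off are illustrative assumptions, not values fixed by the package):
python
import fasttext

# train a classifier on a __label__ prefixed training file and check it on a held out set
classifier = fasttext.supervised('data.train.txt', 'model', label_prefix='__label__')
result = classifier.test('data.test.txt')
print('P@1: %s R@1: %s' % (result.precision, result.recall))

# keep only predictions the classifier is reasonably sure about
texts = ['this product exceeded my expectations', 'not worth the money']
for text, pred in zip(texts, classifier.predict_proba(texts, k=1)):
    label, prob = pred[0]
    if prob >= 0.7:
        print('%s -> %s (%.2f)' % (text, label, prob))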
API documentation Skipgram model Train & load skipgram model python model fasttext.skipgram(params) List of available params and their default value: input_file training file path (required) output output file path (required) lr learning rate 0.05 lr_update_rate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 min_count minimal number of word occurences 5 neg number of negatives sampled 5 word_ngrams max length of word ngram 1 loss loss function {ns, hs, softmax} ns bucket number of buckets 2000000 minn min length of char ngram 3 maxn max length of char ngram 6 thread number of threads 12 t sampling threshold 0.0001 silent disable the log output from the C++ extension 1 encoding specify input_file encoding utf 8 Example usage: python model fasttext.skipgram('train.txt', 'model', lr 0.1, dim 300) CBOW model Train & load CBOW model python model fasttext.cbow(params) List of available params and their default value: input_file training file path (required) output output file path (required) lr learning rate 0.05 lr_update_rate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 min_count minimal number of word occurences 5 neg number of negatives sampled 5 word_ngrams max length of word ngram 1 loss loss function {ns, hs, softmax} ns bucket number of buckets 2000000 minn min length of char ngram 3 maxn max length of char ngram 6 thread number of threads 12 t sampling threshold 0.0001 silent disable the log output from the C++ extension 1 encoding specify input_file encoding utf 8 Example usage: python model fasttext.cbow('train.txt', 'model', lr 0.1, dim 300) Load pre trained model File .bin that previously trained or generated by fastText can be loaded using this function python model fasttext.load_model('model.bin', encoding 'utf 8') Attributes and methods for the model Skipgram and CBOW model have the following atributes & methods python model.model_name Model name model.words List of words in the dictionary model.dim Size of word vector model.ws Size of context window model.epoch Number of epochs model.min_count Minimal number of word occurences model.neg Number of negative sampled model.word_ngrams Max length of word ngram model.loss_name Loss function name model.bucket Number of buckets model.minn Min length of char ngram model.maxn Max length of char ngram model.lr_update_rate Rate of updates for the learning rate model.t Value of sampling threshold model.encoding Encoding of the model model word Get the vector of specified word Supervised model Train & load the classifier python classifier fasttext.supervised(params) List of available params and their default value: input_file training file path (required) output output file path (required) label_prefix label prefix '__label__' lr learning rate 0.1 lr_update_rate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 min_count minimal number of word occurences 1 neg number of negatives sampled 5 word_ngrams max length of word ngram 1 loss loss function {ns, hs, softmax} softmax bucket number of buckets 0 minn min length of char ngram 0 maxn max length of char ngram 0 thread number of threads 12 t sampling threshold 0.0001 silent disable the log output from the C++ extension 1 encoding specify input_file encoding utf 8 pretrained_vectors pretrained word vectors (.vec file) for 
supervised learning Example usage: python classifier fasttext.supervised('train.txt', 'model', label_prefix '__myprefix__', thread 4) Load pre trained classifier A .bin file that was previously trained or generated by fastText can be loaded using this function. shell ./fasttext supervised input train.txt output classifier label 'some_prefix' python classifier fasttext.load_model('classifier.bin', label_prefix 'some_prefix') Test classifier This is equivalent to the fasttext(1) test command. The same model and test set will produce the same value for the precision at one and the number of examples. python result classifier.test(params) Properties result.precision Precision at one result.recall Recall at one result.nexamples Number of test examples The param k is optional, and equal to 1 by default. Predict the most likely label of texts This interface is equivalent to the fasttext(1) predict command. texts is an array of strings python labels classifier.predict(texts, k) Or with probability labels classifier.predict_proba(texts, k) The param k is optional, and equal to 1 by default. Attributes and methods for the classifier The classifier has the following attributes & methods python classifier.labels List of labels classifier.label_prefix Prefix of the label classifier.dim Size of word vector classifier.ws Size of context window classifier.epoch Number of epochs classifier.min_count Minimal number of word occurrences classifier.neg Number of negatives sampled classifier.word_ngrams Max length of word ngram classifier.loss_name Loss function name classifier.bucket Number of buckets classifier.minn Min length of char ngram classifier.maxn Max length of char ngram classifier.lr_update_rate Rate of updates for the learning rate classifier.t Value of sampling threshold classifier.encoding Encoding used by the classifier classifier.test(filename, k) Test the classifier classifier.predict(texts, k) Predict the most likely label classifier.predict_proba(texts, k) Predict the most likely labels including their probabilities The param k for classifier.test , classifier.predict and classifier.predict_proba is optional, and equal to 1 by default. References Enriching Word Vectors with Subword Information 1 P. Bojanowski\ , E. Grave\ , A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2016enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.04606}, year {2016} } Bag of Tricks for Efficient Text Classification 2 A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification @article{joulin2016bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.01759}, year {2016} } (\ These authors contributed equally.) Join the fastText community Facebook page: Google group:",Sentiment Analysis,Sentiment Analysis 2309,Natural Language Processing,Natural Language Processing,Natural Language Processing,"InceptionModel_SentimentAnalysis Performing sentiment analysis on data using the Inception model. Fasttext is used for word vectorization. The code is part of a test I took part in during an interview process. It is built around a recent paper on the topic of NLP. The paper: BB twtr at SemEval 2017 Task 4: Here: 1.
First part is pre processing of the data to remove unnecessary characters and repetitive words (termed Stemming & Lemmatization) 2. Next, trimming and padding of each sentence to MAX_SEQUENCE_LENGTH 3. Splitting of the dataset into training, validation and testing parts 4. Converting words to vectors using Fasttext embeddings. > Fasttext works well here because it divides each word into smaller parts (called n grams) and then builds the model over those n grams. This helps with words that are rarely used OR are not present in the training set, since the probability of the n grams appearing in the vocabulary is higher than that of the rare word itself. Building the Model 1. Using CNN CNN is applied using different filter sizes 2x2, 3x3 and 4x4 as per the paper. Passing through Dropout layers of 50% Concatenating the results of the above and passing them through a Fully Connected Layer 2. Using LSTM Bidirectional LSTM is applied with dropout 50% and passed through a Dense Layer.",Sentiment Analysis,Sentiment Analysis 2493,Natural Language Processing,Natural Language Processing,Natural Language Processing,"GradientReversal Implements the Gradient Reversal layer from Unsupervised Domain Adaptation by Backpropagation and Domain Adversarial Training of Neural Networks in Tensorflow. The forward pass is the identity function, but the backward pass multiplies the gradients by lambda.",Sentiment Analysis,Sentiment Analysis 2534,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Sentiment Analysis,Sentiment Analysis 2551,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests.
The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Sentiment Analysis,Sentiment Analysis 2570,Natural Language Processing,Natural Language Processing,Natural Language Processing,"CNNs for Text Classification in PyTorch A minimal PyTorch implementation of Convolutional Neural Networks (CNNs) for text classification. Supported features: Character and/or word embeddings in the input layer Mini batch training with CUDA Usage Training data should be formatted as below: sentence \t label sentence \t label ... To prepare data: python prepare.py training_data To train: python train.py model char_to_idx word_to_idx tag_to_idx training_data.csv (validation_data) num_epoch To predict: python predict.py model.epochN char_to_idx word_to_idx tag_to_idx test_data To evaluate: python evaluate.py model.epochN char_to_idx word_to_idx tag_to_idx test_data References Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882. Yunlun Yang, Yunhai Tong, Shulei Ma, Zhi Hong Deng. 2016. A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification. In EMNLP. Xiang Zhang, Junbo Zhao, Yann LeCun. 2015. Character level Convolutional Networks for Text Classification. arXiv:1509.01626.",Sentiment Analysis,Sentiment Analysis 2571,Natural Language Processing,Natural Language Processing,Natural Language Processing,"CNNs for Text Classification in PyTorch A minimal PyTorch implementation of Convolutional Neural Networks (CNNs) for text classification. Supported features: Character and/or word embeddings in the input layer Mini batch training with CUDA Usage Training data should be formatted as below: sentence \t label sentence \t label ... To prepare data: python prepare.py training_data To train: python train.py model char_to_idx word_to_idx tag_to_idx training_data.csv (validation_data) num_epoch To predict: python predict.py model.epochN char_to_idx word_to_idx tag_to_idx test_data To evaluate: python evaluate.py model.epochN char_to_idx word_to_idx tag_to_idx test_data References Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882. Yunlun Yang, Yunhai Tong, Shulei Ma, Zhi Hong Deng. 2016. A Position Encoding Convolutional Neural Network Based on Dependency Tree for Relation Classification. In EMNLP. Xiang Zhang, Junbo Zhao, Yann LeCun. 2015. Character level Convolutional Networks for Text Classification. arXiv:1509.01626.",Sentiment Analysis,Sentiment Analysis 2667,Natural Language Processing,Natural Language Processing,Natural Language Processing,This repository contains implementation of CrossGrad and DAN . Disclaimer The following software is shared for educational purpose only. Dataset We generated and used a character dataset using several hand written fonts downloaded from Google Fonts. This dataset is referred to as GFonts dataset and is described in the CrossGrad paper further. We make available this dataset through this repository. It can be found in the data/gfonts folder as numpy binaries. 
all_images.npy is a numpy array containing all the images in the dataset all_labels.npy numpy array of class labels in the same order as all_images.npy all_domains.npy numpy array contains the domain labels again in the same order as all_images.npy Following is a sprite of this dataset for character: 'A' ! gfonts_sprite (data/gfonts/sprite.png),Sentiment Analysis,Sentiment Analysis 2673,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Build Status Table of Contents Introduction ( introduction) Maven Dependency ( maven dependency) Building ( building) Quick Application Language Identification ( quick application \ language identification) Detailed Examples ( detailed examples) API ( api) FastText's Command Line ( fasttexts command line) License ( license) References ( references) Introduction JFastText is a Java wrapper for Facebook's fastText , a library for efficient learning of word embeddings and fast sentence classification. The JNI interface is built using javacpp . The library provides full fastText's command line interface. It also provides the API for loading trained model from file to do label prediction in memory. Model training and quantization are supported via the command line interface. JFastText is ideal for building fast text classifiers in Java. Maven Dependency xml com.github.vinhkhuc jfasttext 0.4 The Jar package on Maven Central is bundled with precompiled fastText library for Windows, Linux and MacOSX 64bit. Building C++ compiler (g++ on Mac/Linux or cl.exe on Windows) is required to compile fastText's code. bash git clone recursive cd JFastText mvn package Quick Application Language Identification JFastText can use FastText's pretrained models directly. Language identification models can be downloaded here . In this quick example, we will use the quantized model which is super small and a bit less accurate than the original model. bash $ wget q \ && { echo This is English ; echo Xin chào ; echo Привет ; } \ java jar target/jfasttext jar with dependencies.jar predict lid.176.ftz __label__en __label__vi __label__ru Detailed Examples Examples on how to use JFastText can be found at examples/api (examples/api) and examples/cmd (examples/cmd). API Initialization java import com.github.jfasttext.JFastText; ... JFastText jft new JFastText(); Word embedding learning java jft.runCmd(new String { skipgram , input , src/test/resources/data/unlabeled_data.txt , output , src/test/resources/models/skipgram.model , bucket , 100 , minCount , 1 }); Text classification java // Train supervised model jft.runCmd(new String { supervised , input , src/test/resources/data/labeled_data.txt , output , src/test/resources/models/supervised.model }); // Load model from file jft.loadModel( src/test/resources/models/supervised.model.bin ); // Do label prediction String text What is the most popular sport in the US ? 
; JFastText.ProbLabel probLabel jft.predictProba(text); System.out.printf( \nThe label of '%s' is '%s' with probability %f\n , text, probLabel.label, Math.exp(probLabel.logProb)); FastText's Command Line FastText's command line interface can be accessed as follows: bash $ java jar target/jfasttext jar with dependencies.jar usage: fasttext The commands supported by fasttext are: supervised train a supervised classifier quantize quantize a model to reduce the memory usage test evaluate a supervised classifier predict predict most likely labels predict prob predict most likely labels with probabilities skipgram train a skipgram model cbow train a cbow model print word vectors print word vectors given a trained model print sentence vectors print sentence vectors given a trained model print ngrams print ngrams given a trained model and word nn query for nearest neighbors analogies query for analogies dump dump arguments,dictionary,input/output vectors For example: bash $ java jar target/jfasttext jar with dependencies.jar quantize h License BSD References (From fastText's references ) Please cite 1 ( enriching word vectors with subword information) if using this code for learning word representations or 2 ( bag of tricks for efficient text classification) if using for text classification. Enriching Word Vectors with Subword Information 1 P. Bojanowski\ , E. Grave\ , A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2016enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.04606}, year {2016} } Bag of Tricks for Efficient Text Classification 2 A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification @article{joulin2016bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.01759}, year {2016} } FastText.zip: Compressing text classification models 3 A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models @article{joulin2016fasttext, title {FastText.zip: Compressing text classification models}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas}, journal {arXiv preprint arXiv:1612.03651}, year {2016} } (\ These authors contributed equally.)",Sentiment Analysis,Sentiment Analysis 2728,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Steam Descriptions Build status build image build Updates dependency image pyup Python 3 python3 image pyup Code coverage codecov image codecov Code Quality codacy image codacy This repository contains Python code to retrieve semantically similar Steam games. ! Sekiro: similar store descriptions with GloVe Requirements Install the latest version of Python 3.X . Install the required packages: bash pip install r requirements.txt Method Each game is described by the concatenation of: a short text below its banner on the Steam store: ! short game description a long text in the section called About the game : ! long game description The text is tokenized with spaCy by running utils.py (utils.py). The tokens are then fed as input to different methods to retrieve semantically similar game descriptions. 
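A minimal sketch of this kind of tokenization step (an illustration of the idea only, not the repository's utils.py; the model name and filtering choices are assumptions):
python
import spacy

nlp = spacy.load('en_core_web_sm')  # small English model, assumed to be installed

def tokenize(description):
    # lowercase lemmas, dropping stop words, punctuation and whitespace
    doc = nlp(description)
    return [tok.lemma_.lower() for tok in doc
            if not (tok.is_stop or tok.is_punct or tok.is_space)]

print(tokenize('A story driven RPG set in a vast open world.'))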
For instance, a word embedding can be learnt with Word2Vec and then used for a sentence embedding based on a weighted average of word embeddings (cf. sif_embedding_perso.py (sif_embedding_perso.py)). A pre trained GloVe embedding can also be used instead of the self trained Word2Vec embedding. Or a document embedding can be learnt with Doc2Vec (cf. doc2vec_model.py (doc2vec_model.py)), although, in our experience, this is more useful to learn document tags, e.g. game genres, rather than to retrieve similar documents. Different baseline algorithms are suggested in sentence_baseline.py (sentence_baseline.py). Embeddings can also be computed with Universal Sentence Encoder on Google Colab with this notebook (universal_sentence_encoder.ipynb). Results are shown with universal_sentence_encoder.py (universal_sentence_encoder.py). Results An in depth commentary is provided on the Wiki . Overall, I would suggest to match store descriptions with: either Term Frequency Inverse Document Frequency (Tf Idf) , ! Witcher: similar store descriptions with Tf Idf or a weighted average of GloVe word embeddings, with Tf Idf reweighting, after removing some components: either only sentence components , or both sentence and word components (for slighly better results, by a tiny margin). ! Neverwinter: similar store descriptions with GloVe A retrieval score can be computed, thanks to a ground truth of games set in the same fictional universe. Alternative scores can be computed as the proportions of genres or tags shared between the query and the retrieved games. When using average of word embeddings as sentence embeddings: removing only sentence components provided a very large increase of the score (+105%), removing only word components provided a large increase of the score (+51%), removing both components provided a very large increase of the score (+108%), relying on a weighted average instead of a simple average lead to better results, Tf Idf reweighting lead to better results than Smooth Inverse Frequency reweighting, GloVe word embeddings lead to better results than Word2Vec. ! Influence of the removal of sentence components A table with scores for each major experiment is available . For each game series, the score is the number of games from this series which are found among the top 10 most similar games (excluding the query). The higher the score, the better the retrieval. Results can be accessed from the Wiki homepage . References My answer on StackOverlow , about sentence embeddings Tutorial on the official website of 'gensim' module Tutorial on a blog Tool: spaCy Tool: Gensim Word2Vec GloVe Universal Sentence Encoder Sanjeev Arora, Yingyu Liang, Tengyu Ma, A Simple but Tough to Beat Baseline for Sentence Embeddings , in: ICLR 2017 conference. Jiaqi Mu, Pramod Viswanath, All but the Top: Simple and Effective Postprocessing for Word Representations , in: ICLR 2018 conference. build : build image : pyup : dependency image : python3 image : codecov : codecov image : codacy : codacy image :",Sentiment Analysis,Sentiment Analysis 2778,Natural Language Processing,Natural Language Processing,Natural Language Processing,"fastText fastText is a library for efficient learning of word representations and sentence classification. FAQ / Cheatsheet You can find answers to frequently asked questions on our website . We also provide a cheatsheet full of useful one liners. Requirements fastText builds on modern Mac OS and Linux distributions. 
Since it uses C++11 features, it requires a compiler with good C++11 support. These include : (gcc 4.6.3 or newer) or (clang 3.3 or newer) Compilation is carried out using a Makefile, so you will need to have a working make . For the word similarity evaluation script you will need: python 2.6 or newer numpy & scipy Building fastText In order to build fastText , use the following: $ git clone $ cd fastText $ make This will produce object files for all the classes as well as the main binary fasttext . If you do not plan on using the default system wide compiler, update the two macros defined at the beginning of the Makefile (CC and INCLUDES). Example use cases This library has two main use cases: word representation learning and text classification. These were described in the two papers 1 ( enriching word vectors with subword information) and 2 ( bag of tricks for efficient text classification). Word representation learning In order to learn word vectors, as described in 1 ( enriching word vectors with subword information), do: $ ./fasttext skipgram input data.txt output model where data.txt is a training file containing utf 8 encoded text. By default the word vectors will take into account character n grams from 3 to 6 characters. At the end of optimization the program will save two files: model.bin and model.vec . model.vec is a text file containing the word vectors, one per line. model.bin is a binary file containing the parameters of the model along with the dictionary and all hyper parameters. The binary file can be used later to compute word vectors or to restart the optimization. Obtaining word vectors for out of vocabulary words The previously trained model can be used to compute word vectors for out of vocabulary words. Provided you have a text file queries.txt containing words for which you want to compute vectors, use the following command: $ ./fasttext print word vectors model.bin < queries.txt This will output word vectors to the standard output, one vector per line. This can also be used with pipes: $ cat queries.txt ./fasttext print word vectors model.bin See the provided scripts for an example. For instance, running: $ ./word vector example.sh will compile the code, download data, compute word vectors and evaluate them on the rare words similarity dataset RW Thang et al. 2013 . Text classification This library can also be used to train supervised text classifiers, for instance for sentiment analysis. In order to train a text classifier using the method described in 2 ( bag of tricks for efficient text classification), use: $ ./fasttext supervised input train.txt output model where train.txt is a text file containing a training sentence per line along with the labels. By default, we assume that labels are words that are prefixed by the string __label__ . This will output two files: model.bin and model.vec . Once the model was trained, you can evaluate it by computing the precision and recall at k (P@k and R@k) on a test set using: $ ./fasttext test model.bin test.txt k The argument k is optional, and is equal to 1 by default. In order to obtain the k most likely labels for a piece of text, use: $ ./fasttext predict model.bin test.txt k where test.txt contains a piece of text to classify per line. Doing so will print to the standard output the k most likely labels for each line. The argument k is optional, and equal to 1 by default. See classification example.sh for an example use case. 
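As a reminder of the expected input format, a small sketch that writes labelled sentences in the __label__ convention described above (the example sentences and labels are made up):
python
# each line of train.txt: a __label__ prefixed label followed by the sentence
examples = [
    ('positive', 'great service and fast delivery'),
    ('negative', 'the package arrived damaged'),
]
with open('train.txt', 'w') as f:
    for label, sentence in examples:
        f.write('__label__{} {}\n'.format(label, sentence))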
In order to reproduce results from the paper 2 ( bag of tricks for efficient text classification), run classification results.sh , this will download all the datasets and reproduce the results from Table 1. If you want to compute vector representations of sentences or paragraphs, please use: $ ./fasttext print sentence vectors model.bin < text.txt This assumes that the text.txt file contains the paragraphs that you want to get vectors for. The program will output one vector representation per line in the file. You can also quantize a supervised model to reduce its memory usage with the following command: $ ./fasttext quantize output model This will create a .ftz file with a smaller memory footprint. All the standard functionality, like test or predict work the same way on the quantized models: $ ./fasttext test model.ftz test.txt The quantization procedure follows the steps described in 3 ( fastext zip). You can run the script quantization example.sh for an example. Full documentation Invoke a command without arguments to list available arguments and their default values: $ ./fasttext supervised Empty input or output path. The following arguments are mandatory: input training file path output output file path The following arguments are optional: verbose verbosity level 2 The following arguments for the dictionary are optional: minCount minimal number of word occurences 5 minCountLabel minimal number of label occurences 0 wordNgrams max length of word ngram 1 bucket number of buckets 2000000 minn min length of char ngram 3 maxn max length of char ngram 6 t sampling threshold 0.0001 label labels prefix __label__ The following arguments for training are optional: lr learning rate 0.05 lrUpdateRate change the rate of updates for the learning rate 100 dim size of word vectors 100 ws size of the context window 5 epoch number of epochs 5 neg number of negatives sampled 5 loss loss function {ns, hs, softmax} ns thread number of threads 12 pretrainedVectors pretrained word vectors for supervised learning saveOutput whether output params should be saved 0 The following arguments for quantization are optional: cutoff number of words and ngrams to retain 0 retrain finetune embeddings if a cutoff is applied 0 qnorm quantizing the norm separately 0 qout quantizing the classifier 0 dsub size of each sub vector 2 Defaults may vary by mode. (Word representation modes skipgram and cbow use a default minCount of 5.) References Please cite 1 ( enriching word vectors with subword information) if using this code for learning word representations or 2 ( bag of tricks for efficient text classification) if using for text classification. Enriching Word Vectors with Subword Information 1 P. Bojanowski\ , E. Grave\ , A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information @article{bojanowski2016enriching, title {Enriching Word Vectors with Subword Information}, author {Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.04606}, year {2016} } Bag of Tricks for Efficient Text Classification 2 A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification @article{joulin2016bag, title {Bag of Tricks for Efficient Text Classification}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas}, journal {arXiv preprint arXiv:1607.01759}, year {2016} } FastText.zip: Compressing text classification models 3 A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. 
Mikolov, FastText.zip: Compressing text classification models @article{joulin2016fasttext, title {FastText.zip: Compressing text classification models}, author {Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Douze, Matthijs and J{\'e}gou, H{\'e}rve and Mikolov, Tomas}, journal {arXiv preprint arXiv:1612.03651}, year {2016} } (\ These authors contributed equally.) Resources You can find the preprocessed YFCC100M data used in 2 at Pre trained word vectors for 294 languages are available here . Join the fastText community Facebook page: Google group: Contact: egrave@fb.com (mailto:egrave@fb.com), bojanowski@fb.com (mailto:bojanowski@fb.com), ajoulin@fb.com (mailto:ajoulin@fb.com), tmikolov@fb.com (mailto:tmikolov@fb.com) See the CONTRIBUTING file for information about how to help out. License fastText is BSD licensed. We also provide an additional patent grant.",Sentiment Analysis,Sentiment Analysis 2793,Natural Language Processing,Natural Language Processing,Natural Language Processing,"text_gcn The implementation of Text GCN in our paper: Liang Yao, Chengsheng Mao, Yuan Luo. Graph Convolutional Networks for Text Classification. In 33rd AAAI Conference on Artificial Intelligence (AAAI 19) Require Python 2.7 or 3.6 Tensorflow > 1.4.0 Reproducing Results 1. Run python remove_words.py 20ng 2. Run python build_graph.py 20ng 3. Run python train.py 20ng 4. Change 20ng in above 3 command lines to R8 , R52 , ohsumed and mr when producing results for other datasets. Example input data 1. /data/20ng.txt indicates document names, training/test split, document labels. Each line is for a document. 2. /data/corpus/20ng.txt contains raw text of each document, each line is for the corresponding line in /data/20ng.txt 3. prepare_data.py is an example for preparing your own data, note that '\n' is removed in your documents or sentences.",Sentiment Analysis,Sentiment Analysis 2861,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Leveraging Multi grained Sentiment Lexicon Information for Neural Sequence Models Yan Zeng, Yangyang Lan, Yazhou Hao, Chen Li, Qinhua Zheng In this paper, I carefully collected a Sentiment Lexicon, and I tried to make it larger. Finally, it consists of: 1) 5106 negative words and 2759 positive words majorly collected from the 2 links below. Subjectivity Lexicon Opinion lexicon 2) 33 negation words and 62 intensifiers collected manually. All phases are changed into the form of 'a b', e.g. 'without doubt'(intensifier), '2 faced'(negative). If you use this resource, please cite: (Subjectivity Lexicon) Recognizing Contextual Polarity in Phrase Level Sentiment Analysis Theresa Wilson, Janyce Wiebe and Paul Hoffmann (Opinion lexicon) Mining and Summarizing Customer Reviews Minqin Hu and Bing Liu Leveraging Multi grained Sentiment Lexicon Information for Neural Sequence Models Yan Zeng, Yangyang Lan, Yazhou Hao, Chen Li, Qinhua Zheng This repertory aims to save your time to do repetitive work and promote further research in Sentiment Classification. Even using the simple and general method proposed in our paper, sequence models can gain performance improvement with the Sentiment Lexicon.",Sentiment Analysis,Sentiment Analysis 2864,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SRNN Author: Zeping Yu This work is accepted by COLING 2018. Sliced Recurrent Neural Network (SRNN) . SRNN is able to get much faster speed than standard RNN by slicing the sequences into many subsequences. 
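For intuition, a simplified two level sketch of the slicing idea (the sizes below are made up, and the repository's SRNN(8,2) applies the split recursively rather than once):
python
from keras.layers import Input, Embedding, GRU, TimeDistributed, Dense
from keras.models import Model

vocab_size, embed_dim, n_slices, slice_len, n_classes = 20000, 100, 8, 64, 5

# bottom level: encode one subsequence of word ids into a single vector
slice_in = Input(shape=(slice_len,), dtype='int32')
x = Embedding(vocab_size, embed_dim)(slice_in)
slice_encoder = Model(slice_in, GRU(50)(x))

# top level: encode each of the 8 slices independently, then run a GRU over the slice vectors
doc_in = Input(shape=(n_slices, slice_len), dtype='int32')
slice_vectors = TimeDistributed(slice_encoder)(doc_in)
doc_vector = GRU(50)(slice_vectors)
model = Model(doc_in, Dense(n_classes, activation='softmax')(doc_vector))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()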
The code is written in keras, using tensorflow backend. We implement the SRNN(8,2) here, and Yelp 2013 dataset is used. keras version: 2.1.5 tensorflow version: 1.6.0 python : 2.7 If you have any question, please contact me at zepingyu@foxmail.com. The pre trained GloVe word embeddings could be downloaded at: The Yelp 2013, 2014 and 2015 datasets are at: Yelp_P, Amazon_P and Amazon_F datasets are at: Here is an interesting modification of SRNN for text generation, similar to language model:",Sentiment Analysis,Sentiment Analysis 2869,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Crepe DOI This repository contains code in Torch 7 for text classification from character level using convolutional networks. It can be used to reproduce the results in the following article: Xiang Zhang, Junbo Zhao, Yann LeCun. Character level Convolutional Networks for Text Classification . Advances in Neural Information Processing Systems 28 (NIPS 2015) Note : An early version of this work entitled “Text Understanding from Scratch” was posted in Feb 2015 as arXiv:1502.01710 . The present paper above has considerably more experimental results and a rewritten introduction. Note : See also the example implementation in NVIDIA DIGITS . Using CuDNN is 17 times faster than the code here. Components This repository contains the following components: data: data preprocessing scripts. It can be used to convert csv format to a Torch 7 binary format that can be used by the training program directly. We used csv format to distribute the datasets in our article. The datasets are available at train: training program. For more information, please refer to the readme files in each component directory. Example Usage Here is an example of using our data tools and training programs pipeline to replicate the small convolutional network for DBPedia ontology classification in the article. First, clone the project and download the file dbpedia_csv.tar.gz from our storage in Google Drive to the data directory. Then, uncompress the files and build t7b files using our dataset tools. sh $ cd data $ tar xvf dbpedia_csv.tar.gz $ qlua csv2t7b.lua input dbpedia_csv/train.csv output train.t7b $ qlua csv2t7b.lua input dbpedia_csv/test.csv output test.t7b $ cd .. In the commands above, you can replace qlua by luajit as long as it has an associated torch 7 distribution installed. Now there will be 2 files train.t7b and test.t7b in the data directory. Normally, the second step is to go to the train directory and change the configurations listed in config.lua , especially for data file location and number of output units in the last linear layer. This last linear layer is important because its number of output units should correspond to the number of classes in your dataset. Luckily for this example on DBPedia ontology dataset the configurations are all set. One just needs to go into the train directory and start the training process sh $ cd train $ qlua main.lua This time we have to use qlua , because there is a nice visualization using Qt that is updated for every era. Please make sure packages qtlua and qttorch are installed in your system and there is a corresponding X to your terminal. To run this example succesfully you will also need a NVidia GPU with at least 3GB of memory. Otherwise, you can configure the model in train/config.lua for less parameters. Okay! If you start to find out checkpointing files like main_EPOCHES_TIME.t7b and sequential_EPOCHES_TIME. 
t7b png appearing under the train directory in several hours or so, it means the program is running without problems. You should probably find some other entertainment for the day. :P Issues It is discovered that the alphabet actually has two hyphen/minus characters (' '). This issue was present for the results in the papers as well. Since this is probably a minor issue, we will keep the alphabet configurations as is to ensure reproduceability. That said, you are welcome to change the alphabet variable in train/config.lua to remove it. See issue 4 for more details. Why Call It Crepe ? It is just a word popping up to my mind pondering for a repository name in Github. It has nothing to do with French cuisine, text processing or convolutional networks. If a connection is really really needed, how about Convolutional REPresentation of Expressions ?",Sentiment Analysis,Sentiment Analysis 1805,Computer Vision,Computer Vision,Computer Vision,"InsightFace: 2D and 3D Face Analysis Project By Jia Guo and Jiankang Deng License The code of InsightFace is released under the MIT License. ArcFace Video Demo ArcFace Demo Please click the image to watch the Youtube video. For Bilibili users, click here . Recent Update 2019.01.17 : Please check for new training code which is much more clean. 2018.12.13 : TVM Benchmark 2018.10.28 : Gender Age created with a lightweight model. About 1MB size, 10ms on single CPU core. Gender accuracy 96% on validation set and 4.1 age MAE. 2018.10.16 : We got rank 1st on IQIYI_VID (IQIYI video person identification) competition which in conjunction with PRCV2018, see detail . 2018.06.14 : There's a large scale Asian training dataset provided by Glint, see this discussion for detail. 2018.02.13 : We achieved state of the art performance on MegaFace Challenge . Please check our paper and code for implementation details. Contents Deep Face Recognition ( deep face recognition) Introduction ( introduction) Training Data ( training data) Train ( train) Pretrained Models ( pretrained models) Verification Results On Combined Margin ( verification results on combined margin) Test on MegaFace ( test on megaface) 512 D Feature Embedding ( 512 d feature embedding) Third party Re implementation ( third party re implementation) Face Alignment ( face alignment) Face Detection ( face detection) Citation ( citation) Contact ( contact) Deep Face Recognition Introduction In this repository, we provide training data, network settings and loss designs for deep face recognition. The training data includes the normalised MS1M, VGG2 and CASIA Webface datasets, which were already packed in MXNet binary format. The network backbones include ResNet, MobilefaceNet, MobileNet, InceptionResNet_v2, DenseNet, DPN. The loss functions include Softmax, SphereFace, CosineFace, ArcFace and Triplet (Euclidean/Angular) Loss. ! margin penalty for target logit Our method, ArcFace, was initially described in an arXiv technical report . By using this repository, you can simply achieve LFW 99.80%+ and Megaface 98%+ by a single model. This repository can help researcher/engineer to develop deep face recognition algorithms quickly by only two steps: download the binary dataset and run the training script. Training Data All face images are aligned by MTCNN and cropped to 112x112: Please check Dataset Zoo for detail information and dataset downloading. Please check src/data/face2rec2.py on how to build a binary face dataset. Any public available MTCNN can be used to align the faces, and the performance should not change. 
We will improve the face normalisation step by full pose alignment methods recently. Train 1. Install MXNet with GPU support (Python 2.7). pip install mxnet cu90 2. Clone the InsightFace repository. We call the directory insightface as INSIGHTFACE_ROOT . git clone recursive 3. Download the training set ( MS1M Arcface ) and place it in $INSIGHTFACE_ROOT/datasets/ . Each training dataset includes at least following 6 files: Shell faces_emore/ train.idx train.rec property lfw.bin cfp_fp.bin agedb_30.bin The first three files are the training dataset while the last three files are verification sets. 4. Train deep face recognition models. In this part, we assume you are in the directory $INSIGHTFACE_ROOT/recognition/ . Shell export MXNET_CPU_WORKER_NTHREADS 24 export MXNET_ENGINE_TYPE ThreadedEnginePerDevice Place and edit config file: Shell cp sample_config.py config.py vim config.py edit dataset path etc.. We give some examples below. Our experiments were conducted on the Tesla P40 GPU. (1). Train ArcFace with LResNet100E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train.py network r100 loss arcface dataset emore It will output verification results of LFW , CFP FP and AgeDB 30 every 2000 batches. You can check all options in config.py . This model can achieve LFW 99.80+ and MegaFace 98.3%+ . (2). Train CosineFace with LResNet50E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train.py network r50 loss cosface dataset emore (3). Train Softmax with LMobileNet GAP. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train.py network m1 loss softmax dataset emore (4). Fine turn the above Softmax model with Triplet loss. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train.py network m1 loss triplet lr 0.005 pretrained ./models/m1 softmax emore,1 5. Verification results. LResNet100E IR network trained on MS1M Arcface dataset with ArcFace loss: Method LFW(%) CFP FP(%) AgeDB 30(%) Ours 99.80+ 98.0+ 98.20+ Pretrained Models You can use $INSIGHTFACE/src/eval/verification.py to test all the pre trained models. Please check Model Zoo for more pretrained models. Verification Results on Combined Margin A combined margin method was proposed as a function of target logits value and original θ : COM(θ) cos(m_1 θ+m_2) m_3 For training with m1 1.0, m2 0.3, m3 0.2 , run following command: CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r100 loss combined dataset emore Results by using MS1M IBUG(MS1M V1) Method m1 m2 m3 LFW CFP FP AgeDB 30 W&F Norm Softmax 1 0 0 99.28 88.50 95.13 SphereFace 1.5 0 0 99.76 94.17 97.30 CosineFace 1 0 0.35 99.80 94.4 97.91 ArcFace 1 0.5 0 99.83 94.04 98.08 Combined Margin 1.2 0.4 0 99.80 94.08 98.05 Combined Margin 1.1 0 0.35 99.81 94.50 98.08 Combined Margin 1 0.3 0.2 99.83 94.51 98.13 Combined Margin 0.9 0.4 0.15 99.83 94.20 98.16 Test on MegaFace Please check $INSIGHTFACE_ROOT/Evaluation/megaface/ to evaluate the model accuracy on Megaface. All aligned images were already provided. 512 D Feature Embedding In this part, we assume you are in the directory $INSIGHTFACE_ROOT/deploy/ . The input face image should be generally centre cropped. We use RNet+ONet of MTCNN to further align the image before sending it to the feature embedding network. 1. Prepare a pre trained model. 2. Put the model under $INSIGHTFACE_ROOT/models/ . For example, $INSIGHTFACE_ROOT/models/model r100 ii . 3. Run the test script $INSIGHTFACE_ROOT/deploy/test.py . 
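As a rough illustration of how the 512 D embeddings produced by the deploy script can be compared for verification, the following hedged NumPy sketch normalises two embeddings and thresholds their cosine similarity; the threshold value 0.3 is an assumption and should be tuned on a held out validation set rather than taken from this repository.
import numpy as np
def cosine_similarity(feat_a, feat_b):
    # feat_a, feat_b: 512-D embeddings from the recognition network
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(np.dot(a, b))
def same_identity(feat_a, feat_b, threshold=0.3):
    # illustrative decision rule: higher cosine similarity means more likely the same person
    return cosine_similarity(feat_a, feat_b) >= threshold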
For single cropped face image(112x112), total inference time is only 17ms on our testing server(Intel E5 2660 @ 2.00GHz, Tesla M40, LResNet34E IR ). Third party Re implementation TensorFlow: InsightFace_TF TensorFlow: tf insightface PyTorch: InsightFace_Pytorch PyTorch: arcface pytorch Caffe: arcface caffe Face Alignment Todo Face Detection Todo Citation If you find InsightFace useful in your research, please consider to cite the following related papers: @article{deng2018arcface, title {ArcFace: Additive Angular Margin Loss for Deep Face Recognition}, author {Deng, Jiankang and Guo, Jia and Niannan, Xue and Zafeiriou, Stefanos}, journal {arXiv:1801.07698}, year {2018} } Contact Jia Guo (guojia at gmail.com) Jiankang Deng (jiankangdeng at gmail.com)",Face Verification,Face Verification 1901,Computer Vision,Computer Vision,Computer Vision,"Face recognition Packages used Python: '3.7.1' Tensorflow: 1.13.0 rc1 Opencv: 4.0.0 Architecture used : Facenet Steps 1) Download and extract the zip file 2) Data Images of 3 person in 3 folder (Andrew Ng,Geoffrey Hinton,YanLecunn) 3) models All the models are saved in this folder 4) scripts All scripts 5) test data This folder contain 3 test image of each person and one unknown person 6) Run preprocess_image.py It will detect the face from all images and write that in 'processed_data' folder 7) Train the model for 3 person using train_model.py from folder scripts 8) Run predict_image.py for testing on images of all 3 person and one unknown person. References Inspired from For more details :",Face Verification,Face Verification 1928,Computer Vision,Computer Vision,Computer Vision,"arc face This repository hosts the contributor source files for the arc face model. ModelHub integrates these files into an engine and controlled runtime environment. A unified API allows for out of the box reproducible implementations of published models. For more information, please visit www.modelhub.ai or contact us info@modelhub.ai (mailto:info@modelhub.ai). meta id 02f2f15c 3285 44a4 8119 b298623b7acf application_area Computer Vision task Recognition task_extended Facial Detection & Recognition data_type Image/Photo data_source publication title ArcFace: Additive Angular Margin Loss for Deep Face Recognition source arxiv url year 2018 authors Jiankang Deng, Jia Guo, Niannan Xue, Stefanos Zafeiriou abstract One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large scale face recognition is the design of appropriate loss functions that enhance discriminative power. Centre loss penalises the distance between the deep features and their corresponding class centres in the Euclidean space to achieve intra class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space and penalises the angles between the deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere. 
We present arguably the most extensive experimental evaluation of all the recent state of the art face recognition methods on over 10 face recognition benchmarks including a new large scale image database with trillion level of pairs and a large scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. We release all refined training data, training codes, pre trained models and training logs, which will help reproduce the results in this paper. google_scholar bibtex @article{DBLP:journals/corr/abs 1801 07698, author {Jiankang Deng and Jia Guo and Stefanos Zafeiriou}, title {ArcFace: Additive Angular Margin Loss for Deep Face Recognition}, journal {CoRR}, volume {abs/1801.07698}, year {2018}, url { archivePrefix {arXiv}, eprint {1801.07698}, timestamp {Mon, 13 Aug 2018 16:46:52 +0200}, biburl { bibsource {dblp computer science bibliography, model description ArcFace is a CNN based model for face recognition which learns discriminative features of faces and produces embeddings for input face images. To enhance the discriminative power of softmax loss, a novel supervisor signal called additive angular margin (ArcFace) is used here as an additive term in the softmax loss. provenance architecture Convolutional Neural Network (CNN) learning_type Supervised learning format .onnx I/O model I/O can be viewed here (contrib_src/model/config.json) license model license can be viewed here (contrib_src/license/model) run To run this model and view others in the collection, view the instructions on ModelHub . contribute To contribute models, visit the ModelHub docs .",Face Verification,Face Verification 1935,Computer Vision,Computer Vision,Computer Vision,"Facial Recognition Using FaceNet Siamese One Shot Learning This program is used to implement Facial Recognition using Siamese Network architecture. The implementation of the project is based on the research paper : > FaceNet: A Unified Embedding for Face Recognition and Clustering > arXiv:1503.03832 by Florian Schroff , Dmitry Kalenichenko , James Philbin Facenet implements concept of Triplet Loss function to minimize the distance between anchor and positive images, and increase the distance between anchor and negative images. Prerequisites h5py 2.8.0 Keras 2.2.4 tensorflow 1.13.0rc2 dlib 19.16.0 opencv_python 3.4.3.18 imutils 0.5.1 numpy 1.15.2 matplotlib 3.0.0 scipy 1.1.0 Install the packages using pip install r requirements.txt Usage To use the facial recognition system, you need to have a database of images through which the model will calculate image embeddings and show the output. The images which are in the database are stored as .jpg files in the directory ./images . To generate your own dataset and add more faces to the system, use the following procedure: > Sit in front of your webcam. > Use the Image_Dataset_Generator.py script to save 50 images of your face. > Use this command: python Image_Dataset_Generator.py to generate images which will be saved in images folder. To use the facial recognition system, run the command : python face_recognizer.py References 1. The code has been implemented using deeplearning.ai course Convolutional Networks Week 4 Assignment, which has the files fr_utils.py and inception_blocks_v2.py 2. 
The keras implementation of the model is by Victor Sy Wang's implementation and was loaded using his code:",Face Verification,Face Verification 2094,Computer Vision,Computer Vision,Computer Vision,"DREAM block for Pose Robust Face Recognition This is our implementation for our CVPR 2018 accepted paper Pose Robust Face Recognition via Deep Residual Equivariant Mapping paper on arxiv . The code is wriiten by Yu Rong and Kaidi Cao Prerequisites Linux or macOS Python 3 NVIDIA GPU + CUDA CuDNN or CPU (GPU is prefered) opencv2.4 (opencv 2.4.13 is preferred) Getting Started Installation Install Anaconda Anaconda3 4.2.0 for Python 3 Install Pytorch and torchvision through Anaconda (Please follow the guide in Pytorch (pytorch.org)) Clone this repo bash git clone git@github.com:penincillin/DREAM.git cd DREAM Prepare Data and Models Big files like model.zip could be downloaded both from Google Drive and Baidu Yun. If you have problems with downloading those files, you could contact me :) Face Alignment All the face images should be aligned. Please follow the align protocol in dataset CelebA . After alignment, the face should be in the center of the image, and the size of image should be 178x218. Some aligned samples could be found in image/align_sample. Datasets In this paper, we use three face datasets. We train base model and DREAM block on MS Celeb 1M We offer a subset of Ms Celeb 1M with 10 celebrities, you could download from the following link Ms Celeb 1M Subset (msceleb.zip): Google Drive Baidu Yun We evaluate our the performance of our models on CFP and IJB A. For CFP, we offer the code to get algined images from the original images (The code could only be runned on Linux). First, you need to download the original CFP dataset, and then download the image list from Google Drive Baidu Yun For IJBA, we provide the aligned images here. Google Drive Baidu Yun Pretrained Models We offer several pretrained models. They could be downloaded from Google Drive Baidu Yun Train DREAM Block stitch Training Prepare the feature extracted from any face recognition model (You could use the pretrained model we prepared). We prepared a piece of sample data (stitching.zip) which could be download from Google Drive Baidu Yun Download the sample data bash mkdir data mv stitching.zip data cd data unzip stitching.zip Train the model: bash cd src/stitching sh train_stitch.sh end2end Training Download the Ms Celeb 1M Subset bash mkdir data mv msceleb.zip data cd data unzip msceleb.zip Train the model: bash cd src/end2end sh train.sh evaluate CFP Download the CFP dataset and preprocess the image. 
Then download the image list for evaluation bash make sure you are in the root directory of DREAM project mkdir data cd src/preprocess sh align_cfp.sh cd data/CFP unzip CFP_protocol.zip Download pretrained model bash make sure you are in the root directory of DREAM project cd ../ mv model.zip data cd data unzip model.zip Evaluate the pretrained model on CFP dataset bash make sure you are in the root directory of DREAM project cd src/CFP sh eval_cfp.sh evaluate IJBA Download the IJBA dataset(contact me to get the aligned images) bash make sure you are in the root directory of DREAM project mkdir data mv IJBA.zip data cd data unzip IJBA.zip Download pretrained models (If have downloaded the models, skip this step) bash make sure you are in the root directory of DREAM project cd ../ mv model.zip data cd data unzip model.zip Evaluate the pretrained model on IJBA dataset bash make sure you are in the root directory of DREAM project cd src/IJBA sh eval_ijba.sh Citation Please cite the paper in your publications if it helps your research: @inproceedings{cao2018Dream, author {Kaidi Cao and Yu Rong and Cheng Li and Xiaoou Tang and Chen Change Loy}, booktitle {CVPR}, title {Pose Robust Face Recognition via Deep Residual Equivariant Mapping}, year {2018} }",Face Verification,Face Verification 2113,Computer Vision,Computer Vision,Computer Vision,"A simple Keras implementation of triplet loss for MNIST digit embeddings I use embedding size of 32, which result in faster converge and more stability during training. You can try to increase the embeddings size but remember to increase network depth. The implementation use all anchor positive and hard negative for triplet generate. More specifically, I use 10 sample per digits while selecting 5 negative digits Visualize result using tSNE on MNIST test set: ! alt text Requirements: tensorflow, keras, matplotlib (for visualize), sklearn (for tSNE), jupyter notebook How to use this repo: The notebook contain all thing you need (e.g data loader, model, triplet loss implementation, ...). Star if you like my work! References: FaceNet: A Unified Embedding for Face Recognition and Clustering: tSNE to visualize digits:",Face Verification,Face Verification 2244,Computer Vision,Computer Vision,Computer Vision,FaceNet_face_verification This repository takes inspiration from modern day Face Unlock feature found in smartphones. The repository is based on a demonstration of face recognition using the FaceNet network . The network learns an embedding for the required user and uses it for subsequent verification attempts. How to use : The project requirements can be installed using pip install r requirements.txt The project can be run using python main.py References :,Face Verification,Face Verification 2252,Computer Vision,Computer Vision,Computer Vision,"Humpback whale identity challenge Identify humpback whale based on its fluke This is a competition hosted at Kaggle. All images for trainig and testing can be obtained at This project implement triplt lost function, based on FaceNet: A Unified Embedding for Face Recognition and Clustering",Face Verification,Face Verification 2284,Computer Vision,Computer Vision,Computer Vision,"FaceNet with TripletLoss NOTE: Code is not polished. I will update it later to make it little easier to use. My implementation for face recognition using FaceNet model and Triplet Loss. > Medium post Dependencies keras numpy time os openCV Usage 1. Create a dataset of faces for each person and arrange them in below order. 
root folder │ └───Person 1 │ │───IMG1 │ │───IMG2 │ │ .... └───Person 2 │───IMG1 │───IMG2 .... 2. Run align_dataset_mtcnn.py to align faces. This code is taken from facenet . Usage example: ! screenshot_43 3. Edit path dictionary and face list in generator_utils.py according to your data. 4. Run train_triplet.py to train the model. Adjust parameters accordingly. 5. Run webcamFaceRecoMulti.py to recognize faces in real time. Note: It is not state of the art technique. So, dont't expect much from it. Model is trained using triplet loss. According to experiments it is recommended to chose positive , negative and anchor images carefully/manually for better results. Here I used a generator which selects images for positive , negative and anchor randomly (I'm Lazy af). To know more about this I recommend you to watch this video. While training the model it will show very low loss straight away from the beginning. Don't fall for that. I will later include some reliable metric to get idea of model's accuracy on data while training. Refrences FaceNet: A Unified Embedding for Face Recognition and Clustering : Deepface paper deeplearning.ai 's assignments.",Face Verification,Face Verification 2314,Computer Vision,Computer Vision,Computer Vision,"Quality Aware Network Codebase for (Partial) Quality Aware Network . Notice that QAN in ''Quality Aware Network for Set to Set Recognition'' is a 'single part' version of P QAN in ''Unsupervised Partial Quality Predictor for Large Scale Person Re identification'', so you can set part number to 1 if you wanna reproduce results of QAN. QAN is used to handle set to set recognition problem. It can automatically learn the quality score for each sample in a set, and use the score to be the weight for synthesizing feature. P QAN is our subsequent work after QAN. It is used to handle video based person re identification problem with awareness of partial quality. It can learn not only the quality of each frame but also the occlusion/blur/noise/etc. level in each part of a frame. We use the PRID 2011 , iLIDS VID (www.eecs.qmul.ac.uk/.../downloads_qmul_iLIDS VID_ReID_dataset.html) and LPW (coming soon) datasets for evaluation. They are video based tasks for person re identification. Running code 1.Clone CaffeMex\_v2 ; 2.Replace normalization layer (.hpp .cu .cpp) in CaffeMex_v2 with layers/ and add NormalizationParameter to caffe.proto ; 3.Complile with matlab interface (see readme in CaffeMex\_v2). 4.Configure the path CaffeMex_v2/matlab/+caffe to the directory in this project. 5.Configure the parameters in train_baseline/train_baseline.m , train_baseline/train_LPW.m and train_PQAN/train_LPW.m , train_PQAN/train_network_and_test.m , including the path of prototxt and the relative param. The pretrain_model can be found in here . 6.Running the scripts in the generate_data to generate your dataset split. 7.Modify the number of corresponding network classifications. Running the train_baseline/train_baseline.m or train_baseline/train_LPW.m for baseline and train_PQAN/train_LPW.m , train_PQAN/train_network_and_test.m for PQAN. Q&A Here we list some commonly asked questions we received from the public. Thanks for your engagement to make our work better! What is LPW ? Labeled Pedestrian in the Wild (LPW) is a large scale human re identification dataset that we collected in three different crowed scenes. We haven't released it now but if you are interested in it, please contact us by e mail. Now you can download it at Any strategies are used to make the quality scores effective? 
There are two main aspects that affect the effectiveness of the quality scores. One is using a model pre trained on the ImageNet classification task so that the network converges well; in the training stage, use_global_stats is set to false in the Batchnorm layer. The other is the configuration of parameters such as the learning rate and the margin in the tripletloss layer; you can adjust these parameters by observing how the scores change during training. Still having questions? Feel free to drop us an email sharing your ideas. Citation Please kindly cite our work in your publications if it helps your research: @inproceedings{liu_2017_qan, Author {Liu, Yu and Junjie, Yan and Ouyang, Wanli}, Title {Quality Aware Network for Set to Set Recognition}, Booktitle {IEEE Conference on Computer Vision and Pattern Recognition}, Year {2017} }",Face Verification,Face Verification 2336,Computer Vision,Computer Vision,Computer Vision,"face recognition _A survey of face recognition techniques covering both face verification and identification_ ! (images/progress.png) > ref: Mei Wang and Weihong Deng, Deep Face Recognition: A Survey, 2018 Contents Survey (mds/survey.md) Paper reviews ( paper reviews) Mei Wang and Weihong Deng, Deep Face Recognition: A Survey, CoRR, 2018 (papers/Deep_Face_Recognition_A_Survey.md) Pre trained Model ( pre trained model) Source Code ( code) Dataset ( dataset) Links ( links) Paper reviews Mei Wang and Weihong Deng, Deep Face Recognition: A Survey, 2018 (papers/Deep_Face_Recognition_A_Survey.md) Pre Trained Model Tensorflow Inception ResNet v1(CASIA WebFace) Inception ResNet v1(VGGFace2) Keras krasserm's openFace iwantooxxoox's openFace oxford's VGGface Code Keras krasserm's openFace iwantooxxoox's openFace Dataset InsightFace Git's Dataset Zoo is well organized: CASIA Webface, UMDFace, VGG2, MS1M IBUG, MS1M ArcFace, Asian Celeb, DeepGlint The Asian Face Age Dataset (AFAD) Trillionpairs google drive Links Celebrity face recognition with deep learning, by Kakao A Facebook post about face recognition Face Recognition survey ! (images/pwc_fv.png) > ref: ! (images/pwc_fi.png) > ref: Starting from the two representative models (DeepFace, DeepID) introduced at CVPR 2014, deep learning research on face recognition has been very active; this survey covers the trends of the last five years. 1. DeepFace (CVPR/2014) Training Set: 4.4M CNN Layers LFW 97.35% 3D Alignment Ensemble Model Softmax ref: 2. DeepID (CVPR/2014) Training Set: 20W CNN Layers Softmax, Bayesian LFW 97.45% Strengths and weaknesses of Softmax Softmax is a classifier that aims at taking a soft maximum, and it is a very popular choice for CNN classification problems. Assuming four classes, the values (x1, x2, x3, x4) are produced by the softmax classifier; they are usually computed in percentage form, and the largest output becomes the predicted value/result. Hard max vs Softmax fig13 ref: Deep Learning Face Representation by Joint Identification Verification fig4 3. FaceNet (CVPR/2015) Uses Triplet Loss instead of Softmax. Triplet loss optimises over (a, p, n) triples: the L2 distance to a dissimilar (negative) feature must become larger than the L2 distance to a similar (positive) feature, which compacts each class and separates different classes. a: anchor p: positive n: negative LFW: 99.64% Training Set: 200M use only 128 dim feature mapping The code implementing triplet selection is very tricky and difficult. paper triplet loss concept 4. Large Margin Softmax Loss (ICML/2016) L softmax is a softmax with a large margin. How the large margin is built: United Fully Connected Layer + Softmax + Cross Entropy Training Set: 0.49M 17 CNN Layers LFW: 98.72% 5. SphereFace (CVPR/2017) Improves L softmax and proposes A Softmax. Reduces the effect of training sample imbalance and focuses on optimising the angles of the mapped feature vectors. Training Set: 0.49M 64 CNN Layers LFW: 99.42% (nowadays the additive margin family is simpler to train and performs better) 6. Center Loss (ECCV/2016) Pulls every feature vector of a class towards that class's centre. LFW(w/7 CNN Layers): 99.05 LFW(w/64 CNN Layers): 99.28 Maintaining many class centres consumes a relatively large amount of memory. 7.
Center Invariant Loss (ACM MM/2017) 8. Range Loss (ICCV/2017) 9. Ring Loss (CVPR/2018) 10. COCO Congenerous Cosin (CVPR/2017) 11. L2 constrained Softmax Loss (Arxiv/2017) 12. NormFace (ACM MM/2017) L2 HyperSphere Embedding 13. AM softmax (ICLR/2018) Additive Margin Softmax 14. CosFace (CVPR/2018) 15. ArcFace (arXiv) ref: code(MXNET): 16. InsightFace () code(tensorflow):",Face Verification,Face Verification 2340,Computer Vision,Computer Vision,Computer Vision,"Face_Recognition_IN_Video About This is a desktop program for finding and marking the required faces in videos. Requirement python36 I use Python 3.6 in this program in order to use PyTorch. Pytorch Only PyTorch 0.2+ works with this program. If you want to use a GPU to make it run faster, please download and install CUDA. Mxnet I use MTCNN for detecting faces in videos, which needs MXNet. OpenCV Used for image processing such as resizing, colour conversion and more. PyQT5 The third party package PyQt5 is used only for the UI of the program. Using run: python main.py README_MTCNN_face_detection_and_alignment MTCNN_face_detection_and_alignment About This is a python/mxnet implementation of Zhang 's work . It's fast and accurate, see link . It should have almost the same output as the original work, for MXNet fans and those who can't afford Matlab :) Chinese blog Test You can change ctx to mx.gpu(0) for faster detection. By setting num_worker=4, accurate_landmark=False we can reduce the detection time by 1/4 to 1/3; the bboxes are still the same, but we skip the last landmark fine tune stage( mtcnn_v1 ). Added function extract_face_chips , examples: ! 1 ! 2 ! 3 ! 4 see mtcnn_detector.py for the details about the parameters. This function uses dlib 's align strategy, which works well on profile images :) Results ! big4 Reference K. Zhang, Z. Zhang, Z. Li and Y. Qiao, Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks, IEEE Signal Processing Letters README_SphereFace SphereFace Introduction The repository contains the entire pipeline (including all the preprocessing) for deep face recognition with SphereFace . The recognition pipeline contains three major steps: face detection, face alignment and face recognition. SphereFace is a recently proposed face recognition method. It was initially described in an arXiv technical report and then published in CVPR 2017 . The most up to date paper with more experiments can be found at arXiv or here . To facilitate face recognition research, we give an example of training on CASIA WebFace and testing on LFW using the 20 layer CNN architecture described in the paper (i.e. SphereFace 20). In SphereFace, our network architectures use residual units as building blocks, but are quite different from the standard ResNets (e.g., BatchNorm is not used, PReLU replaces ReLU, different initializations, etc). We proposed 4 layer, 20 layer, 36 layer and 64 layer architectures for face recognition (details can be found in the paper ( ) and prototxt files ). We provide the 20 layer architecture as an example here. If our proposed architectures also help your research, please consider citing our paper. SphereFace achieves state of the art verification performance (previously No.1) in the MegaFace Challenge under the small training set protocol. A PyTorch Implementation of SphereFace. The code can be trained on CASIA WebFace and the best accuracy on LFW is 99.22% . SphereFace: Deep Hypersphere Embedding for Face Recognition Models 1.
Visualizations of network architecture (tools from ethereon ): SphereFace 20: link 2. Model file SphereFace 20: Google Drive Baidu Third party SphereFace 4 & SphereFace 6: here by zuoqing1988 Results 1. Following the instruction, we go through the entire pipeline for 5 times. The accuracies on LFW are shown below. Generally, we report the average but we release the model 3 ( models) here. Experiment 1 2 3 (released) 4 5 : : : : : : : : : : : : ACC 99.24% 99.20% 99.30% 99.27% 99.13% 2. Other intermediate results: LFW features: Google Drive Baidu Training log: Google Drive Baidu Reference Mtcnn for face detection Sphereface_pytorch for face recognition paper of MTCNN (ReferencePaper/mtcnn.pdf) paper of Sphereface (ReferencePaper/shpereface.pdf)",Face Verification,Face Verification 2460,Computer Vision,Computer Vision,Computer Vision,"InsightFace: 2D and 3D Face Analysis Project By Jia Guo and Jiankang Deng License The code of InsightFace is released under the MIT License. Recent Update 2018.07.17 : Model Zoo , Dataset Zoo 2018.06.14 : There's a large scale Asian training dataset provided by Glint, see this discussion for detail. 2018.05.16 : A new training dataset released here which can easily achieve much better accuracy. See discussion for detail. 2018.04.23 : Our implementation of MobileFaceNet is now available. Please set network y1 to use this lightweight but powerful backbone. 2018.03.26 : We can train with combined margin(loss type 5), see Verification Results On Combined Margin ( verification results on combined margin). 2018.02.13 : We achieved state of the art performance on MegaFace Challenge . Please check our paper and code for implementation details. Contents Deep Face Recognition ( deep face recognition) Introduction ( introduction) Training Data ( training Data) Train ( train) Pretrained Models ( pretrained models) Verification Results On Combined Margin ( verification results on combined margin) Test on MegaFace ( test on megaface) 512 D Feature Embedding ( 512 d feature embedding) Third party Re implementation ( third party re implementation) Face Alignment ( face alignment) Face Detection ( face detection) Citation ( citation) Contact ( contact) Deep Face Recognition Introduction In this repository, we provide training data, network settings and loss designs for deep face recognition. The training data includes the normalised MS1M and VGG2 datasets, which were already packed in the MxNet binary format. The network backbones include ResNet, InceptionResNet_v2, DenseNet, DPN and MobiletNet. The loss functions include Softmax, SphereFace, CosineFace, ArcFace and Triplet (Euclidean/Angular) Loss. loss type 0: Softmax loss type 1: SphereFace loss type 2: CosineFace loss type 4: ArcFace loss type 5: Combined Margin loss type 12: TripletLoss ! margin penalty for target logit Our method, ArcFace, was initially described in an arXiv technical report . By using this repository, you can simply achieve LFW 99.80%+ and Megaface 98%+ by a single model. This repository can help researcher/engineer to develop deep face recognition algorithms quickly by only two steps: download the binary dataset and run the training script. Training Data All face images are aligned by MTCNN and cropped to 112x112: Refined MS1M@BaiduDrive , Refined MS1M@GoogleDrive VGGFace2@BaiduDrive , VGGFace2@GoogleDrive Please check src/data/face2rec2.py on how to build a binary face dataset. Any public available MTCNN can be used to align the faces, and the performance should not change. 
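For loss type 12 (TripletLoss) listed above, the objective can be sketched as a simple margin hinge over embedding distances; this is a hedged NumPy illustration with assumed array shapes, not the repository's MXNet implementation.
import numpy as np
def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor, positive, negative: embeddings of shape (batch, dim)
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    # hinge: require negatives to be at least margin farther away than positives
    return np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0))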
We will improve the face normalisation step by full pose alignment methods recently. Note: If you use the refined MS1M dataset and the cropped VGG2 dataset, please cite the original papers. Train 1. Install MXNet with GPU support (Python 2.7). pip install mxnet cu80 2. Clone the InsightFace repository. We call the directory insightface as INSIGHTFACE_ROOT . git clone recursive 3. Download the training set ( MS1M ) and place it in $INSIGHTFACE_ROOT/datasets/ . Each training dataset includes following 7 files: Shell faces_ms1m_112x112/ train.idx train.rec property lfw.bin cfp_ff.bin cfp_fp.bin agedb_30.bin The first three files are the training dataset while the last four files are verification sets. 4. Train deep face recognition models. In this part, we assume you are in the directory $INSIGHTFACE_ROOT/src/ . export MXNET_CPU_WORKER_NTHREADS 24 export MXNET_ENGINE_TYPE ThreadedEnginePerDevice We give some examples below. Our experiments were conducted on the Tesla P40 GPU. (1). Train ArcFace with LResNet100E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r100 loss type 4 margin m 0.5 data dir ../datasets/faces_ms1m_112x112 prefix ../model r100 It will output verification results of LFW , CFP FF , CFP FP and AgeDB 30 every 2000 batches. You can check all command line options in train\_softmax.py . This model can achieve LFW 99.80+ and MegaFace 98.0%+ . (2). Train CosineFace with LResNet50E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r50 loss type 2 margin m 0.35 data dir ../datasets/faces_ms1m_112x112 prefix ../model r50 amsoftmax (3). Train Softmax with LMobileNetE. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network m1 loss type 0 data dir ../datasets/faces_ms1m_112x112 prefix ../model m1 softmax (4). Fine turn the above Softmax model with Triplet loss. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_triplet.py network m1 lr 0.005 mom 0.0 per batch size 150 data dir ../datasets/faces_ms1m_112x112 pretrained ../model m1 softmax,50 prefix ../model m1 triplet (5). Train LDPN107E network with Softmax loss on VGGFace2 dataset. Shell CUDA_VISIBLE_DEVICES '0,1,2,3,4,5,6,7' python u train_softmax.py network p107 loss type 0 per batch size 64 data dir ../datasets/faces_vgg_112x112 prefix ../model p107 softmax 5. Verification results. LResNet100E IR network trained on MS1M dataset with ArcFace loss: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) Ours 99.80+ 99.85+ 94.0+ 97.90+ LResNet50E IR network trained on VGGFace2 dataset with ArcFace loss: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) Ours 99.7+ 99.6+ 97.1+ 95.7+ We report the verification accuracy after removing training set overlaps to strictly follow the evaluation metric. (C) means after cleaning Dataset Identities Images Identites(C) Images(C) Acc Acc(C) LFW 85742 3850179 80995 3586128 99.83 99.81 CFP FP 85742 3850179 83706 3736338 94.04 94.03 AgeDB 30 85742 3850179 83775 3761329 98.08 97.87 Pretrained Models You can use $INSIGHTFACE/src/eval/verification.py to test all the pre trained models. 1. LResNet50E IR@BaiduDrive , @GoogleDrive Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace(%) Ours 99.80 99.83 92.74 97.76 97.64 2. LResNet34E IR@BaiduDrive Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace(%) Ours 99.65 99.77 92.12 97.70 96.70 Caffe LResNet50E IR@BaiduDrive , converted by above MXNet model. 
Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace1M(%) Ours 99.74 TBD TBD TBD TBD Verification Results on Combined Margin A combined margin method was proposed as a function of target logits value and original θ : COM(θ) cos(m_1 θ+m_2) m_3 For training with m1 0.9, m2 0.4, m3 0.15 , run following command: CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r100 loss type 5 margin a 0.9 margin m 0.4 margin b 0.15 data dir ../datasets/faces_ms1m_112x112 prefix ../model r100 Method m1 m2 m3 LFW CFP FP AgeDB 30 W&F Norm Softmax 1 0 0 99.28 88.50 95.13 SphereFace 1.5 0 0 99.76 94.17 97.30 CosineFace 1 0 0.35 99.80 94.4 97.91 ArcFace 1 0.5 0 99.83 94.04 98.08 Combined Margin 1.2 0.4 0 99.80 94.08 98.05 Combined Margin 1.1 0 0.35 99.81 94.50 98.08 Combined Margin 1 0.3 0.2 99.83 94.51 98.13 Combined Margin 0.9 0.4 0.15 99.83 94.20 98.16 Test on MegaFace In this part, we assume you are in the directory $INSIGHTFACE_ROOT/src/megaface/ . Note: We found there are overlap identities between facescrub dataset and Megaface distractors, which significantly affects the identification performance. This list is released under $INSIGHTFACE_ROOT/src/megaface/ . 1. Align all face images of facescrub dataset and megaface distractors. Please check the alignment scripts under $INSIGHTFACE_ROOT/src/align/ . 2. Generate feature files for both facescrub and megaface images. python u gen_megaface.py 3. Remove Megaface noises which generates new feature files. python u remove_noises.py 4. Run megaface development kit to produce final result. 512 D Feature Embedding In this part, we assume you are in the directory $INSIGHTFACE_ROOT/deploy/ . The input face image should be generally centre cropped. We use RNet+ONet of MTCNN to further align the image before sending it to the feature embedding network. 1. Prepare a pre trained model. 2. Put the model under $INSIGHTFACE_ROOT/models/ . For example, $INSIGHTFACE_ROOT/models/model r34 amf . 3. Run the test script $INSIGHTFACE_ROOT/deploy/test.py . For single cropped face image(112x112), total inference time is only 17ms on our testing server(Intel E5 2660 @ 2.00GHz, Tesla M40, LResNet34E IR ). Third party Re implementation TensorFlow: InsightFace_TF Face Alignment Todo Face Detection Todo Citation If you find InsightFace useful in your research, please consider to cite the following related papers: @article{deng2018arcface, title {ArcFace: Additive Angular Margin Loss for Deep Face Recognition}, author {Deng, Jiankang and Guo, Jia and Zafeiriou, Stefanos}, journal {arXiv:1801.07698}, year {2018} } Contact Jia Guo (guojia at gmail.com) Jiankang Deng (jiankangdeng at gmail.com)",Face Verification,Face Verification 2467,Computer Vision,Computer Vision,Computer Vision,"Face recognition using Facenet Model In this Project we are able to recognize the trained faces in real time using laptop/PC webcam. An Overview of Project: > Downloading the Facenet Model and making it ready to use. > Capturing the image/video in which faces should be recognized. > Apply Face Detection on the frame/image using Haar Caascade/dlib(Here dlib is used). > Extracting features of each face using Facenet model. 
> Recognizing the face using various technique,threshold values(We should be able to correctly recognize the face and also should be able recognize whether the face is of stranger) To know about Facenet Model , go through",Face Verification,Face Verification 2514,Computer Vision,Computer Vision,Computer Vision,"Facenet for face recognition using pytorch Pytorch implementation of the paper: FaceNet: A Unified Embedding for Face Recognition and Clustering . Training of network is done using triplet loss. How to train/validate model Download vggface2 (for training) and lfw (for validation) datasets. Align face image files by following David Sandberg's instruction (part of Face alignment ). Write list file of face images by running datasets/write_csv_for_making_dataset.ipynb This is aready in the directory of datasets so that you don't need to do again if you are urgent. To run this one need to modify paths in accordance with location of image dataset. Train Again, one need to modify paths in accordance with location of image dataset. Also feel free to change some parameters. Results Accuracy on VGGFace2 and LFW datasets ! accuracy (./log/tmp/accuracy.jpg) Triplet loss on VGGFace2 and LFW datasets ! loss (./log/tmp/loss.jpg) ROC curve on LFW datasets for validation ! roc curve (./log/tmp/roc_valid_epoch_39.png) References",Face Verification,Face Verification 2569,Computer Vision,Computer Vision,Computer Vision,"Real time face recognition with mtcnn, kcf tracker and arcface (insightface) This is an inference implementation of real time face recogtion, wholely written in C++ with TVM inference runtine. Using mtcnn for facedetection, kcf tracker and insightface model. MTCNN ZHANG2016 Zhang, K., Zhang, Z., Li, Z., and Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503. Insightface Deng, Jiankang and Guo, Jia and Niannan, Xue and Zafeiriou, Stefanos. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. CVPR 2019. ArXiv tecnical report Requirements OpenCV 3.4+ Boost FileSystem (1.58+) CMake 3.2+ Clang 6.0+ TVM runtime You can folowing this doc file (install_requirements.md) or official doc to install TVM runtime Build with flowing command Change two line cmake config in facerecogtion model (facerecognition/CMakeLists.txt) to your TVM source that you downloaded: set (DMLC_INCLUDE /home/thongpb/works/tvm/3rdparty/dmlc core/include ) set (DLPACK_INC /home/thongpb/works/tvm/3rdparty/dlpack/include ) Compile: cd realtime facereccpp mkdir build cd build cmake .. make j4 Sample command You can modify parameters in params.xml (data/params.xml) ./facerecognition/facerecognition ../data/params.xml Acknowledgments The MTCNN implementation take from The model used for facial feature extraction came from insightface MODEL_ZOO The mtcnn model files are taken from",Face Verification,Face Verification 2671,Computer Vision,Computer Vision,Computer Vision,"Face Recognition A light weight face recognition implementation using a pre trained facenet model. Most of the code is taken from David Sandberg's facenet repository. Steps to follow: 1. Create a dataset of faces for each person and arrange them in below order root folder │ └───Person 1 │ │───IMG1 │ │───IMG2 │ │ .... └───Person 2 │───IMG1 │───IMG2 .... 2. Align the faces using MTCNN or dllib. Please use the scripts available in lib/src/align. 
For this project i aligned faces using MTCNN.(Please have a look at aligning faces if you need any clarifications) Before alignment After alignment 3. Download pre trained weight ,extract and keep it in lib/src/ckpt folder (for detailed info about availabe weights: available weights ) 4. Create face embeddings using pre trained facenet model. Run the below scripts by changing the folder paths.(edit paths in lines ) python lib/src/create_face_embeddings.py Once you run the script succesfully, a pickle file with name face_embeddings.pickle will be generated inside lib/src folder 5. Start the server by running the command python server/rest server.py access the UI using url It will show a button with face recognition as label. Once you click on it, automatically your primary camera will get turned on and start recognizing the faces. sample result ! alt text for more information, please go through my blog NOTE: The faces are identified using retrievel method, instead if you have enough data, you can train a classifier on top of face embeddings ( Train a classifier on own images ) References: Deepface paper Facenet paper Pre trained facenet model",Face Verification,Face Verification 2686,Computer Vision,Computer Vision,Computer Vision,"SphereFace : Deep Hypersphere Embedding for Face Recognition By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song License SphereFace is released under the MIT License (refer to the LICENSE file for details). Update 2018.2.1 : As requested, the prototxt files for SphereFace 64 are released. 2018.1.27 : We updated the appendix of our SphereFace paper with useful experiments and analysis. Take a look here . The content contains: The intuition of removing the last ReLU; Why do we want to normalize the weights other than because we need more geometric interpretation? Empirical experiment of zeroing out the biases; More 2D visualization of A Softmax loss on MNIST; Angular Fisher score for evaluating the angular feature discriminativeness, which is a new and straightforward evluation metric other than the final accuracy. Experiments of SphereFace on MegaFace with different convolutional layers; The annealing optimization strategy for A Softmax loss; Details of the 3 patch ensemble strategy in MegaFace challenge; 2018.1.20 : We updated some resources to summarize the current advances in angular margin learning. Take a look here ( resources for angular margin learning). Contents 0. Introduction ( introduction) 0. Citation ( citation) 0. Requirements ( requirements) 0. Installation ( installation) 0. Usage ( usage) 0. Models ( models) 0. Results ( results) 0. Video Demo ( video demo) 0. Note ( note) 0. Third party re implementation ( third party re implementation) 0. Resources for angular margin learning ( resources for angular margin learning) Introduction The repository contains the entire pipeline (including all the preprocessings) for deep face recognition with SphereFace . The recognition pipeline contains three major steps: face detection, face alignment and face recognition. SphereFace is a recently proposed face recognition method. It was initially described in an arXiv technical report and then published in CVPR 2017 . The most up to date paper with more experiments can be found at arXiv or here . To facilitate the face recognition research, we give an example of training on CAISA WebFace and testing on LFW using the 20 layer CNN architecture described in the paper (i.e. SphereFace 20). 
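To give a concrete picture of the A Softmax loss that SphereFace builds on, here is a hedged NumPy sketch of the piecewise target logit psi(theta) = (-1)^k cos(m theta) - 2k applied to the target class; the function name and the default m are illustrative and this is not the Caffe implementation shipped in this repository.
import numpy as np
def a_softmax_target_logit(cos_theta, m=4):
    # psi(theta) = (-1)**k * cos(m*theta) - 2*k for theta in [k*pi/m, (k+1)*pi/m]
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    k = np.floor(theta * m / np.pi)
    # in the full loss this value replaces cos(theta) for the target class and is scaled by the feature norm
    return ((-1.0) ** k) * np.cos(m * theta) - 2.0 * k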
In SphereFace, our network architecures use residual units as building blocks, but are quite different from the standrad ResNets (e.g., BatchNorm is not used, the prelu replaces the relu, different initializations, etc). We proposed 4 layer, 20 layer, 36 layer and 64 layer architectures for face recognition (details can be found in the paper ( ) and prototxt files ). We provided the 20 layer architecure as an example here. If our proposed architectures also help your research, please consider to cite our paper. SphereFace achieves the state of the art verification performance (previously No.1) in MegaFace Challenge under the small training set protocol. Citation If you find SphereFace useful in your research, please consider to cite: @InProceedings{Liu_2017_CVPR, title {SphereFace: Deep Hypersphere Embedding for Face Recognition}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2017} } Our another closely related previous work in ICML'16 ( more ): @InProceedings{Liu_2016_ICML, title {Large Margin Softmax Loss for Convolutional Neural Networks}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Yang, Meng}, booktitle {Proceedings of The 33rd International Conference on Machine Learning}, year {2016} } Requirements 1. Requirements for Matlab 2. Requirements for Caffe and matcaffe (see: Caffe installation instructions ) 3. Requirements for MTCNN (see: MTCNN face detection & alignment ) and Pdollar toolbox (see: Piotr's Image & Video Matlab Toolbox ). Installation 1. Clone the SphereFace repository. We'll call the directory that you cloned SphereFace as SPHEREFACE_ROOT . Shell git clone recursive 2. Build Caffe and matcaffe Shell cd $SPHEREFACE_ROOT/tools/caffe sphereface Now follow the Caffe installation instructions here: make all j8 && make matcaffe Usage After successfully completing the installation ( installation) , you are ready to run all the following experiments. Part 1: Preprocessing Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/preprocess/ 1. Download the training set ( CASIA WebFace ) and test set ( LFW ) and place them in data/ . Shell mv /your_path/CASIA_WebFace data/ ./code/get_lfw.sh tar xvf data/lfw.tgz C data/ Please make sure that the directory of data/ contains two datasets. 2. Detect faces and facial landmarks in CAISA WebFace and LFW datasets using MTCNN (see: MTCNN face detection & alignment ). Matlab In Matlab Command Window run code/face_detect_demo.m This will create a file dataList.mat in the directory of result/ . 3. Align faces to a canonical pose using similarity transformation. Matlab In Matlab Command Window run code/face_align_demo.m This will create two folders ( CASIA WebFace 112X96/ and lfw 112X96/ ) in the directory of result/ , containing the aligned face images. Part 2: Train Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/train/ 1. Get a list of training images and labels. Shell&Matlab mv ../preprocess/result/CASIA WebFace 112X96 data/ In Matlab Command Window run code/get_list.m The aligned face images in folder CASIA WebFace 112X96/ are moved from preprocess folder to train folder. A list CASIA WebFace 112X96.txt is created in the directory of data/ for the subsequent training. 2. Train the sphereface model. 
Shell ./code/sphereface_train.sh 0,1 After training, a model sphereface_model_iter_28000.caffemodel and a corresponding log file sphereface_train.log are placed in the directory of result/sphereface/ . Part 3: Test Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/test/ 1. Get the pair list of LFW ( view 2 ). Shell mv ../preprocess/result/lfw 112X96 data/ ./code/get_pairs.sh Make sure that the LFW dataset and pairs.txt in the directory of data/ 1. Extract deep features and test on LFW. Matlab In Matlab Command Window run code/evaluation.m Finally we have the sphereface_model.caffemodel , extracted features pairs.mat in folder result/ , and accuracy on LFW like this: fold 1 2 3 4 5 6 7 8 9 10 AVE : : : : : : : : : : : : : : : : : : : : : : : : ACC 99.33% 99.17% 98.83% 99.50% 99.17% 99.83% 99.17% 98.83% 99.83% 99.33% 99.30% Models 1. Visualizations of network architecture (tools from ethereon ): SphereFace 20: link 2. Model file SphereFace 20: Google Drive Baidu Third party SphereFace 4 & SphereFace 6: here by zuoqing1988 Results 1. Following the instruction, we go through the entire pipeline for 5 times. The accuracies on LFW are shown below. Generally, we report the average but we release the model 3 ( models) here. Experiment 1 2 3 (released) 4 5 : : : : : : : : : : : : ACC 99.24% 99.20% 99.30% 99.27% 99.13% 2. Other intermediate results: LFW features: Google Drive Baidu Training log: Google Drive Baidu Video Demo SphereFace Demo Please click the image to watch the Youtube video. For Youku users, click here . Details: 1. It is an open set face recognition scenario. The video is processed frame by frame, following the same pipeline in this repository. 2. Gallery set consists of 6 identities. Each main character has only 1 gallery face image. All the detected faces are included in probe set. 3. There is no overlap between gallery set and training set (CASIA WebFace). 4. The scores between each probe face and gallery set are computed by cosine similarity. If the maximal score of a probe face is smaller than a pre definded threshold, the probe face would be considered as an outlier. 5. Main characters are labeled by boxes with different colors. ( ! ff0000 Rachel, ! ffff00 Monica, ! ff80ff Phoebe, ! 00ffff Joey, ! 0000ff Chandler, ! 00ff00 Ross) Note 1. Backward gradient In this implementation, we did not strictly follow the equations in paper. Instead, we normalize the scale of gradient. It can be interpreted as a varying strategy for learning rate to help converge more stably. Similar idea and intuition also appear in normalized gradients and projected gradient descent . More specifically, if the original gradient of f w.r.t x can be written as df/dx coeff_w \ w + coeff_x \ x , we use the normalized version df/dx (coeff_w \ w + coeff_x \ x) / norm_wx to perform backward propragation, where norm_wx is sqrt(coeff_w^2 + coeff_x^2) . The same operation is also applied to the gradient of f w.r.t w . In fact, you do not necessarily need to use the original gradient, since the original gradient sometimes is not an optimal design. One important criterion for modifying the backprop gradient is that the new gradient (strictly speaking, it is not a gradient anymore) need to make the objective value decrease stably and consistently. 
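A small hedged sketch of the gradient rescaling described in this note, restating df/dx = (coeff_w * w + coeff_x * x) / norm_wx with norm_wx = sqrt(coeff_w^2 + coeff_x^2); the function name is illustrative and the actual computation lives inside the Caffe layers of this repository.
import numpy as np
def normalized_backward(coeff_w, coeff_x, w, x):
    # original gradient: df/dx = coeff_w * w + coeff_x * x
    grad = coeff_w * w + coeff_x * x
    # rescale by norm_wx so the update magnitude stays controlled
    norm_wx = np.sqrt(coeff_w ** 2 + coeff_x ** 2)
    return grad / norm_wx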
(In terms of some failure cases for gradient based back prop, I recommand a great talk by Shai Shalev Shwartz ) If you use the original gradient to do the backprop, you could still make it work but may need different lambda settings, iteration number and learning rate decay strategy. 2. Lambda and Note for training (When the loss becomes 87) Please refer to our previous note and explanation . 3. According to recent advances, using feature normalization with a tunable scaling parameter s can significantly improve the performance of SphereFace on MegaFace challenge This is supported by the experiments done by CosFace . Similar idea also appears in additive margin softmax . 4. Difficulties in convergence When you encounter difficulties in convergence (it may appear if you use SphereFace in another dataset), usually there are a few easy ways to address it. First, try to use large mini batch size. Second, try to use PReLU instead of ReLU. Third, increase the width and depth of our network. Fourth, try to use better initialization. For example, use the pretrained model from the original softmax loss (it is also equivalent to finetuning). Last and the most effective thing you could try is to change the hyper parameters for lambda_min, lambda and its decay speed. Third party re implementation PyTorch: code by clcarwin . TensorFlow: code by pppoe . TensorFlow: code by hujun100 . TensorFlow: code by HiKapok . TensorFlow: code by andrewhuman . MXNet: code by deepinsight (by setting loss type 1: SphereFace). Caffe2: code by tpys . Trained on MS 1M: code by KaleidoZhouYN . System: A cool face demo system using SphereFace by tpys . Third party pretrained models: code by goodluckcwl . Resources for angular margin learning L Softmax loss and SphereFace present a promising framework for angular representation learning, which is shown very effective in deep face recognition. We are super excited that our works has inspired many well performing methods (and loss functions). We list a few of them for your potential reference (not very up to date): Additive margin softmax: paper and code CosFace: paper ArcFace/InsightFace: paper and code NormFace: paper and code L2 Softmax: paper von Mises Fisher Mixture Model: paper COCO loss: paper and code Angular Triplet Loss: code To evaluate the effectiveness of the angular margin learning method, you may consider to use the angular Fisher score proposed in the Appendix E of our SphereFace Paper . Disclaimer: Some of these methods may not necessarily be inspired by us, but we still list them due to its relevance and excellence. Contact Weiyang Liu and Yandong Wen Questions can also be left as issues in the repository. We will be happy to answer them.",Face Verification,Face Verification 2689,Computer Vision,Computer Vision,Computer Vision,"Ring Loss Keras Keras implementation of Ring Loss : Convex Feature Normalization for Face Recognition. Based on What? This paper highlights the importance of feature normalization in feature space for better clustering, unlike earlier methods (e.g L2 Constrained Softmax). The authors have designed a novel loss called Ring Loss to optimize over this norm constraint. Why? The direct approach to feature normalization through the hard normalization operation results in a non convex formulation. Instead, Ring loss applies soft normalization, where it gradually learns to constrain the norm to the scaled unit circle while preserving convexity leading to more robust features. Getting Started Install Tensorflow and Keras. 
Download ringloss keras.py to your working directory and import everything. Usage Initialize a Ring Loss layer and call the layer with your input feature lambda_ring 0.1 Loss Weight Range : 0.1 0.5 to ensure that ring loss doesn't dominate the optimization process. Since this is a relaxed normalization constraint, keep it chill... ring_loss Ring_Loss(radius 1.0, loss_type 'huber', trainable False, name 'ring_loss')(feature) > loss_types 'cauchy', 'geman', 'huber', 'squared' 'huber' is default > shape of feature (batch_size, feature_dims) Finally, compile your model with joint loss Softmax and Ringloss init number of classes num_classes 10 init final fully connected layer after the feature layer x_final Dense(num_classes, name 'final_layer', kernel_initializer 'he_normal')(feature) output Activation('softmax', name 'softmax_out')(x_final) compile model with joint loss (softmax loss + lambda_ring ring loss) model Model(inputs inputs, outputs output, ring_loss ) model.compile(loss {'softmax_out' : 'categorical_crossentropy', 'ring_loss': identity_loss}, optimizer opt, metrics 'accuracy' , loss_weights 1,lambda_ring ) Training Pass a random output for ring loss during the batch data generation to satisfy the outputs. random_y_train np.random.rand(batch_size,1) x_label, y_label data , y_trues, random_y_train References . Contact To ask questions or report issues, please open an issue on the issues tracker .",Face Verification,Face Verification 2740,Computer Vision,Computer Vision,Computer Vision,"SphereFace : Deep Hypersphere Embedding for Face Recognition By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song License SphereFace is released under the MIT License (refer to the LICENSE file for details). Update 2018.8.14 : We recommand an interesting ECCV 2018 paper that comprehensively evaluates SphereFace (A Softmax) on current widely used face datasets and their proposed noise controlled IMDb Face dataset. Interested users can try to train SphereFace on their IMDb Face dataset. Take a look here . 2018.5.23 : A new SphereFace+ that explicitly enhances the inter class separability has been introduced in our technical report. Check it out here . Code is released here . 2018.2.1 : As requested, the prototxt files for SphereFace 64 are released. 2018.1.27 : We updated the appendix of our SphereFace paper with useful experiments and analysis. Take a look here . The content contains: The intuition of removing the last ReLU; Why do we want to normalize the weights other than because we need more geometric interpretation? Empirical experiment of zeroing out the biases; More 2D visualization of A Softmax loss on MNIST; Angular Fisher score for evaluating the angular feature discriminativeness, which is a new and straightforward evluation metric other than the final accuracy. Experiments of SphereFace on MegaFace with different convolutional layers; The annealing optimization strategy for A Softmax loss; Details of the 3 patch ensemble strategy in MegaFace challenge; 2018.1.20 : We updated some resources to summarize the current advances in angular margin learning. Take a look here ( resources for angular margin learning). Contents 0. Introduction ( introduction) 0. Citation ( citation) 0. Requirements ( requirements) 0. Installation ( installation) 0. Usage ( usage) 0. Models ( models) 0. Results ( results) 0. Video Demo ( video demo) 0. Note ( note) 0. Third party re implementation ( third party re implementation) 0. 
Resources for angular margin learning ( resources for angular margin learning) Introduction The repository contains the entire pipeline (including all the preprocessings) for deep face recognition with SphereFace . The recognition pipeline contains three major steps: face detection, face alignment and face recognition. SphereFace is a recently proposed face recognition method. It was initially described in an arXiv technical report and then published in CVPR 2017 . The most up to date paper with more experiments can be found at arXiv or here . To facilitate the face recognition research, we give an example of training on CAISA WebFace and testing on LFW using the 20 layer CNN architecture described in the paper (i.e. SphereFace 20). In SphereFace, our network architecures use residual units as building blocks, but are quite different from the standrad ResNets (e.g., BatchNorm is not used, the prelu replaces the relu, different initializations, etc). We proposed 4 layer, 20 layer, 36 layer and 64 layer architectures for face recognition (details can be found in the paper ( ) and prototxt files ). We provided the 20 layer architecure as an example here. If our proposed architectures also help your research, please consider to cite our paper. SphereFace achieves the state of the art verification performance (previously No.1) in MegaFace Challenge under the small training set protocol. Citation If you find SphereFace useful in your research, please consider to cite: @InProceedings{Liu_2017_CVPR, title {SphereFace: Deep Hypersphere Embedding for Face Recognition}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2017} } Our another closely related previous work in ICML'16 ( more ): @InProceedings{Liu_2016_ICML, title {Large Margin Softmax Loss for Convolutional Neural Networks}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Yang, Meng}, booktitle {Proceedings of The 33rd International Conference on Machine Learning}, year {2016} } Requirements 1. Requirements for Matlab 2. Requirements for Caffe and matcaffe (see: Caffe installation instructions ) 3. Requirements for MTCNN (see: MTCNN face detection & alignment ) and Pdollar toolbox (see: Piotr's Image & Video Matlab Toolbox ). Installation 1. Clone the SphereFace repository. We'll call the directory that you cloned SphereFace as SPHEREFACE_ROOT . Shell git clone recursive 2. Build Caffe and matcaffe Shell cd $SPHEREFACE_ROOT/tools/caffe sphereface Now follow the Caffe installation instructions here: make all j8 && make matcaffe Usage After successfully completing the installation ( installation) , you are ready to run all the following experiments. Part 1: Preprocessing Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/preprocess/ 1. Download the training set ( CASIA WebFace ) and test set ( LFW ) and place them in data/ . Shell mv /your_path/CASIA_WebFace data/ ./code/get_lfw.sh tar xvf data/lfw.tgz C data/ Please make sure that the directory of data/ contains two datasets. 2. Detect faces and facial landmarks in CAISA WebFace and LFW datasets using MTCNN (see: MTCNN face detection & alignment ). Matlab In Matlab Command Window run code/face_detect_demo.m This will create a file dataList.mat in the directory of result/ . 3. Align faces to a canonical pose using similarity transformation. 
Matlab In Matlab Command Window run code/face_align_demo.m This will create two folders ( CASIA WebFace 112X96/ and lfw 112X96/ ) in the directory of result/ , containing the aligned face images. Part 2: Train Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/train/ 1. Get a list of training images and labels. Shell&Matlab mv ../preprocess/result/CASIA WebFace 112X96 data/ In Matlab Command Window run code/get_list.m The aligned face images in folder CASIA WebFace 112X96/ are moved from preprocess folder to train folder. A list CASIA WebFace 112X96.txt is created in the directory of data/ for the subsequent training. 2. Train the sphereface model. Shell ./code/sphereface_train.sh 0,1 After training, a model sphereface_model_iter_28000.caffemodel and a corresponding log file sphereface_train.log are placed in the directory of result/sphereface/ . Part 3: Test Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/test/ 1. Get the pair list of LFW ( view 2 ). Shell mv ../preprocess/result/lfw 112X96 data/ ./code/get_pairs.sh Make sure that the LFW dataset and pairs.txt in the directory of data/ 1. Extract deep features and test on LFW. Matlab In Matlab Command Window run code/evaluation.m Finally we have the sphereface_model.caffemodel , extracted features pairs.mat in folder result/ , and accuracy on LFW like this: fold 1 2 3 4 5 6 7 8 9 10 AVE : : : : : : : : : : : : : : : : : : : : : : : : ACC 99.33% 99.17% 98.83% 99.50% 99.17% 99.83% 99.17% 98.83% 99.83% 99.33% 99.30% Models 1. Visualizations of network architecture (tools from ethereon ): SphereFace 20: link 2. Model file SphereFace 20: Google Drive Baidu Third party SphereFace 4 & SphereFace 6: here by zuoqing1988 Results 1. Following the instruction, we go through the entire pipeline for 5 times. The accuracies on LFW are shown below. Generally, we report the average but we release the model 3 ( models) here. Experiment 1 2 3 (released) 4 5 : : : : : : : : : : : : ACC 99.24% 99.20% 99.30% 99.27% 99.13% 2. Other intermediate results: LFW features: Google Drive Baidu Training log: Google Drive Baidu Video Demo SphereFace Demo Please click the image to watch the Youtube video. For Youku users, click here . Details: 1. It is an open set face recognition scenario. The video is processed frame by frame, following the same pipeline in this repository. 2. Gallery set consists of 6 identities. Each main character has only 1 gallery face image. All the detected faces are included in probe set. 3. There is no overlap between gallery set and training set (CASIA WebFace). 4. The scores between each probe face and gallery set are computed by cosine similarity. If the maximal score of a probe face is smaller than a pre definded threshold, the probe face would be considered as an outlier. 5. Main characters are labeled by boxes with different colors. ( ! ff0000 Rachel, ! ffff00 Monica, ! ff80ff Phoebe, ! 00ffff Joey, ! 0000ff Chandler, ! 00ff00 Ross) Note 1. Backward gradient In this implementation, we did not strictly follow the equations in paper. Instead, we normalize the scale of gradient. It can be interpreted as a varying strategy for learning rate to help converge more stably. Similar idea and intuition also appear in normalized gradients and projected gradient descent . 
More specifically, if the original gradient of f w.r.t x can be written as df/dx coeff_w \ w + coeff_x \ x , we use the normalized version df/dx (coeff_w \ w + coeff_x \ x) / norm_wx to perform backward propragation, where norm_wx is sqrt(coeff_w^2 + coeff_x^2) . The same operation is also applied to the gradient of f w.r.t w . In fact, you do not necessarily need to use the original gradient, since the original gradient sometimes is not an optimal design. One important criterion for modifying the backprop gradient is that the new gradient (strictly speaking, it is not a gradient anymore) need to make the objective value decrease stably and consistently. (In terms of some failure cases for gradient based back prop, I recommand a great talk by Shai Shalev Shwartz ) If you use the original gradient to do the backprop, you could still make it work but may need different lambda settings, iteration number and learning rate decay strategy. 2. Lambda and Note for training (When the loss becomes 87) Please refer to our previous note and explanation . 3. According to recent advances, using feature normalization with a tunable scaling parameter s can significantly improve the performance of SphereFace on MegaFace challenge This is supported by the experiments done by CosFace . Similar idea also appears in additive margin softmax . 4. Difficulties in convergence When you encounter difficulties in convergence (it may appear if you use SphereFace in another dataset), usually there are a few easy ways to address it. First, try to use large mini batch size. Second, try to use PReLU instead of ReLU. Third, increase the width and depth of our network. Fourth, try to use better initialization. For example, use the pretrained model from the original softmax loss (it is also equivalent to finetuning). Last and the most effective thing you could try is to change the hyper parameters for lambda_min, lambda and its decay speed. Third party re implementation PyTorch: code by clcarwin . PyTorch: code by Joyako . TensorFlow: code by pppoe . TensorFlow (with awesome animations): code by YunYang1994 . TensorFlow: code by hujun100 . TensorFlow: code by HiKapok . TensorFlow: code by andrewhuman . MXNet: code by deepinsight (by setting loss type 1: SphereFace). Model compression for SphereFace: code by Siyang Liu (useful in practice) Caffe2: code by tpys . Trained on MS 1M: code by KaleidoZhouYN . System: A cool face demo system using SphereFace by tpys . Third party pretrained models: code by goodluckcwl . Resources for angular margin learning L Softmax loss and SphereFace present a promising framework for angular representation learning, which is shown very effective in deep face recognition. We are super excited that our works has inspired many well performing methods (and loss functions). We list a few of them for your potential reference (not very up to date): Additive margin softmax: paper and code CosFace: paper ArcFace/InsightFace: paper and code NormFace: paper and code L2 Softmax: paper von Mises Fisher Mixture Model: paper COCO loss: paper and code Angular Triplet Loss: code To evaluate the effectiveness of the angular margin learning method, you may consider to use the angular Fisher score proposed in the Appendix E of our SphereFace Paper . Disclaimer: Some of these methods may not necessarily be inspired by us, but we still list them due to its relevance and excellence. Contact Weiyang Liu and Yandong Wen Questions can also be left as issues in the repository. 
We will be happy to answer them.",Face Verification,Face Verification 2805,Computer Vision,Computer Vision,Computer Vision,Facenet: Real time FaceRecognition SmartCar face recognition (Driver / Passenger) This is completly based on deep learning nueral network and implented using Tensorflow framework. Here you will get how to implement fastly and you can find code at github and uses is demonstrated at YouTube. Installation Python Libraries: Tensorflow (1.10.0) Scipy (1.1.0) Scikit learn (0.19.1) Opencv (3.4.4.19) For ipmlementation and run this code follow this BLOG link . Implemention in video is shown YouTube . Download FaceNet Pre trained models: 20170512 110547.pb Google Drive / Baidu Drive Training using the VGGFace2 dataset Link Getting Started: (make sure you have Pre trained models .pb file) Put your Pre trained models __20170512 110547.pb__ file under __'./model/'__ folder. data_preprocess.py put img under __'./train_img'__ folder and run data_preprocess.py output preprocessed img to __'./pre_img'__ folder. train_classifier.py training your own data under __'./pre_img'__ folder and output __classifier.pkl__ file to __'./classifier/'__ folder. identify_face_image.py face recognition on your own images(change 'input_image' path) identify_face_video.py face recognition on your own videos(change 'input_video' path) References: Facenet is defined and implementation of facenet paper published in Arxiv (FaceNet: A Unified Embedding for Face Recognition and Clustering).The project also uses ideas from the paper Deep Face Recognition from the Visual Geometry Group at Oxford. davidsandberg/facenet Github FaceNet FaceNet: A Unified Embedding for Face Recognition and Clustering Face alignment using MTCNN Joint Face Detection and Alignment using Multi task Cascaded Convolutional Networks @AI Sangam,Face Verification,Face Verification 2903,Computer Vision,Computer Vision,Computer Vision,"InsightFace: 2D and 3D Face Analysis Project By Jia Guo and Jiankang Deng License The code of InsightFace is released under the MIT License. Recent Update 2018.06.14 : There's a large scale Asian training dataset provided by Glint, see this discussion for detail. 2018.05.16 : A new training dataset released here which can easily achieve much better accuracy. See discussion for detail. 2018.04.23 : Our implementation of MobileFaceNet is now available. Please set network y1 to use this lightweight but powerful backbone. 2018.03.26 : We can train with combined margin(loss type 5), see Verification Results On Combined Margin ( verification results on combined margin). 2018.02.13 : We achieved state of the art performance on MegaFace Challenge . Please check our paper and code for implementation details. Contents Deep Face Recognition ( deep face recognition) Introduction ( introduction) Training Data ( training Data) Train ( train) Pretrained Models ( pretrained models) Verification Results On Combined Margin ( verification results on combined margin) Test on MegaFace ( test on megaface) 512 D Feature Embedding ( 512 d feature embedding) Third party Re implementation ( third party re implementation) Face Alignment ( face alignment) Face Detection ( face detection) Citation ( citation) Contact ( contact) Deep Face Recognition Introduction In this repository, we provide training data, network settings and loss designs for deep face recognition. The training data includes the normalised MS1M and VGG2 datasets, which were already packed in the MxNet binary format. 
The network backbones include ResNet, InceptionResNet_v2, DenseNet, DPN and MobiletNet. The loss functions include Softmax, SphereFace, CosineFace, ArcFace and Triplet (Euclidean/Angular) Loss. loss type 0: Softmax loss type 1: SphereFace loss type 2: CosineFace loss type 4: ArcFace loss type 5: Combined Margin loss type 12: TripletLoss ! margin penalty for target logit Our method, ArcFace, was initially described in an arXiv technical report . By using this repository, you can simply achieve LFW 99.80%+ and Megaface 98%+ by a single model. This repository can help researcher/engineer to develop deep face recognition algorithms quickly by only two steps: download the binary dataset and run the training script. Training Data All face images are aligned by MTCNN and cropped to 112x112: Refined MS1M@BaiduDrive , Refined MS1M@GoogleDrive VGGFace2@BaiduDrive , VGGFace2@GoogleDrive Please check src/data/face2rec2.py on how to build a binary face dataset. Any public available MTCNN can be used to align the faces, and the performance should not change. We will improve the face normalisation step by full pose alignment methods recently. Note: If you use the refined MS1M dataset and the cropped VGG2 dataset, please cite the original papers. Train 1. Install MXNet with GPU support (Python 2.7). pip install mxnet cu80 2. Clone the InsightFace repository. We call the directory insightface as INSIGHTFACE_ROOT . git clone recursive 3. Download the training set ( MS1M ) and place it in $INSIGHTFACE_ROOT/datasets/ . Each training dataset includes following 7 files: Shell faces_ms1m_112x112/ train.idx train.rec property lfw.bin cfp_ff.bin cfp_fp.bin agedb_30.bin The first three files are the training dataset while the last four files are verification sets. 4. Train deep face recognition models. In this part, we assume you are in the directory $INSIGHTFACE_ROOT/src/ . export MXNET_CPU_WORKER_NTHREADS 24 export MXNET_ENGINE_TYPE ThreadedEnginePerDevice We give some examples below. Our experiments were conducted on the Tesla P40 GPU. (1). Train ArcFace with LResNet100E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r100 loss type 4 margin m 0.5 data dir ../datasets/faces_ms1m_112x112 prefix ../model r100 It will output verification results of LFW , CFP FF , CFP FP and AgeDB 30 every 2000 batches. You can check all command line options in train\_softmax.py . This model can achieve LFW 99.80+ and MegaFace 98.0%+ . (2). Train CosineFace with LResNet50E IR. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r50 loss type 2 margin m 0.35 data dir ../datasets/faces_ms1m_112x112 prefix ../model r50 amsoftmax (3). Train Softmax with LMobileNetE. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network m1 loss type 0 data dir ../datasets/faces_ms1m_112x112 prefix ../model m1 softmax (4). Fine turn the above Softmax model with Triplet loss. Shell CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network m1 loss type 12 lr 0.005 mom 0.0 per batch size 150 data dir ../datasets/faces_ms1m_112x112 pretrained ../model m1 softmax,50 prefix ../model m1 triplet (5). Train LDPN107E network with Softmax loss on VGGFace2 dataset. Shell CUDA_VISIBLE_DEVICES '0,1,2,3,4,5,6,7' python u train_softmax.py network p107 loss type 0 per batch size 64 data dir ../datasets/faces_vgg_112x112 prefix ../model p107 softmax 5. Verification results. 
LResNet100E IR network trained on MS1M dataset with ArcFace loss: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) Ours 99.80+ 99.85+ 94.0+ 97.90+ LResNet50E IR network trained on VGGFace2 dataset with ArcFace loss: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) Ours 99.7+ 99.6+ 97.1+ 95.7+ We report the verification accuracy after removing training set overlaps to strictly follow the evaluation metric. (C) means after cleaning Dataset Identities Images Identites(C) Images(C) Acc Acc(C) LFW 85742 3850179 80995 3586128 99.83 99.81 CFP FP 85742 3850179 83706 3736338 94.04 94.03 AgeDB 30 85742 3850179 83775 3761329 98.08 97.87 Pretrained Models You can use $INSIGHTFACE/src/eval/verification.py to test all the pre trained models. 1. LResNet50E IR@BaiduDrive , @GoogleDrive Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace(%) Ours 99.80 99.83 92.74 97.76 97.64 2. LResNet34E IR@BaiduDrive Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace(%) Ours 99.65 99.77 92.12 97.70 96.70 Caffe LResNet50E IR@BaiduDrive , converted by above MXNet model. Performance: Method LFW(%) CFP FF(%) CFP FP(%) AgeDB 30(%) MegaFace1M(%) Ours 99.74 TBD TBD TBD TBD Verification Results on Combined Margin A combined margin method was proposed as a function of target logits value and original θ : COM(θ) cos(m_1 θ+m_2) m_3 For training with m1 0.9, m2 0.4, m3 0.15 , run following command: CUDA_VISIBLE_DEVICES '0,1,2,3' python u train_softmax.py network r100 loss type 5 margin a 0.9 margin m 0.4 margin b 0.15 data dir ../datasets/faces_ms1m_112x112 prefix ../model r100 Method m1 m2 m3 LFW CFP FP AgeDB 30 W&F Norm Softmax 1 0 0 99.28 88.50 95.13 SphereFace 1.5 0 0 99.76 94.17 97.30 CosineFace 1 0 0.35 99.80 94.4 97.91 ArcFace 1 0.5 0 99.83 94.04 98.08 Combined Margin 1.2 0.4 0 99.80 94.08 98.05 Combined Margin 1.1 0 0.35 99.81 94.50 98.08 Combined Margin 1 0.3 0.2 99.83 94.51 98.13 Combined Margin 0.9 0.4 0.15 99.83 94.20 98.16 Test on MegaFace In this part, we assume you are in the directory $INSIGHTFACE_ROOT/src/megaface/ . Note: We found there are overlap identities between facescrub dataset and Megaface distractors, which significantly affects the identification performance. This list is released under $INSIGHTFACE_ROOT/src/megaface/ . 1. Align all face images of facescrub dataset and megaface distractors. Please check the alignment scripts under $INSIGHTFACE_ROOT/src/align/ . 2. Generate feature files for both facescrub and megaface images. python u gen_megaface.py 3. Remove Megaface noises which generates new feature files. python u remove_noises.py 4. Run megaface development kit to produce final result. 512 D Feature Embedding In this part, we assume you are in the directory $INSIGHTFACE_ROOT/deploy/ . The input face image should be generally centre cropped. We use RNet+ONet of MTCNN to further align the image before sending it to the feature embedding network. 1. Prepare a pre trained model. 2. Put the model under $INSIGHTFACE_ROOT/models/ . For example, $INSIGHTFACE_ROOT/models/model r34 amf . 3. Run the test script $INSIGHTFACE_ROOT/deploy/test.py . For single cropped face image(112x112), total inference time is only 17ms on our testing server(Intel E5 2660 @ 2.00GHz, Tesla M40, LResNet34E IR ). 
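Once test.py has produced 512 D embeddings for two aligned face crops, verification reduces to a cosine similarity between the L2 normalised vectors. A minimal Python sketch of that comparison step (the 0.5 threshold and the function name are illustrative assumptions, not values or APIs from this repository):

import numpy as np

def cosine_verify(emb_a, emb_b, threshold=0.5):
    # L2-normalise both 512-D embeddings, then compare with a dot product
    emb_a = emb_a / np.linalg.norm(emb_a)
    emb_b = emb_b / np.linalg.norm(emb_b)
    score = float(np.dot(emb_a, emb_b))
    return score, score > threshold  # higher score means more likely the same identity

In practice the threshold is tuned on a validation set such as LFW rather than fixed in advance.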
Third party Re implementation TensorFlow: InsightFace_TF Face Alignment Todo Face Detection Todo Citation If you find InsightFace useful in your research, please consider to cite the following related papers: @article{deng2018arcface, title {ArcFace: Additive Angular Margin Loss for Deep Face Recognition}, author {Deng, Jiankang and Guo, Jia and Zafeiriou, Stefanos}, journal {arXiv:1801.07698}, year {2018} } Contact Jia Guo (guojia at gmail.com) Jiankang Deng (jiankangdeng at gmail.com) insight_mx insight mx insight mx insight_mx insight_mx",Face Verification,Face Verification 2908,Computer Vision,Computer Vision,Computer Vision,"SphereFace : Deep Hypersphere Embedding for Face Recognition 左庆新增:MobileNet SphereFace。 在之前基础上加了batch norm, depthwise convolution, train\code文件夹里面有多种不同的模型可以训练 可能会用到的数据 CASIA WebFace下载链接: 密码:3c67 MS Celeb 1M_clean_list.txt下载链接: 密码:fwwq MS Celeb 1M下载链接链接: 密码:z04o Ms celeb 1M_clean 112X96下载链接 以下是我fork时的信息 By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song License SphereFace is released under the MIT License (refer to the LICENSE file for details). Update 2018.1.27 : We updated the appendix of our SphereFace paper with useful experiments and analysis. Take a look here . The content contains: The intuition of removing the last ReLU; Why do we want to normalize the weights other than because we need more geometric interpretation? Empirical experiment of zeroing out the biases; More 2D visualization of A Softmax loss on MNIST; Angular Fisher score for evaluating the angular feature discriminativeness, which is a new and straightforward evluation metric other than the final accuracy. Experiments of SphereFace on MegaFace with different convolutional layers; The annealing optimization strategy for A Softmax loss; Details of the 3 patch ensemble strategy in MegaFace challenge; 2018.1.20 : We updated some resources to summarize the current advances in angular margin learning. Take a look here ( resources for angular margin learning). Contents 0. Introduction ( introduction) 0. Citation ( citation) 0. Requirements ( requirements) 0. Installation ( installation) 0. Usage ( usage) 0. Models ( models) 0. Results ( results) 0. Video Demo ( video demo) 0. Note ( note) 0. Third party re implementation ( third party re implementation) 0. Resources for angular margin learning ( resources for angular margin learning) Introduction The repository contains the entire pipeline (including all the preprocessings) for deep face recognition with SphereFace . The recognition pipeline contains three major steps: face detection, face alignment and face recognition. SphereFace is a recently proposed face recognition method. It was initially described in an arXiv technical report and then published in CVPR 2017 . The most up to date paper with more experiments can be found at arXiv or here . To facilitate the face recognition research, we give an example of training on CAISA WebFace and testing on LFW using the 20 layer CNN architecture described in the paper (i.e. SphereFace 20). In SphereFace, our network architecures use residual units as building blocks, but are quite different from the standrad ResNets (e.g., BatchNorm is not used, the prelu replaces the relu, different initializations, etc). We proposed 4 layer, 20 layer, 36 layer and 64 layer architectures for face recognition (details can be found in the paper ( ) and prototxt files ). We provided the 20 layer architecure as an example here. 
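To make the architectural description above concrete, here is a hedged PyTorch rendering of one such building block: a residual unit without BatchNorm and with PReLU activations. This is only an illustrative sketch of the described design (the repository itself defines its networks in Caffe prototxt files), not the exact block used there:

import torch
import torch.nn as nn

class SphereResidualUnit(nn.Module):
    # residual building block in the style described above: 3x3 convs, PReLU, no BatchNorm
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.prelu1 = nn.PReLU(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.prelu2 = nn.PReLU(channels)

    def forward(self, x):
        out = self.prelu1(self.conv1(x))
        out = self.prelu2(self.conv2(out))
        return x + out  # identity shortcut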
If our proposed architectures also help your research, please consider to cite our paper. SphereFace achieves the state of the art verification performance (previously No.1) in MegaFace Challenge under the small training set protocol. Citation If you find SphereFace useful in your research, please consider to cite: @InProceedings{Liu_2017_CVPR, title {SphereFace: Deep Hypersphere Embedding for Face Recognition}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Li, Ming and Raj, Bhiksha and Song, Le}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2017} } Our another closely related previous work in ICML'16 ( more ): @InProceedings{Liu_2016_ICML, title {Large Margin Softmax Loss for Convolutional Neural Networks}, author {Liu, Weiyang and Wen, Yandong and Yu, Zhiding and Yang, Meng}, booktitle {Proceedings of The 33rd International Conference on Machine Learning}, year {2016} } Requirements 1. Requirements for Matlab 2. Requirements for Caffe and matcaffe (see: Caffe installation instructions ) 3. Requirements for MTCNN (see: MTCNN face detection & alignment ) and Pdollar toolbox (see: Piotr's Image & Video Matlab Toolbox ). Installation 1. Clone the SphereFace repository. We'll call the directory that you cloned SphereFace as SPHEREFACE_ROOT . Shell git clone recursive 2. Build Caffe and matcaffe Shell cd $SPHEREFACE_ROOT/tools/caffe sphereface Now follow the Caffe installation instructions here: make all j8 && make matcaffe Usage After successfully completing the installation ( installation) , you are ready to run all the following experiments. Part 1: Preprocessing Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/preprocess/ 1. Download the training set ( CASIA WebFace ) and test set ( LFW ) and place them in data/ . Shell mv /your_path/CASIA_WebFace data/ ./code/get_lfw.sh tar xvf data/lfw.tgz C data/ Please make sure that the directory of data/ contains two datasets. 2. Detect faces and facial landmarks in CAISA WebFace and LFW datasets using MTCNN (see: MTCNN face detection & alignment ). Matlab In Matlab Command Window run code/face_detect_demo.m This will create a file dataList.mat in the directory of result/ . 3. Align faces to a canonical pose using similarity transformation. Matlab In Matlab Command Window run code/face_align_demo.m This will create two folders ( CASIA WebFace 112X96/ and lfw 112X96/ ) in the directory of result/ , containing the aligned face images. Part 2: Train Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/train/ 1. Get a list of training images and labels. Shell&Matlab mv ../preprocess/result/CASIA WebFace 112X96 data/ In Matlab Command Window run code/get_list.m The aligned face images in folder CASIA WebFace 112X96/ are moved from preprocess folder to train folder. A list CASIA WebFace 112X96.txt is created in the directory of data/ for the subsequent training. 2. Train the sphereface model. Shell ./code/sphereface/sphereface_train.sh 0,1 After training, a model sphereface_model_iter_28000.caffemodel and a corresponding log file sphereface_train.log are placed in the directory of result/sphereface/ . Part 3: Test Note: In this part, we assume you are in the directory $SPHEREFACE_ROOT/test/ 1. Get the pair list of LFW ( view 2 ). Shell mv ../preprocess/result/lfw 112X96 data/ ./code/get_pairs.sh Make sure that the LFW dataset and pairs.txt in the directory of data/ 1. Extract deep features and test on LFW. 
Matlab In Matlab Command Window run code/evaluation.m Finally we have the sphereface_model.caffemodel , extracted features pairs.mat in folder result/ , and accuracy on LFW like this: fold 1 2 3 4 5 6 7 8 9 10 AVE : : : : : : : : : : : : : : : : : : : : : : : : ACC 99.33% 99.17% 98.83% 99.50% 99.17% 99.83% 99.17% 98.83% 99.83% 99.33% 99.30% Models 1. Visualizations of network architecture (tools from ethereon ): SphereFace 20: link 2. Model file SphereFace 20: Google Drive Baidu Results 1. Following the instruction, we go through the entire pipeline for 5 times. The accuracies on LFW are shown below. Generally, we report the average but we release the model 3 ( models) here. Experiment 1 2 3 (released) 4 5 : : : : : : : : : : : : ACC 99.24% 99.20% 99.30% 99.27% 99.13% 2. Other intermediate results: LFW features: Google Drive Baidu Training log: Google Drive Baidu Video Demo SphereFace Demo Please click the image to watch the Youtube video. For Youku users, click here . Details: 1. It is an open set face recognition scenario. The video is processed frame by frame, following the same pipeline in this repository. 2. Gallery set consists of 6 identities. Each main character has only 1 gallery face image. All the detected faces are included in probe set. 3. There is no overlap between gallery set and training set (CASIA WebFace). 4. The scores between each probe face and gallery set are computed by cosine similarity. If the maximal score of a probe face is smaller than a pre definded threshold, the probe face would be considered as an outlier. 5. Main characters are labeled by boxes with different colors. ( ! ff0000 Rachel, ! ffff00 Monica, ! ff80ff Phoebe, ! 00ffff Joey, ! 0000ff Chandler, ! 00ff00 Ross) Note 1. Backward gradient In this implementation, we did not strictly follow the equations in paper. Instead, we normalize the scale of gradient. It can be interpreted as a varying strategy for learning rate to help converge more stably. Similar idea and intuition also appear in normalized gradients and projected gradient descent . More specifically, if the original gradient of f w.r.t x can be written as df/dx coeff_w \ w + coeff_x \ x , we use the normalized version df/dx (coeff_w \ w + coeff_x \ x) / norm_wx to perform backward propragation, where norm_wx is sqrt(coeff_w^2 + coeff_x^2) . The same operation is also applied to the gradient of f w.r.t w . In fact, you do not necessarily need to use the original gradient, since the original gradient sometimes is not an optimal design. One important criterion for modifying the backprop gradient is that the new gradient (strictly speaking, it is not a gradient anymore) need to make the objective value decrease stably and consistently. (In terms of some failure cases for gradient based back prop, I recommand a great talk by Shai Shalev Shwartz ) If you use the original gradient to do the backprop, you could still make it work but may need different lambda settings, iteration number and learning rate decay strategy. 2. Lambda and Note for training (When the loss becomes 87) Please refer to our previous note and explanation . 3. According to recent advances, using feature normalization with a tunable scaling parameter s can significantly improve the performance of SphereFace on MegaFace challenge This is supported by the experiments done by CosFace . Similar idea also appears in additive margin softmax . Third party re implementation PyTorch: code by clcarwin . TensorFlow: code by pppoe . TensorFlow: code by hujun100 . 
MXNet: code by deepinsight (by setting loss type 1: SphereFace) MXNet: code by HaoLiuHust . Caffe2: code by tpys . Trained on MS 1M: code by KaleidoZhouYN . System: A cool face demo system using SphereFace by tpys . Third party pretrained models: code by goodluckcwl Resources for angular margin learning L Softmax loss and SphereFace present a promising framework for angular representation learning, which is shown very effective in deep face recognition. We are super excited that our works has inspired many well performing methods (and loss functions). We list a few of them for your potential reference: Additive margin softmax: paper and code CosFace: paper ArcFace/InsightFace: paper and code NormFace: paper and code L2 Softmax: paper von Mises Fisher Mixture Model: paper COCO loss: paper and code Angular Triplet Loss: code To evaluate the effectiveness of the angular margin learning method, you may consider to use the angular Fisher score proposed in the Appendix E of our SphereFace Paper . Disclaimer: Some of these methods may not necessarily be inspired by us, but we still list them due to its relevance and excellence. Contact Weiyang Liu and Yandong Wen Questions can also be left as issues in the repository. We will be happy to answer them.",Face Verification,Face Verification 2045,Natural Language Processing,Natural Language Processing,Natural Language Processing,"F LM Language modeling. This codebase contains implementation of G LSTM and F LSTM cells from 1 . It also might contain some ongoing experiments. This code was forked from and contains BIGLSTM language model baseline from 2 . Current code runs on Tensorflow r1.5 and supports multi GPU data parallelism using synchronized gradient updates. Perplexity On One Billion Words benchmark using 8 GPUs in one DGX 1, BIG G LSTM G4 was able to achieve 24.29 after 2 weeks of training and 23.36 after 3 weeks. 
__On 02/06/2018 We found an issue with our experimental setup which makes perplexity numbers listed in the paper invalid.__ __See current numbers in the table below.__ On DGX Station, after 1 week of training using all 4 GPUs (Tesla V100) and batch size of 256 per GPU: Model Perplexity Steps WPS : : : : : : BIGLSTM 35.1 0.99M 33.8K BIG F LSTM F512 36.3 1.67M 56.5K BIG G LSTM G4 40.6 1.65M 56K BIG G LSTM G2 36 1.37M 47.1K BIG G LSTM G8 39.4 1.7M 58.5 Dependencies TensorFlow r1.5 Python 2.7 (should work with Python 3 too) 1B Word Benchmark Dataset To run Assuming the data directory is in: /raid/okuchaiev/Data/LM1B/1 billion word language modeling benchmark r13output/ , execute: export CUDA_VISIBLE_DEVICES 0,1,2,3 SECONDS 604800 LOGSUFFIX FLSTM F512 1week python /home/okuchaiev/repos/f lm/single_lm_train.py logdir /raid/okuchaiev/Workspace/LM/GLSTM G4/$LOGSUFFIX num_gpus 4 datadir /raid/okuchaiev/Data/LM/LM1B/1 billion word language modeling benchmark r13output/ hpconfig run_profiler False,float16_rnn False,max_time $SECONDS,num_steps 20,num_shards 8,num_layers 2,learning_rate 0.2,max_grad_norm 1,keep_prob 0.9,emb_size 1024,projected_size 1024,state_size 8192,num_sampled 8192,batch_size 256,fact_size 512 >> train_$LOGSUFFIX.log 2>&1 python /home/okuchaiev/repos/f lm/single_lm_train.py logdir /raid/okuchaiev/Workspace/LM/GLSTM G4/$LOGSUFFIX num_gpus 1 mode eval_full datadir /raid/okuchaiev/Data/LM/LM1B/1 billion word language modeling benchmark r13output/ hpconfig run_profiler False,float16_rnn False,max_time $SECONDS,num_steps 20,num_shards 8,num_layers 2,learning_rate 0.2,max_grad_norm 1,keep_prob 0.9,emb_size 1024,projected_size 1024,state_size 8192,num_sampled 8192,batch_size 1,fact_size 512 To use G LSTM cell specify num_of_groups parameter. To use F LSTM cell specify fact_size parameter. Note, that current data reader may miss some tokens when constructing mini batches which can have a minor effect on final perplexity. For most accurate results , use batch_size 1 and num_steps 1 in evaluation. Thanks to Ciprian for noticing this. To change hyper parameters The command accepts and additional argument hpconfig which allows to override various hyper parameters, including: batch_size 128 batch size per GPU . Global batch size batch_size num_gpus num_steps 20 number of LSTM cell timesteps num_shards 8 embedding and softmax matrices are split into this many shards num_layers 1 numer of LSTM layers learning_rate 0.2 learning rate for optimizer max_grad_norm 10.0 maximum acceptable gradient norm for LSTM layers keep_prob 0.9 dropout keep probability optimizer 0 which optimizer to use: Adagrad(0), Momentum(1), Adam(2), RMSProp(3), SGD(4) vocab_size 793470 vocabluary size emb_size 512 size of the embedding (should be same as projected_size) state_size 2048 LSTM cell size projected_size 512 LSTM projection size num_sampled 8192 training uses sampled softmax, number of samples) do_summaries False generate weight and grad stats for Tensorboard max_time 180 max time (in seconds) to run fact_size to use F LSTM cell, this should be set to factor size num_of_groups 0 to use G LSTM cell, this should be set to number of groups save_model_every_min 30 how often to checkpoint save_summary_every_min 16 how often to save summaries use_residual False whether to use LSTM residual connections Feedback Forked code and GLSTM/FLSTM cells: okuchaiev@nvidia.com. References 1 Factorization tricks for LSTM networks , ICLR 2017 workshop. 
2 Exploring the Limits of Language Modeling",Language Modelling,Language Modelling 2071,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Convolutional neural networks for computer vision Build Status GitHub License Python Version This repo is used to research convolutional networks for computer vision tasks. For this purpose, the repo contains (re)implementations of various classification and segmentation models and scripts for training/evaluating/converting. The following frameworks are used: MXNet/Gluon ( info ), PyTorch ( info ), Chainer ( info ), Keras ( info ), TensorFlow ( info ). For each supported framework, there is a PIP package containing pure models without auxiliary scripts. List of packages: gluoncv2 for Gluon, pytorchcv for PyTorch, chainercv2 for Chainer, kerascv for Keras, tensorflowcv for TensorFlow. Currently, models are mostly implemented on Gluon and then ported to other frameworks. Some models are pretrained on ImageNet 1K , CIFAR 10/100 , SVHN , Pascal VOC2012 , ADE20K , Cityscapes , and COCO datasets. All pretrained weights are loaded automatically during use. See examples of such automatic loading of weights in the corresponding sections of the documentation dedicated to a particular package: Gluon models (gluon/README.md), PyTorch models (pytorch/README.md), Chainer models (chainer_/README.md), Keras models (keras_/README.md), TensorFlow models (tensorflow_/README.md). Installation To use training/evaluating scripts as well as all models, you need to clone the repository and install dependencies: git clone git@github.com:osmr/imgclsmob.git pip install r requirements.txt Table of implemented classification models Some remarks: Repo is the author's repository, if one exists. A , B , C , and D mean the model is implemented for ImageNet 1K, CIFAR 10, CIFAR 100, and SVHN, respectively. A+ , B+ , C+ , and D+ mean a pre trained model is available for the corresponding dataset.
Model Gluon (gluon/README.md) PyTorch (pytorch/README.md) Chainer (chainer_/README.md) Keras (keras_/README.md) TensorFlow (tensorflow_/README.md) Paper Repo Year AlexNet A+ A+ A+ A+ A+ link link 2012 ZFNet A A A link 2013 VGG A+ A+ A+ A+ A+ link 2014 BN VGG A+ A+ A+ A+ A+ link 2015 BN Inception A+ A+ A+ link 2015 ResNet A+B+C+D+ A+B+C+D+ A+B+C+D+ A+ A+ link link 2015 PreResNet A+B+C+D+ A+B+C+D+ A+B+C+D+ A+ A+ link link 2016 ResNeXt A+B+C+D+ A+B+C+D+ A+B+C+D+ A+ A+ link link 2016 SENet A+ A+ A+ A+ A+ link link 2017 SE ResNet A+ A+ A+ A+ A+ link link 2017 SE PreResNet A A A A A link link 2017 SE ResNeXt A+ A+ A+ A+ A+ link link 2017 IBN ResNet A+ A+ link link 2018 IBN ResNeXt A+ A+ link link 2018 IBN DenseNet A+ A+ link link 2018 AirNet A+ A+ A+ link link 2018 AirNeXt A+ A+ A+ link link 2018 BAM ResNet A+ A+ A+ link link 2018 CBAM ResNet A+ A+ A+ link link 2018 ResAttNet A A A link link 2017 SKNet A A A link link 2019 PyramidNet A+B+C+D+ A+B+C+D+ A+B+C+D+ link link 2016 DiracNetV2 A+ A+ A+ link link 2017 ShaResNet A A A link link 2017 CRU Net A+ link link 2018 DenseNet A+B+C+D+ A+B+C+D+ A+B+C+D+ A+ A+ link link 2016 CondenseNet A+ A+ A+ link link 2017 SparseNet A A A link link 2018 PeleeNet A+ A+ A+ link link 2018 Oct ResNet ABCD A A link 2019 Res2Net A link 2019 WRN A+B+C+D+ A+B+C+D+ A+B+C+D+ link link 2016 WRN 1bit B+C+D+ B+C+D+ B+C+D+ link link 2018 DRN C A+ A+ A+ link link 2017 DRN D A+ A+ A+ link link 2017 DPN A+ A+ A+ link link 2017 DarkNet Ref A+ A+ A+ A+ A+ link link DarkNet Tiny A+ A+ A+ A+ A+ link link DarkNet 19 A A A A A link link DarkNet 53 A+ A+ A+ A+ A+ link link 2018 ChannelNet A A A A link link 2018 iSQRT COV ResNet A A link link 2017 RevNet A link link 2017 i RevNet A+ A+ A+ link link 2018 BagNet A+ A+ A+ link link 2019 DLA A+ A+ A+ link link 2017 MSDNet A AB link link 2017 FishNet A+ A+ A+ link link 2018 ESPNetv2 A+ A+ A+ link link 2018 X DenseNet AB+C+D+ AB+C+D+ AB+C+D+ link link 2017 SqueezeNet A+ A+ A+ A+ A+ link link 2016 SqueezeResNet A+ A+ A+ A+ A+ link 2016 SqueezeNext A+ A+ A+ A+ A+ link link 2018 ShuffleNet A+ A+ A+ A+ A+ link 2017 ShuffleNetV2 A+ A+ A+ A+ A+ link 2018 MENet A+ A+ A+ A+ A+ link link 2018 MobileNet A+ A+ A+ A+ A+ link link 2017 FD MobileNet A+ A+ A+ A+ A+ link link 2018 MobileNetV2 A+ A+ A+ A+ A+ link link 2018 IGCV3 A+ A+ A+ A+ A+ link link 2018 MnasNet A+ A+ A+ A+ A+ link 2018 DARTS A+ A+ A+ link link 2018 ProxylessNAS A+ A+ A+ link link 2018 Xception A+ A+ A+ link link 2016 InceptionV3 A+ A+ A+ link link 2015 InceptionV4 A+ A+ A+ link link 2016 InceptionResNetV2 A+ A+ A+ link link 2016 PolyNet A+ A+ A+ link link 2016 NASNet Large A+ A+ A+ link link 2017 NASNet Mobile A+ A+ A+ link link 2017 PNASNet Large A+ A+ A+ link link 2017 NIN B+C+D+ B+C+D+ B+C+D+ link link 2013 RoR 3 B+C+D+ B+C+D+ B+C+D+ link 2016 RiR B+C+D+ B+C+D+ B+C+D+ link 2016 ResDrop ResNet BCD BCD BCD link link 2016 Shake Shake ResNet B+C+D+ B+C+D+ B+C+D+ link link 2017 ShakeDrop ResNet BCD BCD BCD link 2018 FractalNet BC BC link link 2016 Table of implemented segmentation models Some remarks: A corresponds to Pascal VOC2012. B corresponds to Pascal ADE20K. C corresponds to Pascal Cityscapes. D corresponds to Pascal COCO. 
Model Gluon (gluon/README.md) PyTorch (pytorch/README.md) Chainer (chainer_/README.md) Keras (keras_/README.md) TensorFlow (tensorflow_/README.md) Paper Repo Year PSPNet A+B+C+D+ A+B+C+D+ A+B+C+D+ link 2016 DeepLabv3 A+B+CD+ A+B+CD+ A+B+CD+ link 2017 FCN 8s(d) A+B+CD+ A+B+CD+ A+B+CD+ link 2014",Language Modelling,Language Modelling 2138,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Language Modelling Transfer Learning in Langage Modelling (Classification) Implementation of Universal Language Model Fine tuning for Text Classification (Fast.ai). It's used as default now. Blog on ULMFit at Intel IDZ Transformer Support has been added Prerequisites: Tensorop For Installation, visit docs for transformers ( ) Usage: from tensorop import from tensorop.nlp import lm Language_Model('gpt2') lm.run_model() It will ask for the prompt Support for Transformer XL,BERT is WIP Requirements: Pytorch Numpy Python 3.x Fast.ai lib Acquire the Repo shell $ git clone Contributions Contributions are always welcome in the form of pull requests with explanatory comments.",Language Modelling,Language Modelling 2142,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Neural Language Models This repository contains neural language model implementations trained and tested on Penn Treebank. 1. Multi layer LSTM with Dropout : The link to the notebook is here . It receives perplexity around 80.6 on test set on default parameters. 2. Gated Convolutional Networks with Residual Connections : The link to the notebook is here . It receives perplexity around 70.9 on test set on default parameters. GCNN trains a lot faster than LSTM, due to stacked convolutions performaing parallely. However, this implementation is currently done for fixed word lengths. I am still unclear how to approach for variable lengths. Requirements You will need Pytorch 0.4 and Python 3.5 to run this. How to run 1. For LSTM code simply run like python3 rnn.py 2. For GCNN code simply run like python3 gcnn.py References LSTM: 1. Pytorch Language Model 2. Offical Pytorch Tutorial on LSTM GCNN: 1. Language Modeling with Gated Convolutional Networks on arXiv 2. Unofficial implementation 1 of GCNN 3. Unofficial implementation 2 of GCNN",Language Modelling,Language Modelling 2186,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Differentiable Architecture Search Code accompanying the paper > DARTS: Differentiable Architecture Search \ > Hanxiao Liu, Karen Simonyan, Yiming Yang.\ > _arXiv:1806.09055_. The algorithm is based on continuous relaxation and gradient descent in the architecture space. It is able to efficiently design high performance convolutional architectures for image classification (on CIFAR 10 and ImageNet) and recurrent architectures for language modeling (on Penn Treebank and WikiText 2). Only a single GPU is required. Requirements Python > 3.5.5, PyTorch 0.3.1, torchvision 0.2.0 NOTE: PyTorch 0.4 is not supported at this moment and would lead to OOM. Datasets Instructions for acquiring PTB and WT2 can be found here . While CIFAR 10 can be automatically downloaded by torchvision, ImageNet needs to be manually downloaded (preferably to a SSD) following the instructions here . Pretrained models The easist way to get started is to evaluate our pretrained DARTS models. CIFAR 10 ( cifar10_model.pt ) cd cnn && python test.py auxiliary model_path cifar10_model.pt Expected result: 2.63% test error rate with 3.3M model params. 
PTB ( ptb_model.pt ) cd rnn && python test.py model_path ptb_model.pt Expected result: 55.68 test perplexity with 23M model params. ImageNet ( imagenet_model.pt ) cd cnn && python test_imagenet.py auxiliary model_path imagenet_model.pt Expected result: 26.7% top 1 error and 8.7% top 5 error with 4.7M model params. Architecture search (using small proxy models) To carry out architecture search using 2nd order approximation, run cd cnn && python train_search.py unrolled for conv cells on CIFAR 10 cd rnn && python train_search.py unrolled for recurrent cells on PTB Note the _validation performance in this step does not indicate the final performance of the architecture_. One must train the obtained genotype/architecture from scratch using full sized models, as described in the next section. Also be aware that different runs would end up with different local minimum. To get the best result, it is crucial to repeat the search process with different seeds and select the best cell(s) based on validation performance (obtained by training the derived cell from scratch for a small number of epochs). Please refer to fig. 3 and sect. 3.2 in our arXiv paper. Figure: Snapshots of the most likely normal conv, reduction conv, and recurrent cells over time. Architecture evaluation (using full sized models) To evaluate our best cells by training from scratch, run cd cnn && python train.py auxiliary cutout CIFAR 10 cd rnn && python train.py PTB cd rnn && python train.py data ../data/wikitext 2 \ WT2 dropouth 0.15 emsize 700 nhidlast 700 nhid 700 wdecay 5e 7 cd cnn && python train_imagenet.py auxiliary ImageNet Customized architectures are supported through the arch flag once specified in genotypes.py . The CIFAR 10 result at the end of training is subject to variance due to the non determinism of cuDNN back prop kernels. _It would be misleading to report the result of only a single run_. By training our best cell from scratch, one should expect the average test error of 10 independent runs to fall in the range of 2.76 +/ 0.09% with high probability. Figure: Expected learning curves on CIFAR 10 (4 runs), ImageNet and PTB. Visualization Package graphviz is required to visualize the learned cells python visualize.py DARTS where DARTS can be replaced by any customized architectures in genotypes.py . Citation If you use any part of this code in your research, please cite our paper : @article{liu2018darts, title {DARTS: Differentiable Architecture Search}, author {Liu, Hanxiao and Simonyan, Karen and Yang, Yiming}, journal {arXiv preprint arXiv:1806.09055}, year {2018} }",Language Modelling,Language Modelling 2197,Natural Language Processing,Natural Language Processing,Natural Language Processing,"lm The codebase implements LSTM language model baseline from The code supports running on the machine with multiple GPUs using synchronized gradient updates (which is the main difference with the paper). The code was tested on a box with 8 Geforce Titan X and LSTM 2048 512 (default configuration) can process up to 100k words per second. The perplexity on the holdout set after 5 epochs is about 48.7 (vs 47.5 in the paper), which can be due to slightly different hyper parameters. It takes about 16 hours to reach these results on 8 Titan Xs. DGX 1 is about 30% faster on the baseline model. 
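As a reminder of what the quoted numbers mean, perplexity here is simply the exponential of the average per word cross entropy (in nats) on the holdout set. A minimal Python sketch, independent of the training code in this repository:

import numpy as np

def perplexity(per_word_log_probs):
    # per_word_log_probs: natural-log probabilities the model assigned to each target word
    return float(np.exp(-np.mean(per_word_log_probs)))

# an average negative log-likelihood of about 3.886 nats per word corresponds to the ~48.7 quoted above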
Dependencies Anaconda TensorFlow 0.10 Python 3.5 (should work with 2.7 but haven't tested it recently) 1B Word Benchmark Dataset tmux (the start script opens up a tmux session with multiple windows) To run Assuming the data directory is in: /home/rafal/datasets/lm1b/ , execute: python single_lm_run.py datadir /home/rafal/datasets/lm1b/ logdir It'll start a tmux session and you can connect to it with: tmux a . It should contain several windows: (window:0) training worker (window:1) evaluation script (window:2) tensorboard (window:3) htop The scripts above executes the following commands, which can be run manually: CUDA_VISIBLE_DEVICES 0,1,2,3,4,5,6,7 python single_lm_train.py logdir num_gpus 8 datadir CUDA_VISIBLE_DEVICES python single_lm_train.py logdir mode eval_test_ave datadir tensorboard logdir port 12012 Please note that this assumes the user has 8 GPUs available. Changing the CUDA_VISIBLE_DEVICES mask and num_gpus flag to something else will work but the training will obviously be slower. Results can be monitored using TensorBoard, listening on port 12012. To change hyper parameters The command accepts and additional argument hpconfig which allows to override various hyper parameters, including: batch_size 128 batch size num_steps 20 number of unrolled LSTM steps num_shards 8 embedding and softmax matrices are split into this many shards num_layers 1 number of LSTM layers learning_rate 0.2 learning rate for adagrad max_grad_norm 10.0 maximum acceptable gradient norm keep_prob 0.9 for dropout between layers (here: 10% dropout before and after each LSTM layer) emb_size 512 size of the embedding state_size 2048 LSTM state size projected_size 512 LSTM projection size num_sampled 8192 number of word target samples for IS objective during training To run a version of the model with 2 layers and 4096 state size, simply call: python single_lm_run.py datadir /home/rafal/datasets/lm1b/ logdir hpconfig num_layers 2,state_size 4096 Feedback Let me know if you have any questions or comments at rafjoz@gmail.com",Language Modelling,Language Modelling 2214,Natural Language Processing,Natural Language Processing,Natural Language Processing,"PyTorch Large Scale Language Model A Large Scale PyTorch Language Model trained on the 1 Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perplexity after 5 training epochs using LSTM Language Model with Adam Optimizer Trained in 26 hours using 1 Nvidia V100 GPU ( 5.1 hours per epoch ) with 2048 batch size ( 10.7 GB GPU memory ) Previous Results 46.47 Perplexity after 5 training epochs on a 1 layer, 2048 unit, 256 projection LSTM Language Model 3 Trained for 3 days using 1 Nvidia P100 GPU ( 12.5 hours per epoch ) Implemented Sampled Softmax and Log Uniform Sampler functions GPU Hardware Requirement Type LM Memory Size GPU w/o tied weights 9 GB Nvidia 1080 TI, Nvidia Titan X w/ tied weights 6 7 GB Nvidia 1070 or higher There is an option to tie the word embedding and softmax weight matrices together to save GPU memory. Hyper Parameters 3 Parameter Value Epochs 5 Training Batch Size 128 Evaluation Batch Size 1 BPTT 20 Embedding Size 256 Hidden Size 2048 Projection Size 256 Tied Embedding + Softmax False Layers 1 Optimizer AdaGrad Learning Rate 0.10 Gradient Clipping 1.00 Dropout 0.01 Weight Decay (L2 Penalty) 1e 6 Setup Torch Data Format 1. Download Google Billion Word Dataset for Torch Link 2. Run process_gbw.py on the train_data.th7 file to create the train_data.sid file 3. 
Install Cython framework and build Log_Uniform Sampler I leverage the GBW data preprocessed for the Torch framework. (See Torch GBW ) Each data tensor contains all the words in data partition. The train_data.sid file marks the start and end positions for each independent sentence. The preprocessing step and train_data.sid file speeds up loading the massive training data. Data Tensors (test_data, valid_data, train_data, train_small, train_tiny) ( words x 2) matrix (sentence id, word id) Sentence ID Tensor ( sentences x 2) matrix (start position, sentence length) Setup Original Data Format 1. Download 1 Billion Word Dataset Link The Torch Data Format loads the entire dataset at once, so it requires at least 32 GB of memory. The original format partitions the dataset into smaller chunks, but it runs slower. References 1. Exploring the Limits of Language Modeling Github 2. Factorization Tricks for LSTM networks Github 3. Efficient softmax approximation for GPUs Github 4. Candidate Sampling 5. Torch GBW 6. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling",Language Modelling,Language Modelling 2275,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Pyramidal Recurrent Units (PRU) for Language Modeling This repository contains the source code of our paper, Pyramidal Recurrent units for language modeling , which is accepted for publication at EMNLP'18 . NOTE : Though we tested our module (PRU) on a highly competitive task of language modeling, our module is generic enough and can be used for different applications where RNNs (such as LSTMs and GRUs) are currently used, such as question answering, text classification, and machine translation. Block diagram of PRU ! PRU (images/pru.png) Downloading Language Modeling Datasets You can download the dataset by running the following script bash getdata.sh This script will download the data and place in directory named data . Training PRU on the PenTree dataset You can train PRU on the PenTree dataset (or PTB) by using following command: CUDA_VISIBLE_DEVICES 0 python main.py model PRU g 4 k 2 emsize 400 nhid 1400 data ./data/penn where model specifies the recurrent unit (either LSTM or PRU) g specifies the number of groups to be used in the grouped linear transformation, k species the number of pyramidal levels in the pyramidal transformation emsize specifies the embedding layer size nhid specifies the hidden layer size data specifies the location of the data. Please see main.py for details about other supported command line arguments. If you want to train language model using LSTM on PTB dataset, then you can do it by using the following command: CUDA_VISIBLE_DEVICES 0 python main.py model LSTM emsize 400 nhid 1000 data ./data/penn NOTE: Our implementation currently supports training on single GPU. However, you can easily update it to support multiple GPUs by using DataParallel module in PyTorch. Testing PRU on the PenTree dataset You can test the models using following command CUDA_VISIBLE_DEVICES 0 python test.py data ./data/penn weightFile batchSize 1 Please see test.py for details about command line arguments. Pretrained models on the Pentree dataset Below table compares the perplexity scores of language models with LSTM and PRU as a recurrent unit (with standard dropout only). We can see that PRU has better generalization properties than LSTMs and enables learning representations in very high dimensional space efficiently. 
Model g k emsize nhid Params Perplexity (val) Perplexity (test) Model Size (in MB) Model Link LSTM NA NA 400 1000 19.87 67.8 66.05 159 Link LSTM NA NA 400 1200 25.79 69.29 67.17 206 Link LSTM NA NA 400 1400 32.68 70.23 68.32 261 Link PRU 1 2 400 1000 18.97 69.99 68.06 151 Link PRU 2 2 400 1200 18.51 66.39 64.30 148 Link PRU 4 2 400 1400 18.90 64.40 62.62 151 Link NOTE : The performance of PRU can be further improved by using advanced methods such as weight dropout and dynamic evaluations . If you evaluate above pretrained models with dynamic evaluation, then you should see an improvement in perplexity by about 6 8% for both LSTM and PRU based models. Replacing LSTM with PRU in AWD LSTM , we obtained the best results, however, with fewer parameters. See our paper for more details. Pre requisite To run this code, you need to have following libraries: PyTorch We tested with v0.3.1 Python We tested our code with Pythonv3. If you are using Python v2, please feel free to make necessary changes to the code. We recommend to use Anaconda . We have tested our code on Ubuntu 16.04. Acknowledgements A large portion of this repo is borrowed from AWD LSTM repository. Citation If PRU is useful for your research, then please cite our paper. @inproceedings{mehta2018pru, title {Pyramidal Recurrent Unit for Language Modeling}, author {Sachin Mehta, Rik Koncel Kedziorski, Mohammad Rastegari, and Hannaneh Hajishirzi}, booktitle {EMNLP}, year {2018} } References 1. Hochreiter, Sepp, and Jürgen Schmidhuber. Long short term memory. Neural computation 9.8 (1997): 1735 1780. 2. Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. Regularizing and optimizing LSTM language models. arXiv preprint arXiv:1708.02182 (2017).",Language Modelling,Language Modelling 2285,Natural Language Processing,Natural Language Processing,Natural Language Processing,"LSTM and QRNN Language Model Toolkit This repository contains the code used for two Salesforce Research papers: + Regularizing and Optimizing LSTM Language Models + An Analysis of Neural Language Modeling at Multiple Scales This code was originally forked from the PyTorch word level language modeling example . The model comes with instructions to train: + word level language models over the Penn Treebank (PTB), WikiText 2 (WT2), and WikiText 103 (WT103) datasets + character level language models over the Penn Treebank (PTBC) and Hutter Prize dataset (enwik8) The model can be composed of an LSTM or a Quasi Recurrent Neural Network (QRNN) which is two or more times faster than the cuDNN LSTM in this setup while achieving equivalent or better accuracy. 
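A central regulariser in these models is DropConnect applied to the LSTM's hidden to hidden weight matrix, which is what the wdrop flag in the commands further below controls. The following is a minimal PyTorch sketch of that idea only; it re samples the mask on every call and is not the repository's WeightDrop implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropLSTMCell(nn.Module):
    # LSTM cell whose recurrent (hidden-to-hidden) weights are dropped, DropConnect-style
    def __init__(self, input_size, hidden_size, wdrop=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.wdrop = wdrop

    def forward(self, x, state):
        h, c = state
        # drop entries of the recurrent weight matrix; gradients still flow to the raw weights
        w_hh = F.dropout(self.cell.weight_hh, p=self.wdrop, training=self.training)
        gates = (F.linear(x, self.cell.weight_ih, self.cell.bias_ih)
                 + F.linear(h, w_hh, self.cell.bias_hh))
        i, f, g, o = gates.chunk(4, dim=1)  # PyTorch gate ordering: input, forget, cell, output
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

In the paper the dropout mask is sampled once per forward pass over a whole sequence rather than per timestep, but the weight level masking shown here is the core of the technique.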
+ Install PyTorch 0.4 + Run getdata.sh to acquire the Penn Treebank and WikiText 2 datasets + Train the base model using main.py + (Optionally) Finetune the model using finetune.py + (Optionally) Apply the continuous cache pointer to the finetuned model using pointer.py If you use this code or our results in your research, please cite as appropriate: @article{merityRegOpt, title {{Regularizing and Optimizing LSTM Language Models}}, author {Merity, Stephen and Keskar, Nitish Shirish and Socher, Richard}, journal {arXiv preprint arXiv:1708.02182}, year {2017} } @article{merityAnalysis, title {{An Analysis of Neural Language Modeling at Multiple Scales}}, author {Merity, Stephen and Keskar, Nitish Shirish and Socher, Richard}, journal {arXiv preprint arXiv:1803.08240}, year {2018} } Update (June/13/2018) The codebase is now PyTorch 0.4 compatible for most use cases (a big shoutout to for a fairly comprehensive PR Mild readjustments to hyperparameters may be necessary to obtain quoted performance. If you desire exact reproducibility (or wish to run on PyTorch 0.3 or lower), we suggest using an older commit of this repository. We are still working on pointer , finetune and generate functionalities. Software Requirements Python 3 and PyTorch 0.4 are required for the current codebase. Included below are hyper parameters to get equivalent or better results to those included in the original paper. If you need to use an earlier version of the codebase, the original code and hyper parameters accessible at the PyTorch 0.1.12 release, with Python 3 and PyTorch 0.1.12 are required. If you are using Anaconda, installation of PyTorch 0.1.12 can be achieved via: conda install pytorch 0.1.12 c soumith . Experiments The codebase was modified during the writing of the paper, preventing exact reproduction due to minor differences in random seeds or similar. We have also seen exact reproduction numbers change when changing underlying GPU. The guide below produces results largely similar to the numbers reported. For data setup, run ./getdata.sh . This script collects the Mikolov pre processed Penn Treebank and the WikiText 2 datasets and places them in the data directory. Next, decide whether to use the QRNN or the LSTM as the underlying recurrent neural network model. The QRNN is many times faster than even Nvidia's cuDNN optimized LSTM (and dozens of times faster than a naive LSTM implementation) yet achieves similar or better results than the LSTM for many word level datasets. At the time of writing, the QRNN models use the same number of parameters and are slightly deeper networks but are two to four times faster per epoch and require less epochs to converge. The QRNN model uses a QRNN with convolutional size 2 for the first layer, allowing the model to view discrete natural language inputs (i.e. New York ), while all other layers use a convolutional size of 1. Finetuning Note: Fine tuning modifies the original saved model model.pt file if you wish to keep the original weights you must copy the file. Pointer note: BPTT just changes the length of the sequence pushed onto the GPU but won't impact the final result. 
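For intuition, the continuous cache pointer applied by pointer.py can be pictured as mixing the model's softmax with a distribution built from recently seen hidden states and targets (a neural cache). The sketch below is not the repository's implementation; the function name and tensor shapes are hypothetical, and only the lambdasm and theta names echo the script's flags.

```python
import torch
import torch.nn.functional as F

def cache_pointer_step(logits, hidden, cache_h, cache_words, vocab_size,
                       theta=1.0, lambdasm=0.1):
    # logits: [vocab_size] model output for the current step
    # hidden: [d] current hidden state; cache_h: [n, d] recent hidden states
    # cache_words: [n] LongTensor of target word ids emitted at those recent steps
    p_model = F.softmax(logits, dim=-1)
    if cache_h.size(0) == 0:
        return p_model
    # attention of the current state over the cached states
    attn = F.softmax(theta * cache_h.matmul(hidden), dim=0)
    # scatter the attention mass onto the vocabulary entries of the cached words
    p_cache = torch.zeros(vocab_size, device=logits.device).index_add_(0, cache_words, attn)
    return (1.0 - lambdasm) * p_model + lambdasm * p_cache
```

The window flag in the pointer commands below bounds how much history such a cache would keep.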
Character level enwik8 with LSTM + python u main.py epochs 50 nlayers 3 emsize 400 nhid 1840 alpha 0 beta 0 dropoute 0 dropouth 0.1 dropouti 0.1 dropout 0.4 wdrop 0.2 wdecay 1.2e 6 bptt 200 batch_size 128 optimizer adam lr 1e 3 data data/enwik8 save ENWIK8.pt when 25 35 Character level Penn Treebank (PTB) with LSTM + python u main.py epochs 500 nlayers 3 emsize 200 nhid 1000 alpha 0 beta 0 dropoute 0 dropouth 0.25 dropouti 0.1 dropout 0.1 wdrop 0.5 wdecay 1.2e 6 bptt 150 batch_size 128 optimizer adam lr 2e 3 data data/pennchar save PTBC.pt when 300 400 Word level WikiText 103 (WT103) with QRNN + python u main.py epochs 14 nlayers 4 emsize 400 nhid 2500 alpha 0 beta 0 dropoute 0 dropouth 0.1 dropouti 0.1 dropout 0.1 wdrop 0 wdecay 0 bptt 140 batch_size 60 optimizer adam lr 1e 3 data data/wikitext 103 save WT103.12hr.QRNN.pt when 12 model QRNN Word level Penn Treebank (PTB) with LSTM The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 61.2 / 58.8 (validation / testing), with finetuning achieves perplexities of approximately 58.8 / 56.5 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.2 / 52.5 . + python main.py batch_size 20 data data/penn dropouti 0.4 dropouth 0.25 seed 141 epoch 500 save PTB.pt + python finetune.py batch_size 20 data data/penn dropouti 0.4 dropouth 0.25 seed 141 epoch 500 save PTB.pt + python pointer.py data data/penn save PTB.pt lambdasm 0.1 theta 1.0 window 500 bptt 5000 Word level Penn Treebank (PTB) with QRNN The instruction below trains a QRNN model that without finetuning achieves perplexities of approximately 60.6 / 58.3 (validation / testing), with finetuning achieves perplexities of approximately 59.1 / 56.7 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.4 / 52.6 . + python u main.py model QRNN batch_size 20 clip 0.2 wdrop 0.1 nhid 1550 nlayers 4 emsize 400 dropouth 0.3 seed 9001 dropouti 0.4 epochs 550 save PTB.pt + python u finetune.py model QRNN batch_size 20 clip 0.2 wdrop 0.1 nhid 1550 nlayers 4 emsize 400 dropouth 0.3 seed 404 dropouti 0.4 epochs 300 save PTB.pt + python pointer.py model QRNN lambdasm 0.1 theta 1.0 window 500 bptt 5000 save PTB.pt Word level WikiText 2 (WT2) with LSTM The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 68.7 / 65.6 (validation / testing), with finetuning achieves perplexities of approximately 67.4 / 64.7 , and with the continuous cache pointer augmentation achieves perplexities of approximately 52.2 / 50.6 . + python main.py epochs 750 data data/wikitext 2 save WT2.pt dropouth 0.2 seed 1882 + python finetune.py epochs 750 data data/wikitext 2 save WT2.pt dropouth 0.2 seed 1882 + python pointer.py save WT2.pt lambdasm 0.1279 theta 0.662 window 3785 bptt 2000 data data/wikitext 2 Word level WikiText 2 (WT2) with QRNN The instruction below will a QRNN model that without finetuning achieves perplexities of approximately 69.3 / 66.8 (validation / testing), with finetuning achieves perplexities of approximately 68.5 / 65.9 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.6 / 52.1 . Better numbers are likely achievable but the hyper parameters have not been extensively searched. These hyper parameters should serve as a good starting point however. 
+ python u main.py epochs 500 data data/wikitext 2 clip 0.25 dropouti 0.4 dropouth 0.2 nhid 1550 nlayers 4 seed 4002 model QRNN wdrop 0.1 batch_size 40 save WT2.pt + python finetune.py epochs 500 data data/wikitext 2 clip 0.25 dropouti 0.4 dropouth 0.2 nhid 1550 nlayers 4 seed 4002 model QRNN wdrop 0.1 batch_size 40 save WT2.pt + python u pointer.py save WT2.pt model QRNN lambdasm 0.1279 theta 0.662 window 3785 bptt 2000 data data/wikitext 2 Speed For speed regarding character level PTB and enwik8 or word level WikiText 103, refer to the relevant paper. The default speeds for the models during training on an NVIDIA Quadro GP100: + Penn Treebank (batch size 20): LSTM takes 65 seconds per epoch, QRNN takes 28 seconds per epoch + WikiText 2 (batch size 20): LSTM takes 180 seconds per epoch, QRNN takes 90 seconds per epoch The default QRNN models can be far faster than the cuDNN LSTM model, with the speed ups depending on how much of a bottleneck the RNN is. The majority of the model time above is now spent in softmax or optimization overhead (see PyTorch QRNN discussion on speed ). Speeds are approximately three times slower on a K80. On a K80 or other memory cards with less memory you may wish to enable the cap on the maximum sampled sequence length to prevent out of memory (OOM) errors, especially for WikiText 2. If speed is a major issue, SGD converges more quickly than our non monotonically triggered variant of ASGD though achieves a worse overall perplexity. Details of the QRNN optimization For full details, refer to the PyTorch QRNN repository . Details of the LSTM optimization All the augmentations to the LSTM, including our variant of DropConnect (Wan et al. 2013) termed weight dropping which adds recurrent dropout, allow for the use of NVIDIA's cuDNN LSTM implementation. PyTorch will automatically use the cuDNN backend if run on CUDA with cuDNN installed. This ensures the model is fast to train even when convergence may take many hundreds of epochs.",Language Modelling,Language Modelling 2337,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Deep Image Prior with Transfer Learning Intro Project conducted with: Anirudh Singh Shekhawat, Aishma Raghu, Michelle Sit for CSE 253 (Neural Networks for Pattern Recognition) with Prof. Gary Cottrell. This project consists in: 1) An implementation of Ulyanov et al's Deep Image Prior work . 2) An extension to transfer learning across images in inpainting To do Add pictures of results (aiming to add by mid December) Files The notebook A pdf of the notebook (since the notebook doesn't load in the browser) The pdf report in NIPS format Deep Image Prior Deep image prior is a convolutional neural network with a fixed input. The structure of the network functions as a prior for natural images; the network only trains on one image to complete an image enhancement task. We have implemented two of those tasks based on the original paper: In superresolution, the goal is to enhance an image by increasing its resolution. The output of the network has width and height equal to 4x the original image's width and size. The network's output image is downsampled (in a differentiable way) and the loss is computed against the original image itself. In inpainting, the goal is to fill in a region that is corrupted. In our implementation, we assume the filter defining the corrupt pixels is given. The network's output is the same size as the original image, and the loss is computed only for non corrupt pixels. 
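As a concrete illustration of the masked objective just described, a minimal sketch of an inpainting loss that ignores the corrupt pixels might look as follows (the tensor shapes and the mask convention are assumptions, not the project's actual code):

```python
import torch

def masked_mse(output, target, mask):
    # output, target: [B, C, H, W]; mask: [B, 1, H, W] with 1 = known pixel, 0 = corrupt
    diff = (output - target) * mask              # corrupted pixels contribute nothing
    return diff.pow(2).sum() / mask.expand_as(output).sum().clamp(min=1)
```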
The network hallucinates the corrupt pixels and achieves impressive results. Transfer Learning Can this setup be used more efficiently? For example, in video data, a lot of the images are very similar to each other, and if we wanted to remove subtitles from a movie, we shouldn't have to rerun the network with random weight initialization every time. Here, we tested and confirmed the hypothesis that initializing with the weights from the previous image trains faster than starting from randomly initialized weights. Averaging over the weights of successive images also yields faster learning; we experiment with a few schemes. Two possible avenues for future work: Using the weights of a network to encode an image could be used for image compression. In typical ML compression schemes, an autoencoder is used, with the encoder creating the compressed representation and the decoder unpacking it. Here the training would be the process by which the compressed representation is created, and the compressed representation would be the weights of the network. Using DARTS (Differentiable Architecture Search) to learn a network that trains faster for new images in a long sequence of images. Other Subtitle Pixels Extraction: A portion of the code is dedicated to creating a filter that selects the pixels that are in the subtitle. Using this code This code was written in Google Colaboratory so it will be easiest to run there first. Some of the image files loaded from Google Drive may not be publicly accessible; please email samrsabri@gmail.com if you encounter any issues.",Language Modelling,Language Modelling 2398,Natural Language Processing,Natural Language Processing,Natural Language Processing,"reference pytorch nlp framework nlp tfidf nlp tool wide & deep pytorch v1.0 fastai ENAS BERT state of the art fasttext tuning: tencent word vector alibaba x_deep_learning DeepPavlov(conversation) tensorflow models",Language Modelling,Language Modelling 2412,Natural Language Processing,Natural Language Processing,Natural Language Processing,"COOP papers",Language Modelling,Language Modelling 2522,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Predefined Sparseness in Recurrent Sequence Models This repository contains code to run the experiments presented in our paper Predefined Sparseness in Recurrent Sequence Models , presented at CoNLL 2018. The package sparse_seq contains the implementation of predefined sparse LSTMs and embedding layers, as described in that paper. rnn.py : contains SparseLSTM , a pytorch module that allows composing a sparse single layer LSTM based on elementary dense LSTMs, for a given parameter density, or given fractions in terms of input and hidden representation size. For example, with reduce_in 0.5 and reduce_out 0.5 , the sparse LSTM would have the same number of trainable parameters as a dense LSTM with half the number of input and output dimensions. The next step would be rewriting SparseLSTM to run in parallel on multiple devices, to gain in speed and memory capacity compared to the dense LSTM. embedding.py : contains SparseEmbedding , a pytorch module that composes a sparse embedding layer by building the total embedding matrix as a composition of a user specified number of individual trainable embedding blocks with smaller dimensions. As shown in the paper, this only behaves as intended if the vocabulary is sorted from least to most frequent terms.
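To make the block composition concrete, here is a toy sketch of the idea (not the repository's SparseEmbedding): a frequency sorted vocabulary is cut into equal blocks, each block gets its own smaller trainable embedding, and the result is zero padded to the full embedding size so that more frequent words end up with more non zero dimensions. The class name and the per-block dimension schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BlockSparseEmbedding(nn.Module):
    # Toy version: block i covers block_vocab consecutive (frequency-sorted) word ids
    # and uses (i + 1) * emsize / num_blocks embedding dimensions, zero-padded to emsize.
    def __init__(self, vocab_size, emsize, num_blocks):
        super().__init__()
        assert vocab_size % num_blocks == 0 and emsize % num_blocks == 0
        self.block_vocab = vocab_size // num_blocks
        self.emsize = emsize
        self.blocks = nn.ModuleList(
            [nn.Embedding(self.block_vocab, (i + 1) * emsize // num_blocks)
             for i in range(num_blocks)])

    def forward(self, tokens):                      # tokens: LongTensor of any shape
        flat = tokens.reshape(-1)
        out = torch.zeros(flat.size(0), self.emsize, device=tokens.device)
        block_id = flat // self.block_vocab          # which block each token falls in
        local_id = flat % self.block_vocab           # its index inside that block
        for i, emb in enumerate(self.blocks):
            idx = torch.nonzero(block_id == i).squeeze(1)
            if idx.numel():
                out[idx, : emb.embedding_dim] = emb(local_id[idx])
        return out.view(*tokens.shape, self.emsize)
```

With this schedule the table holds a bit over half the parameters of a dense vocab_size x emsize embedding, which is the kind of predefined density the package exposes as a knob.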
Both embedding regularization mechanisms described in Merity's paper Regularizing and Optimizing LSTM Language Models are included in the code. The folders language_modeling and sequence_labeling contain the code for the language modeling and part of speech tagging experiments described in our paper. The code was developed on Python 3.6.4, with pytorch 0.4.0 (CUDA V8.0, CuDNN 6.0) and all experiments were run on a GeForce GTX 1080 core. The code is not heavily documented. I've cleaned it a little, but it's still dynamically grown research code (you know what I mean). I'll be happy to provide more detailed descriptions if needed. Don't hesitate to drop me an email if you have any questions: Language modeling experiments The language_modeling code is mostly based on but uses some parts from Given the strong dependence on hyperparameters and corresponding computational cost, we only presented results for the Merity's model AWD LSTM and its sparse counterpart. Still, the code should be ready for use with the Yang et al.'s Mixture of Softmaxes output layer, but we haven't tested it to avoid heavy hyperparameter tuning. In any case, larger language modeling datasets would present a stronger test setup. The code should work with a sparse embedding layer, but given the small relative number of embedding parameters in the setup, and to keep the analysis untangled, we only ran experiments for AWD LSTM with a sparse LSTM layer. Baseline AWD LSTM The baseline can be run from the language_modeling folder as follows (after downloading the data by running getdata.sh ): console python main.py seed 0 save logs/awd lstm python finetune save logs/awd lstm for the initial optimization run, and the finetune run, respectively. The default parameter settings can be found in the file args.py . The result, averaged over different seeds, is given in Table 1 in the paper. The sparse model with wider middle LSTM layer (1725 dimensions instead of 1150) but predefined sparseness to maintain the same number of recurrent layer parameters (also see Table 1) can be run with console python main.py sparse_mode sparse_hidden sparse_fract 0.66666 nhid 1725 save logs/awd lstm sparse python finetune.py sparse_mode sparse_hidden sparse_fract 0.66666 nhid 1725 save logs/awd lstm sparse Finally, the learning to recite experiments can be run as follows. The baseline with original dimensions and 24M parameters can be run with console python main_overfit.py save logs/awd lstm overfit dense epochs 150 lr 5 where main_overfit.py is based on main.py and args.py in which for this particular experiment all regularization parameters are set to 0. The different setups with 7.07M unknowns in Table 3 can be run as follows console python main_overfit.py save logs/awd lstm overfit dense_reduced emsize 200 nhid 575 epochs 150 lr 5 python main_overfit.py save logs/awd lstm overfit sparse1 emsize 200 sparse_mode sparse_hidden sparse_fract 0.5 epochs 150 lr 5 python main_overfit.py save logs/awd lstm overfit sparse2 emblocks 10 emdensity 0.5 sparse_mode sparse_all sparse_fract 0.5 epochs 150 lr 5 in which emblocks and emdensity configure the sparse embedding layer, whereas sparse_mode and sparse_fract configure the stacked SparseLSTM layer. Sequence labeling experiments The POS tagging baseline is based on code contributed by Frederic Godin , augmented with the SparseEmbedding 's in the sparse_seq package. Dense model with reduced dimensions (Fig. 
3), e.g., for embedding size 5, for one particular setting of the regularization parameters (reported results were averaged over multiple random seeds, and tuned over a grid of hyperparameters) console python main.py emsize 5 nhid 10 epochs 50 dropouti 0.2 wdrop 0.2 save logs/pos_dense The counterpart with predefined sparse embedding layer (note that the vocab is sorted by default) console python main.py emsize 20 emb_density 0.25 emb_blocks 20 nhid 10 epochs 50 dropouti 0.2 wdrop 0.2 save logs/pos_sparse Finally, vocabulary sorting can be influenced with the flag vocab_order . Simulating the effect of inversing the vocabulary order (such that predefined sparseness in the embedding layer corresponds to shorter embeddings for more frequent terms, rather than the proposed ordering) can be done for instance as console python main.py emsize 20 emb_density 0.25 emb_blocks 20 nhid 10 epochs 50 dropouti 0.2 wdrop 0.2 vocab_order down save logs/pos_sparse_vocab_down",Language Modelling,Language Modelling 2531,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Mesh TensorFlow Model Parallelism Made Easier PyPI version GitHub Issues Contributions welcome (CONTRIBUTING.md) License Travis Introduction Mesh TensorFlow ( mtf ) is a language for distributed deep learning, capable of specifying a broad class of distributed tensor computations. The purpose of Mesh TensorFlow is to formalize and implement distribution strategies for your computation graph over your hardware/processors. For example: Split the batch over rows of processors and split the units in the hidden layer across columns of processors. Mesh TensorFlow is implemented as a layer over TensorFlow. Watch our YouTube video . Do I need Mesh TensorFlow? If you just want data parallel training (batch splitting), then you do not need Mesh TensorFlow, though Mesh TensorFlow can do this. The most common reasons for more sophisticated parallel computation are: The parameters of the model do not fit on one device e.g. a 5 billion parameter language model. An example is so large that the activations do not fit on one device. e.g. large images. TODO(noam): we still need to implement spatially partitioned convolutions Lower latency parallel inference (at batch size 1). The Mesh TensorFlow Approach to Distributed Computation A Mesh is an n dimensional array of processors, connected by a network. Each tensor is distributed (split and/or replicated) across all processors in a mesh. Tensor dimensions and mesh dimensions are named. The layouts of all tensors follow from a set of user defined layout rules which specify which tensor dimensions are split across which mesh dimensions. This ensures that the corresponding dimensions in different tensors are split in the same manner. Layouts do not affect results only performance. The implementation of an operation involves parallel computation on all processors in the mesh, and sometimes also collective communication. A processor usually just manipulates the slices of the input tensors already resident on that processor, and produces the slice of the output that goes on that processor. Getting Started Installation To install the latest stable version, run sh pip install mesh tensorflow To install the latest development version, run sh pip install e git+ Installing mesh tensorflow does not automatically install or update TensorFlow. We recommend installing it via pip install tensorflow or pip install tensorflow gpu . See TensorFlow’s installation instructions for details . 
If you're using a development version of Mesh TensorFlow, you may need to use TensorFlow's nightly package ( tf nightly ). Example Network (MNIST) To illustrate, let us consider a simple model for the MNIST image classification task. Our network has one hidden layer with 1024 units, and an output layer with 10 units (corresponding to the 10 digit classes). The code consists of two parts, the first describing the mathematical operations, and the second describing the devices and tensor/computation layout. For the full example, see examples/mnist.py ( TODO(noam): verify that this code works. Python tf_images is a tf.Tensor with shape 100, 28, 28 and dtype tf.float32 tf_labels is a tf.Tensor with shape 100 and dtype tf.int32 graph mtf.Graph() mesh mtf.Mesh(graph, my_mesh ) batch_dim mtf.Dimension( batch , 100) rows_dim mtf.Dimension( rows , 28) cols_dim mtf.Dimension( cols , 28) hidden_dim mtf.Dimension( hidden , 1024) classes_dim mtf.Dimension( classes , 10) images mtf.import_tf_tensor( mesh, tf_images, shape batch_dim, rows_dim, cols_dim ) labels mtf.import_tf_tensor(mesh, tf_labels, batch_dim ) w1 mtf.get_variable(mesh, w1 , rows_dim, cols_dim, hidden_dim ) w2 mtf.get_variable(mesh, w2 , hidden_dim, classes_dim ) einsum is a generalization of matrix multiplication (see numpy.einsum) hidden mtf.relu(mtf.einsum(images, w1, output_shape batch_dim, hidden_dim )) logits mtf.einsum(hidden, w2, output_shape batch_dim, classes_dim ) loss mtf.reduce_mean(mtf.layers.softmax_cross_entropy_with_logits( logits, mtf.one_hot(labels, classes_dim), classes_dim)) w1_grad, w2_grad mtf.gradients( loss , w1, w2 ) update_w1_op mtf.assign(w1, w1 w1_grad 0.001) update_w2_op mtf.assign(w2, w2 w2_grad 0.001) In the code above, we have built a Mesh TensorFlow graph, which is simply a Python structure. We have completely defined the mathematical operations. In the code below, we specify the mesh of processors and the layout of the computation. Python devices gpu:0 , gpu:1 , gpu:2 , gpu:3 mesh_shape ( all_processors , 4) layout_rules ( batch , all_processors ) mesh_impl mtf.placement_mesh_impl.PlacementMeshImpl( mesh_shape, layout_rules, devices) lowering mtf.Lowering(graph, {mesh:mesh_impl}) tf_update_ops lowering.lowered_operation(update_w1_op), lowering.lowered_operation(update_w2_op) The particular layout above implements data parallelism, splitting the batch of examples evenly across all four processors. Any Tensor with a batch dimension (e.g. images , h , logits , and their gradients) is split in that dimension across all processors, while any tensor without a batch dimension (e.g. the model parameters) is replicated identically on every processor. Alternatively, for model parallelism, we can set layout_rules ( hidden , all_processors ) . In this case, any tensor with a hidden dimension (e.g. hidden , w1 , w2 ) is split, while any other tensor (e.g. image , logits ) is fully replicated. We can even combine data parallelism and model parallelism on a 2 dimensional mesh of processors. We split the batch along one dimension of the mesh, and the units in the hidden layer along the other dimension of the mesh, as below. In this case, the hidden layer is actually tiled between the four processors, being split in both the batch and hidden_units dimensions. Python mesh_shape ( processor_rows , 2), ( processor_cols , 2) layout_rules ( batch , processor_rows ), ( hidden , processor_cols ) Where does the network communication happen? Some Mesh TensorFlow operations cause network communication. 
For example, an einsum (generalized matrix multiplication) is computed as follows: On each processor, compute the einsum of the slices of the two operands that are local to that processor. If no reduced out dimensions are split, then we are done. If reduced out dimensions are split, then perform an allreduce operation on the resulting slices summing across any mesh dimensions over which the reduced out dimensions are split. Where the allreduces happen depends will depend on the computation layout. For example, in a data parallel layout where the batch dimension is split, allreduces will happen when computing the parameter gradients, since this involves matrix multiplications which reduce out the batch dimension. How do I pick a layout? While results do not depend on layout (except in the realm of roundoff errors and random seeds), performance and memory consumption depend heavily on layout. Fortunately, the auto_mtf subpackage provides a method for automatically choosing a layout. For more information about what auto_mtf is doing to choose a layout, see its README (mesh_tensorflow/auto_mtf/README.md) file. Python import mesh_tensorflow.auto_mtf graph mtf.Graph() mesh mtf.Mesh(graph, my_mesh ) Insert model code here. outputs logits, loss iterable of mtf.Tensor, the outputs you're computing mesh_shape ( processor_rows , 2), ( processor_cols , 2) layout_rules mtf.auto_mtf.layout(graph, mesh_shape, outputs) It is possible for advanced users to eke out additional performance by tuning the layout (and model) further. Mesh TensorFlow helps by accumulating and printing counters of computation/communication. To start, here are some tricks/guidelines. It is illegal for two dimensions of the same tensor to be split across the same mesh dimension. For any compute intense operation (e.g. einsum), make sure that all mesh dimensions are used to split dimensions of the inputs or outputs. Otherwise, computation is duplicated. To keep the ratio of compute/communication high (i.e. not be bandwidth bound), split dimensions into large chunks. This should be familiar in the data parallelism case, where we want a large batch size per processor to avoid spending most of our time communicating. The Mesh TensorFlow Language Mesh TensorFlow (v0.0) is implemented as a Python library which can generate part of a TensorFlow graph. The user first builds a mtf.Graph (the analog of a TensorFlow graph) made up of mtf.Tensor s and mtf.Operation s. As in TensorFlow, this graph consists of simple Python objects. The user then creates a mtf.Lowering object, which lowers the mtf.Graph into TensorFlow, adding to the default TensorFlow graph. The Mesh TensorFlow language is nearly identical to TensorFlow, with the familiar notion of a Graph, Tensors, Operations, and automatic gradient computation. The principal differences are as follows: Meshes replace devices A Mesh is a n dimensional array of processors with named dimensions. Each Tensor is assigned to a Mesh , instead of a device. Tensor dimensions are named Each Tensor has a static Shape , which is a tuple of different Dimensions . A Dimension is a (name, size) pair. For example, the shape of a Tensor representing a batch of images might be: ( batch , 100), ( rows , 28 ), ( cols , 28), ( channels , 3) . Layouts A Tensor is laid out on its mesh with one slice on each processor. A Tensor layout , is an injective partial map specifying which dimensions of the tensor are (evenly) split across which dimensions of the mesh. 
No dimension of a tensor may be split across two dimensions of its mesh and no two dimensions of a tensor may be split across the same dimension of its mesh. The user defines a global set of layout rules in the form of (tensor dimension name, mesh dimension name) pairs. A dimension of a tensor is split across a dimension of its mesh if there is a matching rule. Example Layouts Take our example Tensor image_batch with shape: ( batch , 100), ( rows , 28 ), ( cols , 28), ( channels , 3) Assume that this Tensor is assigned to a mesh of 8 processors with shape: ( processor_rows , 2), ( processor_cols , 4) If we use an empty set of layout rules , we get no splitting. Each processor contains the whole Tensor . If we use the layout rules batch:processor_cols , then the batch dimension of the Tensor is split across the processor_cols dimension of the batch. This means that each processor contains a Tensor slice with shape 25, 28, 28, 3 . For example, processors (0, 3) and (1, 3) contain identical slices image_batch 75:100, :, :, : . If we use the layout rules rows:processor_rows;cols:processor_cols , then the image is split in two dimensions, with each processor containing one spatial tile with shape 100, 14, 7, 3 . For example, processor (0, 1) contains the slice image_batch :, 0:14, 7:14, : . Some layout rules would lead to illegal layouts: batch:processor_rows;rows:processor_rows is illegal because two tensor dimensions could not be split across the same mesh dimension. channels:processor_rows is illegal because the size of the tensor dimension is not evenly divisible by the size of the mesh dimension. Einsum Mesh TensorFlow uses Einstein summation notation, mtf.einsum(inputs, output_shape) , using the (named) Dimensions as the symbols. Matrix multiplication, broadcast, sum reduction, and transposition can all be expressed as special cases of mtf.einsum , though the familiar interfaces are also supported. The operation is lowered to slice wise tf.einsum s, followed by allreduce across any mesh dimensions corresponding to the summed out Tensor dimensions. Reshape can be expensive mtf.reshape(x, new_shape) is used to change a Tensor 's shape, potentially leading to a new tensor layout and hence network communication. CPU/GPU/TPU implementations Mesh TensorFlow works on CPU, GPU and TPU. The TPU implementation is very different from the CPU/GPU implementation. Multi CPU/GPU meshes are implemented with PlacementMeshImpl . In this case Mesh TensorFlow emits separate TensorFlow operations placed on the different devices, all in one big TensorFlow graph. TPU meshes are implemented in with SimdMeshImpl . In this case, Mesh TensorFlow emits TensorFlow operations (and communication collectives) from the perspective of one core, and this same program runs on every core, relying on the fact that each core actually performs the same operations. This piggy backs on the TPU data parallelism infrastructure, which operates the same way. This SIMD approach keeps the TensorFlow and XLA graphs from growing with the number of cores. The differences between cores are as follows: different slices of the variables (this works now) different positions in the collective communication (this works now) different slices of the infed and outfed tensors. We currently work around this by requiring that all imported/exported tensors be fully replicated. In the future, we should handle this correctly. Instructions for running on cloud tpu Note: It requires tensorflow> 1.11.0 . 
Prerequisite Please go through the Transformer tutorial . Create VM and TPU instance in Cloud console TODO(trandustin,ylc): update given mtf pypi package sh ctpu up name ylc mtf donut tf version nightly tpu size v2 8 zone us central1 b SSH into VM sh git clone cd mesh/ pip install user . Run the Transfomer model (no Tensor2Tensor dependencies) sh pip install tensorflow_datasets cd mesh/ DATA_DIR gs://noam mtf/data MODEL_DIR gs://noam mtf/transformer_standalone TPU noam mtf donut MODEL HPARAMS AND DIRECTORY (uncomment one) base model MODEL ./transformer/gin/model_base.gin 5B parameters (too big for this dataset, only trains with model parallelism) MODEL ./transformer/gin/model_5b.gin UNCOMMENT ONE OF THESE Data parallelism LAYOUT ./transformer/gin/layout_data_parallel.gin Model parallelism LAYOUT ./transformer/gin/layout_model_parallel.gin Data parallelism and Model Parallelism LAYOUT ./transformer/gin/layout_data_and_model_parallel.gin TRAIN python examples/transformer_standalone.py \ tpu $TPU data_dir $DATA_DIR model_dir $MODEL_DIR gin_file $MODEL \ gin_file $LAYOUT gin_param run.mode 'train' EVAL python examples/transformer_standalone.py \ tpu $TPU data_dir $DATA_DIR model_dir $MODEL_DIR gin_file $MODEL \ gin_file $LAYOUT gin_param run.mode 'evaluate' The above code will train on the LM1B language modeling benchmark, as specified in examples/transformer_standalone_defaults.gin . To train a sequence to sequence model on WMT14 en de, change utils.run.dataset to wmt_translate_ende/ende_subwords8k_t2t and set utils.run.mode to True . Note that the wmt_translate_ende/ende_subwords8k_t2t dataset was removed from TensorFlow Datasets in commit 211cb6f , so in order to train a model using this dataset you need to install a version of TFDS before this commit. Then, you can decode the WMT en de development set and evaluate it using SacreBLEU like so: INFER pip3 install sacrebleu mkdir /input /output DECODE_INPUT /home/$USER/input/ende.dev DECODE_OUTPUT /home/$USER/output/ende.dev.out /.local/bin/sacrebleu t wmt13 l en de echo src > $DECODE_INPUT python examples/transformer_standalone.py \ tpu $TPU data_dir $DATA_DIR model_dir $MODEL_DIR gin_file $MODEL \ gin_file $LAYOUT \ gin_param decode_from_file.input_filename '$DECODE_INPUT' \ gin_param decode_from_file.output_filename '$DECODE_OUTPUT' \ gin_param run.mode 'infer' Compute BLEU score for dev set cat $DECODE_OUTPUT /.local/bin/sacrebleu t wmt13 l en de tok intl Run the Transfomer model with Tensor2Tensor config sh git clone cd tensor2tensor/ pip install user . Before running the model, you need to prepare the training data and bucket for storing checkpoints. Refer to the Transformer tutorial to learn how to generate the training data and create buckets. sh CONF mtf_transformer_paper_tr_0_mesh_8 NAME ende_$CONF\_0828 MODEL mtf_transformer PROBLEM translate_ende_wmt32k_packed DATA_DIR gs://xxxx OUT_DIR gs://xxxx TPU_NAME ylc mtf donut tensor2tensor/bin/t2t trainer \ model $MODEL \ hparams_set $CONF \ problem $PROBLEM \ train_steps 10000 \ eval_steps 200 \ data_dir $DATA_DIR \ output_dir $OUT_DIR \ use_tpu True \ cloud_tpu_name $TPU_NAME Run the toy model without Tensor2Tensor dependencies This toy model contains two fully connected layers which aim to train a identity function: f(x) x. Since there are 8 TPU cores, we can arbitrary change the FLAGS.mesh_shape and FLAGS.layout to achieve different data parallelism and model parallelism strategies. sh MODEL_DIR gs://xxxx TPU_NAME ylc mtf donut 2 ways data parallelism and 4 ways model parallelism. 
In this configuration, we split the batch dimension into 2 cores and the hidden dimension into 4 cores. python examples/toy_model_tpu.py \ tpu $TPU \ model_dir $MODEL_DIR \ io_size 8 \ hidden_size 8 \ mesh_shape 'x:2;y:4' \ layout 'batch:x;hidden:y' 8 ways model parallelism. In this configuration, We split the hidden dimension into 8 cores. python examples/toy_model_tpu.py \ tpu $TPU \ model_dir $MODEL_DIR \ io_size 8 \ hidden_size 8 \ mesh_shape 'all:8' \ layout 'hidden:all' References > N. Shazeer, Y. Cheng, N. Parmar, D. Tran, A. Vaswani, P. Koanantakool, > P. Hawkins, H. Lee, M. Hong, C. Young, R. Sepassi, and B. Hechtman. > Mesh TensorFlow: Deep learning for supercomputers. > In _Neural Information Processing Systems_, 2018. none @inproceedings{shazeer2018mesh, author {Noam Shazeer and Youlong Cheng and Niki Parmar and Dustin Tran and Ashish Vaswani and Penporn Koanantakool and Peter Hawkins and HyoukJoong Lee and Mingsheng Hong and Cliff Young and Ryan Sepassi and Blake Hechtman}, title {{Mesh TensorFlow}: Deep Learning for Supercomputers}, booktitle {Neural Information Processing Systems}, year {2018}, }",Language Modelling,Language Modelling 2543,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Language Modelling,Language Modelling 2589,Natural Language Processing,Natural Language Processing,Natural Language Processing,"DARTS for RNN with fastai Language model on Penn Treebank using Differentiable Architecture Search (DARTS) and fastai library.\ Blog post .\ Based on DARTS: Differentiable Architecture Search by Hanxiao Liu, Karen Simonyan, Yiming Yang.\ Check out the original implementation . Requirements fastai 1.0.52.dev0 (latest as of 10th April 2019), PyTorch 1.0. Instructions 1. Run databunch_nb.ipynb to create databunch 2. Run train_search_nb.ipynb to search for genotype. 5 hours on 1 v100 gpu for 1 run.\ RNN search is sensitive to initialization so there should be several runs with different seed 3. Train that genotype from scratch on train_nb.ipynb. 1.5 days for 1600 epochs. 4. Test a model using test_nb.ipynb Pretrained model Pretrained model of DARTS_V1 genotype after 600 epochs darts_V1.pth .\ Place the file at data/models and run test_nb.ipynb. Loss 4.22, 68.0 perplexity.\ Caveat: I haven't been able to get 58.0 test perplexity like the original implementation. 
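Before running test_nb.ipynb, it can help to sanity check the downloaded checkpoint. The snippet below is only a guess at how to inspect it: the path follows the data/models location mentioned above, and whether the file holds a plain state_dict or a pickled model object (which would require the repository's classes to be importable) is an assumption.

```python
import torch

ckpt = torch.load('data/models/darts_V1.pth', map_location='cpu')
if isinstance(ckpt, dict):
    # likely a state_dict (or a dict wrapping one); print a few parameter shapes
    for name, value in list(ckpt.items())[:10]:
        shape = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(name, shape)
else:
    print(type(ckpt))
```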
fastai dev version installation bash git clone cd fastai tools/run after git clone pip install e . dev git pull",Language Modelling,Language Modelling 2626,Natural Language Processing,Natural Language Processing,Natural Language Processing,"This is the codebase of the paper: Building Language Model for Text with Named Entities, Rizwan et al. (ACL '18) . 1. Setting the data path accordingly, we use the command python3 main.py with default params to train the baseline AWD_LSTM model and the type model. The uncleaned data is here . 2. To train the entity composite model we use the command python3 main_ori_with_type.py with default params. 3. At inference, we use the inference.py file. 4. To reproduce our result, simply run the command python3 inference_loaded.py of full_pretrained_project, which will use the already trained models. In this version, we also show that with our joint inference schema, AWD_LSTM itself can work sufficiently well and replace the entity composite model. Also note that we used the nltk tokenizer while annotating the types in this version, so it is slightly different from our current release.
+ Install PyTorch 0.2 + Run getdata.sh to acquire the Penn Treebank and WikiText 2 datasets + Train the base model using main.py + Finetune the model using finetune.py + Apply the continuous cache pointer to the finetuned model using pointer.py If you use this code or our results in your research, please cite: @article{merityRegOpt, title {{Regularizing and Optimizing LSTM Language Models}}, author {Merity, Stephen and Keskar, Nitish Shirish and Socher, Richard}, journal {arXiv preprint arXiv:1708.02182}, year {2017} } Software Requirements Python 3 and PyTorch 0.2 are required for the current codebase. Included below are hyper parameters to get equivalent or better results to those included in the original paper. If you need to use an earlier version of the codebase, the original code and hyper parameters accessible at the PyTorch 0.1.12 release, with Python 3 and PyTorch 0.1.12 are required. If you are using Anaconda, installation of PyTorch 0.1.12 can be achieved via: conda install pytorch 0.1.12 c soumith . Experiments For recipe dataset python main.py batch_size 20 data ../data/recipe_ori/ dropouti 0.4 dropouth 0.25 seed 141 epoch 50 save RCP_LSTM_ori_with_type.pt python main.py batch_size 20 data ../data/recipe_type/ dropouti 0.4 dropouth 0.25 seed 141 epoch 50 save RCP_type_LSTM_one_vocab.pt Experiments The codebase was modified during the writing of the paper, preventing exact reproduction due to minor differences in random seeds or similar. We have also seen exact reproduction numbers change when changing underlying GPU. The guide below produces results largely similar to the numbers reported. For data setup, run ./getdata.sh . This script collects the Mikolov pre processed Penn Treebank and the WikiText 2 datasets and places them in the data directory. Next, decide whether to use the QRNN or the LSTM as the underlying recurrent neural network model. The QRNN is many times faster than even Nvidia's cuDNN optimized LSTM (and dozens of times faster than a naive LSTM implementation) yet achieves similar or better results than the LSTM. At the time of writing, the QRNN models use the same number of parameters and are slightly deeper networks but are two to four times faster per epoch and require less epochs to converge. The QRNN model uses a QRNN with convolutional size 2 for the first layer, allowing the model to view discrete natural language inputs (i.e. New York ), while all other layers use a convolutional size of 1. Finetuning Note: Fine tuning modifies the original saved model model.pt file if you wish to keep the original weights you must copy the file. Pointer note: BPTT just changes the length of the sequence pushed onto the GPU but won't impact the final result. Penn Treebank (PTB) with LSTM The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 61.2 / 58.8 (validation / testing), with finetuning achieves perplexities of approximately 58.8 / 56.5 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.2 / 52.5 . 
+ python main.py batch_size 20 data data/penn dropouti 0.4 dropouth 0.25 seed 141 epoch 500 save PTB.pt + python finetune.py batch_size 20 data data/penn dropouti 0.4 dropouth 0.25 seed 141 epoch 500 save PTB.pt + python pointer.py data data/penn save PTB.pt lambdasm 0.1 theta 1.0 window 500 bptt 5000 Penn Treebank (PTB) with QRNN The instruction below trains a QRNN model that without finetuning achieves perplexities of approximately 60.6 / 58.3 (validation / testing), with finetuning achieves perplexities of approximately 59.1 / 56.7 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.4 / 52.6 . + python u main.py model QRNN batch_size 20 clip 0.2 wdrop 0.1 nhid 1550 nlayers 4 emsize 400 dropouth 0.3 seed 9001 dropouti 0.4 epochs 550 save PTB.pt + python u finetune.py model QRNN batch_size 20 clip 0.2 wdrop 0.1 nhid 1550 nlayers 4 emsize 400 dropouth 0.3 seed 404 dropouti 0.4 epochs 300 save PTB.pt + python pointer.py model QRNN lambdasm 0.1 theta 1.0 window 500 bptt 5000 save PTB.pt WikiText 2 (WT2) with LSTM The instruction below trains a PTB model that without finetuning achieves perplexities of approximately 68.7 / 65.6 (validation / testing), with finetuning achieves perplexities of approximately 67.4 / 64.7 , and with the continuous cache pointer augmentation achieves perplexities of approximately 52.2 / 50.6 . + python main.py epochs 750 data data/wikitext 2 save WT2.pt dropouth 0.2 seed 1882 + python finetune.py epochs 750 data data/wikitext 2 save WT2.pt dropouth 0.2 seed 1882 + python pointer.py save WT2.pt lambdasm 0.1279 theta 0.662 window 3785 bptt 2000 data data/wikitext 2 WikiText 2 (WT2) with QRNN The instruction below will a QRNN model that without finetuning achieves perplexities of approximately 69.3 / 66.8 (validation / testing), with finetuning achieves perplexities of approximately 68.5 / 65.9 , and with the continuous cache pointer augmentation achieves perplexities of approximately 53.6 / 52.1 . Better numbers are likely achievable but the hyper parameters have not been extensively searched. These hyper parameters should serve as a good starting point however. + python u main.py epochs 500 data data/wikitext 2 clip 0.25 dropouti 0.4 dropouth 0.2 nhid 1550 nlayers 4 seed 4002 model QRNN wdrop 0.1 batch_size 40 save WT2.pt + python finetune.py epochs 500 data data/wikitext 2 clip 0.25 dropouti 0.4 dropouth 0.2 nhid 1550 nlayers 4 seed 4002 model QRNN wdrop 0.1 batch_size 40 save WT2.pt + python u pointer.py save WT2.pt model QRNN lambdasm 0.1279 theta 0.662 window 3785 bptt 2000 data data/wikitext 2 Speed The default speeds for the models during training on an NVIDIA Quadro GP100: + Penn Treebank (batch size 20): LSTM takes 65 seconds per epoch, QRNN takes 28 seconds per epoch + WikiText 2 (batch size 20): LSTM takes 180 seconds per epoch, QRNN takes 90 seconds per epoch The default QRNN models can be far faster than the cuDNN LSTM model, with the speed ups depending on how much of a bottleneck the RNN is. The majority of the model time above is now spent in softmax or optimization overhead (see PyTorch QRNN discussion on speed ). Speeds are approximately three times slower on a K80. On a K80 or other memory cards with less memory you may wish to enable the cap on the maximum sampled sequence length to prevent out of memory (OOM) errors, especially for WikiText 2. If speed is a major issue, SGD converges more quickly than our non monotonically triggered variant of ASGD though achieves a worse overall perplexity. 
Details of the QRNN optimization For full details, refer to the PyTorch QRNN repository . Details of the LSTM optimization All the augmentations to the LSTM, including our variant of DropConnect (Wan et al. 2013) termed weight dropping which adds recurrent dropout, allow for the use of NVIDIA's cuDNN LSTM implementation. PyTorch will automatically use the cuDNN backend if run on CUDA with cuDNN installed. This ensures the model is fast to train even when convergence may take many hundreds of epochs.",Language Modelling,Language Modelling 2664,Natural Language Processing,Natural Language Processing,Natural Language Processing,awd lstm pytorch implementation Resources: Original Source Code Link to Paper Ppt Blog,Language Modelling,Language Modelling 2666,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Differentiable Architecture Search Code accompanying the paper > DARTS: Differentiable Architecture Search \ > Hanxiao Liu, Karen Simonyan, Yiming Yang.\ > _arXiv:1806.09055_. The algorithm is based on continuous relaxation and gradient descent in the architecture space. It is able to efficiently design high performance convolutional architectures for image classification (on CIFAR 10 and ImageNet) and recurrent architectures for language modeling (on Penn Treebank and WikiText 2). Only a single GPU is required. Requirements Python > 3.5.5, PyTorch 0.3.1, torchvision 0.2.0 NOTE: PyTorch 0.4 is not supported at this moment and would lead to OOM. Datasets Instructions for acquiring PTB and WT2 can be found here . While CIFAR 10 can be automatically downloaded by torchvision, ImageNet needs to be manually downloaded (preferably to a SSD) following the instructions here . Pretrained models The easist way to get started is to evaluate our pretrained DARTS models. CIFAR 10 ( cifar10_model.pt ) cd cnn && python test.py auxiliary model_path cifar10_model.pt Expected result: 2.63% test error rate with 3.3M model params. PTB ( ptb_model.pt ) cd rnn && python test.py model_path ptb_model.pt Expected result: 55.68 test perplexity with 23M model params. ImageNet ( imagenet_model.pt ) cd cnn && python test_imagenet.py auxiliary model_path imagenet_model.pt Expected result: 26.7% top 1 error and 8.7% top 5 error with 4.7M model params. Architecture search (using small proxy models) To carry out architecture search using 2nd order approximation, run cd cnn && python train_search.py unrolled for conv cells on CIFAR 10 cd rnn && python train_search.py unrolled for recurrent cells on PTB Note the _validation performance in this step does not indicate the final performance of the architecture_. One must train the obtained genotype/architecture from scratch using full sized models, as described in the next section. Also be aware that different runs would end up with different local minimum. To get the best result, it is crucial to repeat the search process with different seeds and select the best cell(s) based on validation performance (obtained by training the derived cell from scratch for a small number of epochs). Please refer to fig. 3 and sect. 3.2 in our arXiv paper. Figure: Snapshots of the most likely normal conv, reduction conv, and recurrent cells over time. 
Architecture evaluation (using full sized models) To evaluate our best cells by training from scratch, run cd cnn && python train.py auxiliary cutout CIFAR 10 cd rnn && python train.py PTB cd rnn && python train.py data ../data/wikitext 2 \ WT2 dropouth 0.15 emsize 700 nhidlast 700 nhid 700 wdecay 5e 7 cd cnn && python train_imagenet.py auxiliary ImageNet Customized architectures are supported through the arch flag once specified in genotypes.py . The CIFAR 10 result at the end of training is subject to variance due to the non determinism of cuDNN back prop kernels. _It would be misleading to report the result of only a single run_. By training our best cell from scratch, one should expect the average test error of 10 independent runs to fall in the range of 2.76 +/ 0.09% with high probability. Figure: Expected learning curves on CIFAR 10 (4 runs), ImageNet and PTB. Visualization Package graphviz is required to visualize the learned cells python visualize.py DARTS where DARTS can be replaced by any customized architectures in genotypes.py . Citation If you use any part of this code in your research, please cite our paper : @article{liu2018darts, title {DARTS: Differentiable Architecture Search}, author {Liu, Hanxiao and Simonyan, Karen and Yang, Yiming}, journal {arXiv preprint arXiv:1806.09055}, year {2018} }",Language Modelling,Language Modelling 2739,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Differentiable Architecture Search Code accompanying the paper > DARTS: Differentiable Architecture Search \ > Hanxiao Liu, Karen Simonyan, Yiming Yang.\ > _arXiv:1806.09055_. The algorithm is based on continuous relaxation and gradient descent in the architecture space. It is able to efficiently design high performance convolutional architectures for image classification (on CIFAR 10 and ImageNet) and recurrent architectures for language modeling (on Penn Treebank and WikiText 2). Only a single GPU is required. Requirements Python > 3.5.5, PyTorch 0.3.1, torchvision 0.2.0 NOTE: PyTorch 0.4 is not supported at this moment and would lead to OOM. Datasets Instructions for acquiring PTB and WT2 can be found here . While CIFAR 10 can be automatically downloaded by torchvision, ImageNet needs to be manually downloaded (preferably to a SSD) following the instructions here . Pretrained models The easist way to get started is to evaluate our pretrained DARTS models. CIFAR 10 ( cifar10_model.pt ) cd cnn && python test.py auxiliary model_path cifar10_model.pt Expected result: 2.63% test error rate with 3.3M model params. PTB ( ptb_model.pt ) cd rnn && python test.py model_path ptb_model.pt Expected result: 55.68 test perplexity with 23M model params. ImageNet ( imagenet_model.pt ) cd cnn && python test_imagenet.py auxiliary model_path imagenet_model.pt Expected result: 26.7% top 1 error and 8.7% top 5 error with 4.7M model params. Architecture search (using small proxy models) To carry out architecture search using 2nd order approximation, run cd cnn && python train_search.py unrolled for conv cells on CIFAR 10 cd rnn && python train_search.py unrolled for recurrent cells on PTB Note the _validation performance in this step does not indicate the final performance of the architecture_. One must train the obtained genotype/architecture from scratch using full sized models, as described in the next section. Also be aware that different runs would end up with different local minimum. 
To get the best result, it is crucial to repeat the search process with different seeds and select the best cell(s) based on validation performance (obtained by training the derived cell from scratch for a small number of epochs). Please refer to fig. 3 and sect. 3.2 in our arXiv paper. Figure: Snapshots of the most likely normal conv, reduction conv, and recurrent cells over time. Architecture evaluation (using full sized models) To evaluate our best cells by training from scratch, run cd cnn && python train.py auxiliary cutout CIFAR 10 cd rnn && python train.py PTB cd rnn && python train.py data ../data/wikitext 2 \ WT2 dropouth 0.15 emsize 700 nhidlast 700 nhid 700 wdecay 5e 7 cd cnn && python train_imagenet.py auxiliary ImageNet Customized architectures are supported through the arch flag once specified in genotypes.py . The CIFAR 10 result at the end of training is subject to variance due to the non determinism of cuDNN back prop kernels. _It would be misleading to report the result of only a single run_. By training our best cell from scratch, one should expect the average test error of 10 independent runs to fall in the range of 2.76 +/ 0.09% with high probability. Figure: Expected learning curves on CIFAR 10 (4 runs), ImageNet and PTB. Visualization Package graphviz is required to visualize the learned cells python visualize.py DARTS where DARTS can be replaced by any customized architectures in genotypes.py . Citation If you use any part of this code in your research, please cite our paper : @article{liu2018darts, title {DARTS: Differentiable Architecture Search}, author {Liu, Hanxiao and Simonyan, Karen and Yang, Yiming}, journal {arXiv preprint arXiv:1806.09055}, year {2018} }",Language Modelling,Language Modelling 2857,Natural Language Processing,Natural Language Processing,Natural Language Processing,"ENAS: PNN: Graph would be a nn.Sequential model that allows for skip connections to be registered g Graph( Conv2D(...) MaxPool2D(...) Conv2D(...) ) Generates a connector for different layer sizes g.add_link(g.layers 0 , g.layers 2 ) Mutes an input, useful for ENAS g.toggle_link(g.layers 0 , g.layers 2 ) Graphs are also expandable, new nodes are added in front. Node is a special layer that allows for on the fly hyperparam search. Unlike a layer, it specifies it's in and out dim size and initializes candidate cells to that size. Connector is responsible for reshaping and learning identity mappings across layers. World implements the game logic and actions available to the ENAS agent. DagSearchEnv is the Gym environment.",Language Modelling,Language Modelling 2891,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Breaking the Softmax Bottleneck: A High Rank Language Model This is the code we used in our paper > Breaking the Softmax Bottleneck: A High Rank RNN Language Model >Zhilin Yang\ , Zihang Dai\ , Ruslan Salakhutdinov, William W. Cohen ( : equal contribution) >Preprint 2017 Requirements Python 3.6, PyTorch 0.4.1 Notes on PyTorch update The original implementation and tuning were based on PyTorch 0.2.0. The code base has been upgraded to be compatible with 0.4.1. To exactly reproduce the results in our paper, you would need to use PyTorch 0.2.0 and do git checkout 4c43dee3f8a0aacea759c07f10d8f80dc0bb9bb2 to roll back to the previous version. Below are results of the current version on Penn Treebank as reported in . One may need further tuning to match the original results. 
MoS w/o finetune: Valid 58.34 Test 56.18 MoS: Valid 56.83 Test 54.64 MoS + dynamic evaluation: Valid 49.03 Test: 48.43 Download the data ./get_data.sh Train the models (to reproduce our results) Penn Treebank First, train the model python main.py data data/penn dropouti 0.4 dropoutl 0.29 dropouth 0.225 seed 28 batch_size 12 lr 20.0 epoch 1000 nhid 960 nhidlast 620 emsize 280 n_experts 15 save PTB single_gpu Second, finetune the model python finetune.py data data/penn dropouti 0.4 dropoutl 0.29 dropouth 0.225 seed 28 batch_size 12 lr 25.0 epoch 1000 nhid 960 emsize 280 n_experts 15 save PATH_TO_FOLDER single_gpu where PATH_TO_FOLDER is the folder created by the first step (concatenation of PTB with a timestamp). Third, run dynamic evaluation python dynamiceval.py model PATH_TO_FOLDER/finetune_model.pt lamb 0.075 WikiText 2 (Single GPU) First, train the model python main.py epochs 1000 data data/wikitext 2 save WT2 dropouth 0.2 seed 1882 n_experts 15 nhid 1150 nhidlast 650 emsize 300 batch_size 15 lr 15.0 dropoutl 0.29 small_batch_size 5 max_seq_len_delta 20 dropouti 0.55 single_gpu Second, finetune the model python finetune.py epochs 1000 data data/wikitext 2 save PATH_TO_FOLDER dropouth 0.2 seed 1882 n_experts 15 nhid 1150 emsize 300 batch_size 15 lr 20.0 dropoutl 0.29 small_batch_size 5 max_seq_len_delta 20 dropouti 0.55 single_gpu Third, run dynamic evaluation python dynamiceval.py data data/wikitext 2 model PATH_TO_FOLDER/finetune_model.pt epsilon 0.002 WikiText 2 (3 GPUs) This will yield the same results as using one single GPU, but will be faster. First, train the model CUDA_VISIBLE_DEVICES 0,1,2 python main.py epochs 1000 data data/wikitext 2 save WT2 dropouth 0.2 seed 1882 n_experts 15 nhid 1150 nhidlast 650 emsize 300 batch_size 15 lr 15.0 dropoutl 0.29 small_batch_size 15 max_seq_len_delta 20 dropouti 0.55 Second, finetune the model CUDA_VISIBLE_DEVICES 0,1,2 python finetune.py epochs 1000 data data/wikitext 2 save PATH_TO_FOLDER dropouth 0.2 seed 1882 n_experts 15 nhid 1150 emsize 300 batch_size 15 lr 20.0 dropoutl 0.29 small_batch_size 15 max_seq_len_delta 20 dropouti 0.55 Third, run dynamic evaluation python dynamiceval.py data data/wikitext 2 model PATH_TO_FOLDER/finetune_model.pt epsilon 0.002 Acknowledgements A large portion of this repo is borrowed from the following repos: and",Language Modelling,Language Modelling 2024,Natural Language Processing,Natural Language Processing,Natural Language Processing,"strong_s2s_baseline_parser An Empirical Study of Building a Strong Baseline for Constituency Parsing @inproceedings{P18 2097, title An Empirical Study of Building a Strong Baseline for Constituency Parsing , author Suzuki, Jun and Takase, Sho and Kamigaito, Hidetaka and Morishita, Makoto and Nagata, Masaaki , booktitle Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , month July , year 2018 , address Melbourne, Australia , publisher Association for Computational Linguistics , url pages 612 618 , } Requirement chainer see instruction Preparing Dataset Obtain, a modified version of ptbconv 3.0 bash git clone cd ptbconv 3.0 ./configure make Convert format of Penn Treebank3 .mrg by ptbconv 3.0 copy ptb3 for i in 00 01 02 03 04 05 06 07 08 09 seq 10 24 ;do \ cat /path to ptb3/${i}/WSJ_ .MRG > tmp/sec.${i}.mrg ;\ done get raw sentences and pos sequences via the ptbconv dependency format for i in 00 01 02 03 04 05 06 07 08 09 seq 10 24 ;do \ cat tmp/sec.${i}.mrg ptbconv 3.0/ptbconv D > tmp/sec.${i}.dep.txt ;\ cat 
tmp/sec.${i}.dep.txt perl scripts/get_column.pl 0 3 > tmp/sec.${i}.sent ;\ cat tmp/sec.${i}.dep.txt perl scripts/get_column.pl 1 3 > tmp/sec.${i}.pos ;\ done remove function tags and empty symbols, and then conver word >XX format for i in 00 01 02 03 04 05 06 07 08 09 seq 10 24 ;do \ cat tmp/sec.${i}.mrg perl scripts/remove_function_none_tag.pl scripts/strip start end bracket.sh > tmp/sec.${i}.cnt.txt ;\ perl scripts/encode.pl tmp/sec.${i}.cnt.txt tmp/sec.${i}.const ;\ cat tmp/sec.${i}.const scripts/add start end marker.sh > tmp/sec.${i}.se.const ;\ done finalize input files (word sequence files) for i in 02 03 04 05 06 07 08 09 seq 10 21 ;do \ cat tmp/sec.${i}.sent ;\ done > data/sec.02 21.sent cp tmp/sec.22.sent tmp/sec.23.sent data finalize output files (words with brackets) {pfor i in 02 03 04 05 06 07 08 09 seq 10 21 ;do \ cat tmp/sec.${i}.se.const ;\ done > data/sec.02 21.se.const cp tmp/sec.22.se.const data make output file with pos info cat data/sec.02 21.se.const perl scripts/combine_pos.pl data/sec.02 21.pos2 > data/sec.02 21.wposA.se.const cat data/sec.22.se.const perl scripts/combine_pos.pl data/sec.22.pos2 > data/sec.22.wposA.se.const make gold data for evaluation cat tmp/sec.22.cnt.txt perl pe 'chomp; $_ (TOP .$_. )\n ' > data/sec.22.gold cat tmp/sec.23.cnt.txt perl pe 'chomp; $_ (TOP .$_. )\n ' > data/sec.23.gold copy pos files for also evaluation cp tmp/sec.22.pos tmp/sec.23.pos data Get subword nmt for obtaining subword information bash git clone Make input files bash /path subword nmt/learn_bpe.py s 1000 data/sec.02 21.sent.bpe1000.dict /path subword nmt/apply_bpe.py c data/sec.02 21.sent.bpe1000.dict data/sec.02 21.sent.bpe1000 /path subword nmt/apply_bpe.py c data/sec.02 21.sent.bpe1000.dict data/sec.22.sent.bpe1000 /path subword nmt/apply_bpe.py c data/sec.02 21.sent.bpe1000.dict data/sec.23.sent.bpe1000 perl scripts/combine_bpe.pl data/sec.02 21.sent data/sec.22.sent_bpe1000 > data/sec.02 21.sent_w_bpe1000_wunk perl scripts/combine_bpe.pl data/sec.22.sent data/sec.22.sent_bpe1000 > data/sec.22.sent_w_bpe1000_wunk perl scripts/combine_bpe.pl data/sec.23.sent data/sec.23.sent_bpe1000 > data/sec.23.sent_w_bpe1000_wunk Run training/evaluation code Get the mlpnlp nmt code for training/test encoder decoder model bash git clone cd mlpnlp nmt git checkout for_parser Make vocab files bash cat data/sec.02 21.sent_w_bpe1000_wunk data/sec.22.sent_w_bpe1000_wunk data/sec.23.sent_w_bpe1000_wunk perl pe 's/\ \ \ / /g' python /path to mlpnlp nmt/count_freq.py 0 grep v > data/all.sent_w_bpe1000.vocab cat data/sec.02 21.se.const python /path to mlpnlp nmt/count_freq.py 0 > data/sec.02 21.se.const.vocab cat data/sec.02 21.wposA.se.const python /path to mlpnlp nmt/count_freq.py 0 > data/sec.02 21.wposA.se.const.vocab Obtain evalb to evaluate parser performance Run training/evaluation script bash ./train_PTB_enc_dec_0508_wbpeU_wXX_wPA_woglove.sh 0 2720 10 models bash for SEED in 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 ;do \ ./train_PTB_enc_dec_0508_wbpeU_wXX_wPA_woglove.sh 0 ${SEED} ;\ done Ensemble (evaluation) bash DIR models_encdec_mb16_SGD_e300h200L2_gc1_wbpeU_wXX_wPA_wog_wTying_wMergeFWBW_rs ; \ ./train_PTB_enc_dec_0508_ENSEMBLE.sh 0 \ ${DIR}2720/model.setting \ ${DIR}2720/model.epoch100:${DIR}2721/model.epoch100:${DIR}2722/model.epoch100:${DIR}2723/model.epoch100:${DIR}2724/model.epoch100:${DIR}2725/model.epoch100:${DIR}2726/model.epoch100:${DIR}2727/model.epoch100 \ ${DIR}2720/ \ data/sec.22.sent_w_bpe1000_wunk \ data/sec.23.sent_w_bpe1000_wunk",Constituency Parsing,NLP 
Other 2030,Natural Language Processing,Natural Language Processing,Natural Language Processing,"A Multilayer Convolutional Encoder Decoder Neural Network for Grammatical Error Correction Code and model files for the paper : A Multilayer Convolutional Encoder Decoder Neural Network for Grammatical Error Correction (In AAAI 18). If you use any part of this work, make sure you include the following citation: @InProceedings{chollampatt2018mlconv, author {Chollampatt, Shamil and Ng, Hwee Tou}, title {A Multilayer Convolutional Encoder Decoder Neural Network for Grammatical Error Correction}, booktitle {Proceedings of the Thirty Second AAAI Conference on Artificial Intelligence}, month {February}, year {2018}, } Setting Up 1. Clone this repository. 2. Download the pre requisite software: Fairseq py Subword NMT N best Reranker (Requires KenLM Python module) NOTE : For training and evaluation of the models, we suggest that you download the exact revisions of the above software. Go to software/ directory and run download.sh directory to download the exact revisions of these software. 3. Compile and install Fairseq py. For testing with pre trained models 1. Go to data/ directory and run prepare_test_data.sh script to download and process CoNLL 2014 test dataset 2. Go to models/ directory and run download.sh to download the required model files 3. For running the system, run the run.sh script with the following format ./run.sh : path to tokenized input data : typically 0,1,2 etc to be used with the environment variable CUDA_VISIBLE_DEVICES : could be the path to a single model file or a directory having multiple model files alone. You can also run the script by adding optional arguments for re ranking ./run.sh : path to trained feature weights for the re ranker (within models/reranker_weights : use 'eo' for edit operation features, and 'eolm' for both edit operations and language model features. For training from scratch Data Preparation 1. Update the paths to NUCLE_TAR and LANG8V2 within prepare_data.sh 2. Run the script prepare_data.sh from within data/ directory. ( NOTE : To get the exact data you may need to use LangID.py v1.1.6 for language filtering and NLTK v2.0b7 for tokenization. The prepared training data ( data/train.tok.{src,trg} ) will have 2210277 sentence pairs with 26,557,233 source tokens and 30,028,798 target tokens). Training For training, download the version of Fairseq py In the training/ directory, within the preprocess.sh script, place paths to the the training datasets and development datasets. The source and target files must be tokenized. 1. Go to training/ directory 2. Run ./preprocess.sh script 3. To train the models without pre trainined embeddings use the train.sh script. To train the models with pre trained word embeddings use the train_embed.sh script. ( NOTE : The pre trained embeddings are trained using Wikipedia data segmented using the released BPE model. If your training data and BPE model are different, we suggest that you pre train fastText embeddings on Wikipedia text segmented with your own BPE model and modify the paths within the script accordingly.) 4. To train the re ranker, you would additionally need to have compiled Moses software. Run train_reranker.sh script with the following arguments: ./train_reranker.sh : directory to store temporary files and final output weights.txt file. 5. Run the trained model from within training/ directory using the script run_trained_model.py . 
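The training steps above note that the released embeddings were trained on Wikipedia text segmented with the released BPE model, and suggest pre training your own fastText embeddings if your data or BPE model differ. A hedged sketch of doing that with the fasttext Python package; the file name, dimension and other settings are assumptions, not the values used for the released embeddings:

```python
import fasttext

# 'wiki.bpe.txt' is a placeholder: plain text already segmented with your BPE model,
# one sentence per line (the same segmentation used for your GEC training data).
model = fasttext.train_unsupervised(
    'wiki.bpe.txt',
    model='skipgram',  # skip-gram embeddings
    dim=500,           # the dimension is an assumption; match your encoder/decoder
    minCount=5,
    epoch=5,
)
model.save_model('wiki.bpe.fasttext.bin')
print(model.get_word_vector('the')[:5])  # peek at one learned vector
```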
License The code and models in this repository are licensed under the GNU General Public License Version 3. For commercial use of this code and models, separate commercial licensing is also available. Please contact: Shamil Chollampatt (shamil@u.nus.edu) Hwee Tou Ng (nght@comp.nus.edu.sg)",Grammatical Error Correction,NLP Other 2031,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NeuQE (Neural Quality Estimation) Neural quality estimation toolkit which can be used for natural language generation tasks such as grammatical error correction, machine translation, simplification, and summarization. The source code is repository was used in this paper: Neural Quality Estimation Models for Grammatical Error Correction (EMNLP 2018). If you use this code for your work, please cite this paper : @InProceedings{chollampatt2018neuqe, title {Neural Quality Estimation of Grammatical Error Correction}, authors {Chollampatt, Shamil and Ng, Hwee Tou}, booktitle {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, month {November}, year {2018}, address {Brussels, Belgium} } Prerequisites Python 3 PyTorch 0.3 Training Pre training the predictor model To train the predictor models, use the script train_predictor.py . See the available options by using help flag For example, the CNN based predictor for EMNLP 2018 GEC QE was trained using the following command: python train_predictor.py train $TRAIN_PATH_PREFIX valid $VALID_PATH_PREFIX ssuf src tsuf trg \ arch cnn nsvocab 30000 ntvocab 30000 \ nslayers 7 ntlayers 7 skwidth 3 tkwidth 3 \ nhid 700 nsembed 500 ntembed 500 \ nepochs 10 bsize 64 lrate 1.0 cnorm 5.0 maxslen 50 maxtlen 50 \ logafter 1000 outdir $MODEL_OUT_DIR For training the RNN based predictor, the following command was used: python train_predictor.py train $TRAIN_PATH_PREFIX valid $VALID_PATH_PREFIX ssuf src tsuf trg \ arch rnn nsvocab 30000 ntvocab 30000 \ nhid 700 nsembed 500 ntembed 500 \ nepochs 10 bsize 64 lrate 1.0 cnorm 5.0 maxslen 50 maxtlen 50 \ logafter 1000 outdir $MODEL_OUT_DIR Training the estimator model To train the estimator model, use the script train_estimator.py . See the available options by using the help flag. For training the estimator for EMNLP 2018 GEC QE model, the following command was used ( $ARCH can be cnn or rnn ): python train_estimator.py \ train $QE_TRAIN_DATA_PATH_PREFIX \ valid $QE_VALID_DATA_PATH_PREFIX \ ssuf src hsuf hyp scoresuf $SCORE_SUFFIX \ pmodel $PRED_MODEL_PATH \ arch $ARCH nhid 100 qvectype pre \ opt adam lrate 0.0005 bsize 32 validbsize 1 do 0.5 nepochs 50 \ metrics pc mae rmse outdir $EST_MODEL_OUT_DIR The GEC system used for generating system hypotheses for training the EMNLP 2018 GEC QE system was a multilayer convolutional sequence to sequence model trained on Lang 8 ( code ). The downstream GEC system that was improved using QE scores also used the same underlying architecture with additional techniques described in the paper . Testing To test the estimator model, use the script test_predictor_estimator . An example is shown below: python test_predictor_estimator.py \ test $QE_TEST_DATA_PATH_PREFIX \ ssuf src hsuf hyp scoresuf $SCORE_SUFFIX \ pemodel $PRED_MODEL_PATH $EST_MODEL_PATH metrics pc rmse outdir $OUT_DIR If you want to use multiple estimators while testing, use multiple pemodel flags specifying the paths to each predictor estimator model pair. 
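The metrics flags used above ( pc , mae , rmse ) denote Pearson correlation, mean absolute error and root mean squared error between predicted and gold quality scores. A small numpy illustration with made up scores:

```python
import numpy as np

# Placeholder quality scores: gold (reference) vs. predicted by the estimator.
gold = np.array([0.82, 0.40, 0.95, 0.10, 0.66])
pred = np.array([0.75, 0.48, 0.90, 0.22, 0.60])

pc = np.corrcoef(gold, pred)[0, 1]           # Pearson correlation (pc)
mae = np.mean(np.abs(gold - pred))           # mean absolute error (mae)
rmse = np.sqrt(np.mean((gold - pred) ** 2))  # root mean squared error (rmse)
print('PC=%.3f MAE=%.3f RMSE=%.3f' % (pc, mae, rmse))
```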
Pre trained Models For downloading the pre trained models used for quality estimation of grammatical error correction for EMNLP 2018 paper, run the download_models.sh script inside examples/gec_emnlp18/ directory. License The source code is licensed under GNU GPL 3.0 (see LICENSE (LICENSE.md)) for non commerical use. For commercial use of this code, separate commercial licensing is also available. Please contact: Shamil Chollampatt (shamil@u.nus.edu) Hwee Tou Ng (nght@comp.nus.edu.sg)",Grammatical Error Correction,NLP Other 2035,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Passage Re ranking with BERT Introduction \ \ \ \ \ Most of the code in this repository was copied from the original BERT repository . \ \ \ \ \ This repository contains the code to reproduce our entry to the MSMARCO passage ranking task , which was placed first with a large margin over the second place. It also contains the code to reproduce our result on the TREC CAR dataset , which is 22 MAP points higher than the best entry from 2017 and a well tuned BM25. MSMARCO Passage Re Ranking Leaderboard (Jan 8th 2019) Eval MRR@10 Dev MRR@10 : : : : 1st Place BERT (this code) 35.87 36.53 2nd Place IRNet 28.06 27.80 3rd Place Conv KNRM 27.12 29.02 TREC CAR Test Set (Automatic Annotations) MAP : : BERT (this code) 33.5 BM25 Anserini 15.6 MacAvaney et al., 2017 (TREC CAR 2017 Best Entry) 14.8 The paper describing our implementation is here . MS MARCO Download and extract the data First, we need to download and extract MS MARCO and BERT files: DATA_DIR ./data mkdir ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} tar xvf ${DATA_DIR}/triples.train.small.tar.gz C ${DATA_DIR} tar xvf ${DATA_DIR}/top1000.dev.tar.gz C ${DATA_DIR} tar xvf ${DATA_DIR}/top1000.eval.tar.gz C ${DATA_DIR} unzip ${DATA_DIR}/uncased_L 24_H 1024_A 16.zip d ${DATA_DIR} Convert MS MARCO to TFRecord format Next, we need to convert MS MARCO train, dev, and eval files to TFRecord files, which will be later consumed by BERT. mkdir ${DATA_DIR}/tfrecord python convert_msmarco_to_tfrecord.py \ output_folder ${DATA_DIR}/tfrecord \ vocab_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/vocab.txt \ train_dataset_path ${DATA_DIR}/triples.train.small.tsv \ dev_dataset_path ${DATA_DIR}/top1000.dev.tsv \ eval_dataset_path ${DATA_DIR}/top1000.eval.tsv \ dev_qrels_path ${DATA_DIR}/qrels.dev.tsv \ max_query_length 64\ max_seq_length 512 \ num_eval_docs 1000 This conversion takes 30 40 hours. Alternatively, you may download the TFRecord files here (23GB). Training We can now start training. We highly recommend using the free TPUs in our Google's Colab . Otherwise, a modern V100 GPU with 16GB cannot fit even a small batch size of 2 when training a BERT Large model. In case you opt for not using the Colab, here is the command line to start training: python run_msmarco.py \ data_dir ${DATA_DIR}/tfrecord \ bert_config_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_config.json \ init_checkpoint ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_model.ckpt \ output_dir ${DATA_DIR}/output \ msmarco_output True \ do_train True \ do_eval True \ num_train_steps 400000 \ num_warmup_steps 40000 \ train_batch_size 32 \ eval_batch_size 32 \ learning_rate 1e 6 Training for 400k iterations takes approximately 70 hours on a TPU v2. Alternatively, you can download the trained model used in our submission here (3.4GB). TREC CAR We describe in the next sections how to reproduce our results on the TREC CAR dataset. 
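Before moving on, a quick note on the metric behind the MS MARCO numbers above: MRR@10 is the reciprocal rank of the first relevant passage among the top 10 re ranked candidates, averaged over queries (0 when no relevant passage appears in the top 10). A self contained sketch on made up rankings:

```python
def mrr_at_10(ranked_doc_ids, relevant_doc_ids):
    # Reciprocal rank of the first relevant doc within the top 10 (0.0 if absent).
    for rank, doc_id in enumerate(ranked_doc_ids[:10], start=1):
        if doc_id in relevant_doc_ids:
            return 1.0 / rank
    return 0.0

# Toy example: two queries with hypothetical re-ranked candidates.
rankings = {
    'q1': (['d3', 'd7', 'd1'], {'d7'}),  # first relevant at rank 2 -> 0.5
    'q2': (['d9', 'd2', 'd4'], {'d8'}),  # no relevant passage found -> 0.0
}
score = sum(mrr_at_10(docs, rel) for docs, rel in rankings.values()) / len(rankings)
print('MRR@10 = %.2f' % score)  # 0.25
```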
Downloading qrels, run and TFRecord files The next steps (Indexing, Retrieval, and TFRecord conversion) take many hours. Alternatively, you can skip them and download the necessary files for training and evaluation here (4.0GB), namely: queries ( .topics); query relevant passage pairs ( .qrels); query candidate passage pairs ( .run). TFRecord files ( .tf) After downloading, you need to extract them to the TRECCAR_DIR folder: TRECCAR_DIR ./treccar/ tar xf treccar_files.tar.xz directory ${TRECCAR_DIR} And you are ready to go to the training/evaluation section. Downloading and Extracting the data If you decided to index, retrieve and convert to the TFRecord format, you first need to download and extract the TREC CAR data: TRECCAR_DIR ./treccar/ DATA_DIR ./data mkdir ${DATA_DIR} wget P ${TRECCAR_DIR} wget P ${TRECCAR_DIR} wget P ${TRECCAR_DIR} wget P ${DATA_DIR} tar xf ${TRECCAR_DIR}/paragraphCorpus.v2.0.tar.xz tar xf ${TRECCAR_DIR}/train.v2.0.tar.xz tar xf ${TRECCAR_DIR}/benchmarkY1 test.v2.0.tar.xz tar xzf ${DATA_DIR}/BERT_Large_pretrained_on_TREC_CAR_training_set_1M_iterations.tar.gz Indexing TREC CAR We need to index the corpus and retrieve documents using the BM25 algorithm for each query so we have query document pairs for training. We index the TREC CAR corpus using Anserini , an excelent toolkit for information retrieval research. First, we need to install Maven, and clone and compile Anserini's repository: sudo apt get install maven git clone cd Anserini mvn clean package appassembler:assemble tar xvfz eval/trec_eval.9.0.4.tar.gz C eval/ && cd eval/trec_eval.9.0.4 && make cd ../ndeval && make Now we can index the corpus (.cbor files): sh target/appassembler/bin/IndexCollection collection CarCollection \ generator LuceneDocumentGenerator threads 40 input ${TRECCAR_DIR}/paragraphCorpus.v2.0 index \ ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs storePositions storeDocvectors \ storeRawDocs You should see a message like this after it finishes: 2019 01 15 20:26:28,742 INFO main index.IndexCollection (IndexCollection.java:578) Total 29,794,689 documents indexed in 03:20:35 Retrieving pairs of query candidate document We now retrieve candidate documents for each query using the BM25 algorithm. But first, we need to convert the TREC CAR files to a format that Anserini can consume. First, we merge qrels folds 0, 1, 2, and 3 into a single file for training. Fold 4 will be the dev set. for f in ${TRECCAR_DIR}/train/fold 0 3 base.train.cbor hierarchical.qrels; do (cat ${f} ; echo); done >${TRECCAR_DIR}/train.qrels cp ${TRECCAR_DIR}/train/fold 4 base.train.cbor hierarchical.qrels ${TRECCAR_DIR}/dev.qrels cp ${TRECCAR_DIR}/benchmarkY1/benchmarkY1 test/test.pages.cbor hierarchical.qrels ${TRECCAR_DIR}/test.qrels We need to extract the queries (first column in the space separated files): cat ${TRECCAR_DIR}/train.qrels cut d' ' f1 > ${TRECCAR_DIR}/train.topics cat ${TRECCAR_DIR}/dev.qrels cut d' ' f1 > ${TRECCAR_DIR}/dev.topics cat ${TRECCAR_DIR}/test.qrels cut d' ' f1 > ${TRECCAR_DIR}/test.topics And remove all duplicated queries: sort u o ${TRECCAR_DIR}/train.topics ${TRECCAR_DIR}/train.topics sort u o ${TRECCAR_DIR}/dev.topics ${TRECCAR_DIR}/dev.topics sort u o ${TRECCAR_DIR}/test.topics ${TRECCAR_DIR}/test.topics We now retrieve the top 10 documents per query for training and development sets. 
nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/train.topics output ${TRECCAR_DIR}/train.run hits 10 bm25 & nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/dev.topics output ${TRECCAR_DIR}/dev.run hits 10 bm25 & And we retrieve top 1,000 documents per query for the test set. nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/test.topics output ${TRECCAR_DIR}/test.run hits 1000 bm25 & After it finishes, you should see an output message like this: (SearchCollection.java:166) Finished Ranking with similarity: BM25(k1 0.9,b 0.4) 2019 01 16 23:40:56,538 INFO pool 2 thread 1 search.SearchCollection$SearcherThread (SearchCollection.java:167) Run 2254 topics searched in 01:53:32 2019 01 16 23:40:56,922 INFO main search.SearchCollection (SearchCollection.java:499) Total run time: 01:53:36 This retrieval step takes 40 80 hours for the training set. We can speed it up by increasing the number of threads (ex: threads 6) and loading the index into memory ( inmem option). Measuring BM25 Performance (optional) To be sure that indexing and retrieval worked fine, we can measure the performance of this list of documents retrieved with BM25: eval/trec_eval.9.0.4/trec_eval m map m recip_rank c ${TRECCAR_DIR}/test.qrels ${TRECCAR_DIR}/test.run It is important to use the c option as it assigns a score of zero to queries that had no passage returned. The output should be like this: map all 0.1528 recip_rank all 0.2294 Converting TREC CAR to TFRecord We can now convert qrels (query relevant document pairs), run ( query candidate document pairs), and the corpus into training, dev, and test TFRecord files that will be consumed by BERT. (we need to install CBOR package: pip install cbor) python convert_treccar_to_tfrecord.py \ output_folder ${TRECCAR_DIR}/tfrecord \ vocab_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/vocab.txt \ corpus ${TRECCAR_DIR}/paragraphCorpus/dedup.articles paragraphs.cbor \ qrels_train ${TRECCAR_DIR}/train.qrels \ qrels_dev ${TRECCAR_DIR}/dev.qrels \ qrels_test ${TRECCAR_DIR}/test.qrels \ run_train ${TRECCAR_DIR}/train.run \ run_dev ${TRECCAR_DIR}/dev.run \ run_test ${TRECCAR_DIR}/test.run \ max_query_length 64\ max_seq_length 512 \ num_train_docs 10 \ num_dev_docs 10 \ num_test_docs 1000 This step requires at least 64GB of RAM as we load the entire corpus onto memory. Training/Evaluating Before start training, you need to download a BERT Large model pretrained on the training set of TREC CAR . This pretraining was necessary because the official pre trained BERT models were pre trained on the full Wikipedia, and therefore they have seen, although in an unsupervised way, Wikipedia documents that are used in the test set of TREC CAR. Thus, to avoid this leak of test data into training, we pre trained the BERT re ranker only on the half of Wikipedia used by TREC CAR’s training set. Similar to MS MARCO training, we made available this Google Colab to train and evaluate on TREC CAR. 
In case you opt for not using the Colab, here is the command line to start training: python run_treccar.py \ data_dir ${TRECCAR_DIR}/tfrecord \ bert_config_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_config.json \ init_checkpoint ${DATA_DIR}/pretrained_models_exp898_model.ckpt 1000000 \ output_dir ${TRECCAR_DIR}/output \ trec_output True \ do_train True \ do_eval True \ trec_output True \ num_train_steps 400000 \ num_warmup_steps 40000 \ train_batch_size 32 \ eval_batch_size 32 \ learning_rate 1e 6 \ max_dev_examples 3000 \ num_dev_docs 10 \ max_test_examples None \ num_test_docs 1000 Because trec_output is set to True, this script will produce a TREC formatted run file bert_predictions_test.run . We can evaluate the final performance of our BERT model using the official TREC eval tool, which is included in Anserini: eval/trec_eval.9.0.4/trec_eval m map m recip_rank c ${TRECCAR_DIR}/test.qrels ${TRECCAR_DIR}/output/bert_predictions_test.run And the output should be: map all 0.3356 recip_rank all 0.4787 We made available our run file here . Trained models You can download our BERT Large trained on TREC CAR here . How do I cite this work? @article{nogueira2019passage, title {Passage Re ranking with BERT}, author {Nogueira, Rodrigo and Cho, Kyunghyun}, journal {arXiv preprint arXiv:1901.04085}, year {2019} }",Passage Re-Ranking,NLP Other 2177,Natural Language Processing,Natural Language Processing,Natural Language Processing,"CircleCI PolyAI (polyai logo.png) conversational datasets A collection of large datasets for conversational response selection. This repository provides tools to create reproducible datasets for training and evaluating models of conversational response. This includes: Reddit (reddit) 3.7 billion comments structured in threaded conversations OpenSubtitles (opensubtitles) over 400 million lines from movie and television subtitles (available in English and other languages) Amazon QA (amazon_qa) over 3.6 million question response pairs in the context of Amazon products Machine learning methods work best with large datasets such as these. At PolyAI we train models of conversational response on huge conversational datasets and then adapt these models to domain specific tasks in conversational AI. This general approach of pre training large models on huge datasets has long been popular in the image community and is now taking off in the NLP community. Rather than providing the raw processed data, we provide scripts and instructions to generate the data yourself. This allows you to view and potentially manipulate the pre processing and filtering. The instructions define standard datasets, with deterministic train/test splits, which can be used to define reproducible evaluations in research papers. Datasets Each dataset has its own directory, which contains a dataflow script, instructions for running it, and unit tests. Train set size Test set size Reddit (reddit) 2015 2019 654 million 72 million OpenSubtitles (opensubtitles) English (other languages available) 286 million 33 million Amazon QA (amazon_qa) 3 million 0.3 million Note that these are the dataset sizes after filtering and other processing. For instance, the Reddit dataset is based on a raw database of 3.7 billion comments, but consists of 726 million examples because the script filters out long comments, short comments, uninformative comments (such as ' deleted ' , and comments with no replies. Benchmarks Benchmark results for each of the datasets can be found in BENCHMARKS.md (BENCHMARKS.md). 
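As a rough illustration of the Reddit filtering mentioned above (long, short, uninformative and reply less comments are dropped), here is a hedged sketch; the exact thresholds used by the dataflow script are not given here and the ones below are assumptions:

```python
def keep_comment(text, num_replies, min_words=2, max_words=128):
    # Hedged sketch of the Reddit filtering: drop deleted markers, very short or
    # very long comments, and comments nobody replied to. The word-count
    # thresholds here are illustrative assumptions, not the script's real values.
    if text.strip() == '[deleted]':
        return False
    if not (min_words <= len(text.split()) <= max_words):
        return False
    return num_replies > 0

print(keep_comment('[deleted]', 3))                      # False
print(keep_comment('What a great game last night!', 2))  # True
```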
Conversational Dataset Format This repo contains scripts for creating datasets in a standard format any dataset in this format is referred to elsewhere as simply a conversational dataset . Datasets are stored as tensorflow record files ( containing serialized tensorflow example protocol buffers. The training set is stored as one collection of tensorflow record files, and the test set as another. Examples are shuffled randomly (and not necessarily reproducibly) within the tensorflow record files. The train/test split is always deterministic, so that whenever the dataset is generated, the same train/test split is created. Each tensorflow example contains a conversational context and a response that goes with that context. For example: javascript { 'context/1': Hello, how are you? , 'context/0': I am fine. And you? , 'context': Great. What do you think of the weather? , 'response': It doesn't feel like February. } Explicitly, each example contains a number of string features: A context feature, the most recent text in the conversational context A response feature, the text that is in direct response to the context . A number of extra context features , context/0 , context/1 etc. going back in time through the conversation. They are named in reverse order so that context/i always refers to the i^th most recent extra context, so that no padding needs to be done, and datasets with different numbers of extra contexts can be mixed. Depending on the dataset, there may be some extra features also included in each example. For instance, in Reddit the author of the context and response are identified using additional features. Reading conversational datasets The tools/tfrutil.py (tools/tfrutil.py) and baselines/run_baseline.py (baselines/run_baseline.py) scripts demonstrate how to read a conversational dataset in Python, using functions from the tensorflow library. You can use tools/tfrutil.py (tools/tfrutil.py) to compute the number of examples in a tensorflow record file: $ python tools/tfrutil.py size data/reddit test 726158 It can also be used to display the examples in a readable format: $ python tools/tfrutil.py pp data/reddit test Example 0 Context : Airplane? What is it? Response : Airplane! The movie. It's an amazing parody of plane movies which sounds terrible but it is actually 10/10. Extra Contexts: context/2 : Unfortunately, they all had the fish for dinner. context/1 : This is some sort of reference? I don't get it. context/0 : Airplane. Drop everything and watch it right now Other features: context_author : Doctor_Insano_MD response_author : ThegreatandpowerfulR subreddit : todayilearned thread_id : 41ar0l ... 
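Each example above is stored as a serialized tensorflow Example proto with string features. A minimal sketch of writing one example in this format (feature values are taken from the JSON illustration above; the output path is arbitrary, and the tf.io names assume TensorFlow 1.14+ or 2.x):

```python
import tensorflow as tf

def _bytes_feature(text):
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[text.encode('utf-8')]))

# Feature values taken from the JSON illustration above.
example = tf.train.Example(features=tf.train.Features(feature={
    'context/1': _bytes_feature('Hello, how are you?'),
    'context/0': _bytes_feature('I am fine. And you?'),
    'context': _bytes_feature('Great. What do you think of the weather?'),
    'response': _bytes_feature('It doesn\'t feel like February.'),
}))

with tf.io.TFRecordWriter('example.tfrecords') as writer:  # arbitrary output path
    writer.write(example.SerializeToString())
```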
Below is some example tensorflow code for reading a conversational dataset into a tensorflow graph: python num_extra_contexts 10 batch_size 100 pattern gs://your bucket/dataset/train .tfrecords if not tf.gfile.Glob(pattern): raise ValueError( No files matched pattern + pattern) dataset tf.data.Dataset.list_files(pattern) dataset dataset.apply( tf.contrib.data.parallel_interleave( lambda file: tf.data.TFRecordDataset(file), cycle_length 8)) dataset dataset.apply( tf.data.experimental.shuffle_and_repeat( buffer_size 8 batch_size)) dataset dataset.batch(batch_size) def _parse_function(serialized_examples): parse_spec { context : tf.FixedLenFeature( , tf.string), response : tf.FixedLenFeature( , tf.string) } parse_spec.update({ context/{} .format(i): tf.FixedLenFeature( , tf.string, default_value ) for i in range(num_extra_contexts) }) return tf.parse_example(serialized_examples, parse_spec) dataset dataset.map(_parse_function, num_parallel_calls 8) dataset dataset.prefetch(8) iterator dataset.make_one_shot_iterator() tensor_dict iterator.get_next() The tensorflow graph can now access tensor_dict context , tensor_dict response etc. as batches of string features (unicode bytes). Getting Started Conversational datasets are created using Apache Beam pipeline scripts, run on Google Dataflow . This parallelises the data processing pipeline across many worker machines. Apache Beam requires python 2.7, so you will need to set up a python 2.7 virtual environment: bash python2.7 m virtualenv venv . venv/bin/activate pip install r requirements.txt The Dataflow scripts write conversational datasets to Google cloud storage, so you will need to create a bucket to save the dataset to. Lastly, you will need to set up authentication by creating a service account with access to Dataflow and Cloud Storage, and set GOOGLE_APPLICATION_CREDENTIALS : bash export GOOGLE_APPLICATION_CREDENTIALS {{json file key location}} This should be enough to follow the instructions for creating each individual dataset. Evaluation Of course you may evaluate your models in any way you like. However, when publishing results, we encourage you to include the 1 of 100 ranking accuracy, which is becoming a research community standard. The 1 of 100 ranking accuracy is a Recall@k metric. In general Recall@k takes N responses to the given conversational context, where only one response is relevant. It indicates whether the relevant response occurs in the top k ranked candidate responses. The 1 of 100 metric is obtained when k 1 and N 100 . This effectively means that, for each query, we indicate if the correct response is the top ranked response among 100 candidates. The final score is the average across all queries. The 1 of 100 metric is computed using random batches of 100 examples so that the responses from other examples in the batch are used as random negative candidates. This allows for efficiently computing the metric across many examples in batches. While it is not guaranteed that the random negatives will indeed be 'true' negatives, the 1 of 100 metric still provides a useful evaluation signal that correlates with downstream tasks. The following tensorflow code shows how this metric can be computed for a dot product style encoder model, where the score for each context and response is a dot product between corresponding vectors: python Encode the contexts and responses as vectors using tensorflow ops. The following are both 100, encoding_size matrices. 
context_encodings _encode_contexts(tensor_dict 'context' ) response_encodings _encode_responses(tensor_dict 'response' ) scores tf.matmul( context_encodings, response_encodings, transpose_b True) A 100, 100 matrix. batch_size tf.shape(context_encodings) 0 accuracy_1_of_100 tf.metrics.accuracy( labels tf.range(batch_size), predictions tf.argmax(scores, 1) ) See also the baselines (baselines) for example code computing the 1 of 100 metric. Many studies have used Recall@k in the context of retrieval based dialogue, including the following papers: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi Turn Dialogue Systems , Lowe et al. SIGDIAL 2015. Neural Utterance Ranking Model for Conversational Dialogue Systems , Inaba and Takahashi. SIGDIAL 2016. Strategy and Policy Learning for Non task oriented Conversational Systems , Yu et al. SIGDIAL 2016. Training End to End Dialogue Systems with the Ubuntu Dialogue Corpus , Lowe et al. Dialogue and Discourse 2017. Sequential Matching Network: A New Architecture for Multi turn Response Selection in Retrieval based Chatbots , Wu et al. ACL 2017. Improving Response Selection in Multi turn Dialogue Systems by Incorporating Domain Knowledge , Chaudhuri et al. CoNLL 2018. Data Augmentation for Neural Online Chats Response Selection , Du and Black. SCAI 2018. Customized Nonlinear Bandits for Online Response Selection in Neural Conversational Models , Liu et al. AAAI 2018. DSTC7 task 1: Noetic end to end response selection , Gunasekara et al. 2019. Multi representation Fusion Network for Multi Turn Response Selection in Retrieval based Chatbots , Tao et al. WSDM 2019. Multi Turn Response Selection for Chatbots with Deep Attention Matching Network , Zhou et al. ACL 2018. The following papers use the 1 of 100 ranking accuracy in particular: Conversational Contextual Cues: The Case of Personalization and History for Response Ranking. , Al Rfou et al. arXiv pre print 2016. Efficient Natural Language Response Suggestion for Smart Reply , Henderson et al. arXiv pre print 2017. Question Answer Selection in User to User Marketplace Conversations , Kumar et al. IWSDS 2018. Universal Sentence Encoder , Cer et al. arXiv pre print 2018. Learning Semantic Textual Similarity from Conversations. . Yang et al. Workshop on Representation Learning for NLP 2018. Citations When using these datasets in your work, please cite our paper, A Repository of Conversational Datasets : bibtex @Article{Henderson2019, author {Matthew Henderson and Pawe{\l} Budzianowski and I{\{n}}igo Casanueva and Sam Coope and Daniela Gerz and Girish Kumar and Nikola Mrk{\v{s}}i\'c and Georgios Spithourakis and Pei Hao Su and Ivan Vulic and Tsung Hsien Wen}, title {A Repository of Conversational Datasets}, year {2019}, month {apr}, note {Data available at github.com/PolyAI LDN/conversational datasets}, journal {CoRR}, volume {abs/1904.06472}, url { } Contributing We happily accept contributions in the form of pull requests. Each pull request is tested in CircleCI it is first linted with flake8 , and then the unit tests are run. In particular we would be interested in: new datasets adaptations to the scripts so that they work better in your environment (e.g. other Apache Beam runners, other cloud storage solutions, other example formats) results from your methods in the benchmarks the benchmarks page (BENCHMARKS.md). 
code for new baselines and improvements to existing baselines",Conversational Response Selection,NLP Other 2179,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Graph Convolution over Pruned Dependency Trees for Relation Extraction This repo contains the PyTorch code for the paper Graph Convolution over Pruned Dependency Trees Improves Relation Extraction . This paper/code introduces a graph convolutional neural network (GCN) over pruned dependency trees for the task of relation extraction. A special tree pruning technique called the Path centric Pruning is also introduced to eliminate irrelevant information from the trees while maximally maintaining relevant information. Compared to sequence models such as various LSTM based models, this GCN model makes use of dependency structures to bridge remote words, therefore improves performance for long range relations. Compared to previous recursive models such as the TreeLSTM, this GCN model achieves better performance while being much eariser to parallelize and therefore much more efficient. See below for an overview of the model architecture: ! GCN Architecture (fig/architecture.png GCN Architecture ) Requirements Python 3 (tested on 3.6.5) PyTorch (tested on 0.4.0) tqdm unzip, wget (for downloading only) Preparation The code requires that you have access to the TACRED dataset (LDC license required). The TACRED dataset is currently scheduled for public release via LDC in December 2018. For possible early access to this data please contact us at yuhao.zhang at stanford.edu . Once you have the TACRED data, please put the JSON files under the directory dataset/tacred . For completeness, we only include sample data files from the TACRED dataset in this repo. First, download and unzip GloVe vectors from the Stanford NLP group website, with: chmod +x download.sh; ./download.sh Then prepare vocabulary and initial word vectors with: python prepare_vocab.py dataset/tacred dataset/vocab glove_dir dataset/glove This will write vocabulary and word vectors as a numpy matrix into the dir dataset/vocab . Training To train a graph convolutional neural network (GCN) model, run: bash train_gcn.sh 0 Model checkpoints and logs will be saved to ./saved_models/00 . To train a Contextualized GCN (C GCN) model, run: bash train_cgcn.sh 1 Model checkpoints and logs will be saved to ./saved_models/01 . For details on the use of other parameters, such as the pruning distance k, please refer to train.py . Evaluation To run evaluation on the test set, run: python eval.py saved_models/00 dataset test This will use the best_model.pt file by default. Use model checkpoint_epoch_10.pt to specify a model checkpoint file. Retrain Reload a pretrained model and finetune it, run: python train.py load model_file saved_models/01/best_model.pt optim sgd lr 0.001 Related Repo The paper also includes comparisons to the position aware attention LSTM (PA LSTM) model for relation extraction. To reproduce the corresponding results, please refer to this repo . Citation @inproceedings{zhang2018graph, author {Zhang, Yuhao and Qi, Peng and Manning, Christopher D.}, booktitle {Empirical Methods in Natural Language Processing (EMNLP)}, title {Graph Convolution over Pruned Dependency Trees Improves Relation Extraction}, url { year {2018} } License All work contained in this package is licensed under the Apache License, Version 2.0. 
See the included LICENSE file.",Relation Extraction,NLP Other 2257,Natural Language Processing,Natural Language Processing,Natural Language Processing,"A Joint Many Task Model: Growing a Neural Network for Multiple NLP Tasks, ICLR 2017 Multiple Different Natural Language Processing Tasks in a Single Deep Model. This, in my opinion, is indeed a very good paper. It demonstrates how a neural model can be trained from low level to higher level in a fashion such that lower layers correspond to word level tasks and the higher layers correspond to tasks which are performed at sentence level. The authors also show how to retain the information at lower layers while training the higher layers by successive regularization. It is also clearly shown that transfer learning is possible where different datasets are exploited simultaneously after being jointly pre trained for word embeddings. Catastrophic interference is a very crucial thing to deal with in this model. It is basically interference with other layers' learned parameters while training a particular layer. As an example, you want to retain information about POS while training for, say, chunking later! Model Architecture: ! (images/model.png) Data: Conll2000 SICK data Tasks: POS Tagging (word level) Chunking (word level) Semantic Relatedness (sentence level) Textual Entailment (sentence level) Usage: data.py Preprocesses data for the model run.py Runs the main model. Sample input: python task_desc { 'pos': 'this has increased the risk', 'chunk': 'this has increased the risk', 'relatedness': 'two dogs are wrestling and hugging', 'there is no dog wrestling and hugging' , 'entailment': 'Two dogs are wrestling and hugging', 'There is no dog wrestling and hugging' } Sample Output: ! (images/result.png) Note: The original paper contains one more task which is dependency parsing. Currently, that is not incorporated in the model due to non availability of good public data. Also need to add successive regularization. Citations: A Joint Many Task Model: Growing a Neural Network for Multiple NLP Tasks Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher",Chunking,NLP Other 2292,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Passage Re ranking with BERT Introduction \ \ \ \ \ Most of the code in this repository was copied from the original BERT repository . \ \ \ \ \ This repository contains the code to reproduce our entry to the MSMARCO passage ranking task , which was placed first with a large margin over the second place. It also contains the code to reproduce our result on the TREC CAR dataset , which is 22 MAP points higher than the best entry from 2017 and a well tuned BM25. MSMARCO Passage Re Ranking Leaderboard (Jan 8th 2019) Eval MRR@10 Dev MRR@10 : : : : 1st Place BERT (this code) 35.87 36.53 2nd Place IRNet 28.06 27.80 3rd Place Conv KNRM 27.12 29.02 TREC CAR Test Set (Automatic Annotations) MAP : : BERT (this code) 33.5 BM25 Anserini 15.6 MacAvaney et al., 2017 (TREC CAR 2017 Best Entry) 14.8 The paper describing our implementation is here .
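At a high level, the sections that follow retrieve candidate passages with BM25 and then re order them by a relevance score computed for each query passage pair. A schematic sketch of that re ordering step, with a toy scorer standing in for the BERT model:

```python
def rerank(query, candidates, score_fn, top_k=10):
    # Sort candidate passages by model score, highest first (schematic only).
    scored = [(score_fn(query, passage), doc_id) for doc_id, passage in candidates]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

# Placeholder scorer: a real system feeds each (query, passage) pair to BERT instead.
def toy_score(query, passage):
    return len(set(query.lower().split()) & set(passage.lower().split()))

candidates = [('d1', 'trec car uses wikipedia paragraphs'),
              ('d2', 'bm25 retrieves candidate passages for each query')]
print(rerank('how are candidate passages retrieved', candidates, toy_score))
```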
MS MARCO Download and extract the data First, we need to download and extract MS MARCO and BERT files: DATA_DIR ./data mkdir ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} wget P ${DATA_DIR} tar xvf ${DATA_DIR}/triples.train.small.tar.gz C ${DATA_DIR} tar xvf ${DATA_DIR}/top1000.dev.tar.gz C ${DATA_DIR} tar xvf ${DATA_DIR}/top1000.eval.tar.gz C ${DATA_DIR} unzip ${DATA_DIR}/uncased_L 24_H 1024_A 16.zip d ${DATA_DIR} Convert MS MARCO to TFRecord format Next, we need to convert MS MARCO train, dev, and eval files to TFRecord files, which will be later consumed by BERT. mkdir ${DATA_DIR}/tfrecord python convert_msmarco_to_tfrecord.py \ output_folder ${DATA_DIR}/tfrecord \ vocab_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/vocab.txt \ train_dataset_path ${DATA_DIR}/triples.train.small.tsv \ dev_dataset_path ${DATA_DIR}/top1000.dev.tsv \ eval_dataset_path ${DATA_DIR}/top1000.eval.tsv \ dev_qrels_path ${DATA_DIR}/qrels.dev.tsv \ max_query_length 64\ max_seq_length 512 \ num_eval_docs 1000 This conversion takes 30 40 hours. Alternatively, you may download the TFRecord files here (23GB). Training We can now start training. We highly recommend using the free TPUs in our Google's Colab . Otherwise, a modern V100 GPU with 16GB cannot fit even a small batch size of 2 when training a BERT Large model. In case you opt for not using the Colab, here is the command line to start training: python run_msmarco.py \ data_dir ${DATA_DIR}/tfrecord \ bert_config_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_config.json \ init_checkpoint ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_model.ckpt \ output_dir ${DATA_DIR}/output \ msmarco_output True \ do_train True \ do_eval True \ num_train_steps 400000 \ num_warmup_steps 40000 \ train_batch_size 32 \ eval_batch_size 32 \ learning_rate 1e 6 Training for 400k iterations takes approximately 70 hours on a TPU v2. Alternatively, you can download the trained model used in our submission here (3.4GB). TREC CAR We describe in the next sections how to reproduce our results on the TREC CAR dataset. Downloading qrels, run and TFRecord files The next steps (Indexing, Retrieval, and TFRecord conversion) take many hours. Alternatively, you can skip them and download the necessary files for training and evaluation here (4.0GB), namely: queries ( .topics); query relevant passage pairs ( .qrels); query candidate passage pairs ( .run). TFRecord files ( .tf) After downloading, you need to extract them to the TRECCAR_DIR folder: TRECCAR_DIR ./treccar/ tar xf treccar_files.tar.xz directory ${TRECCAR_DIR} And you are ready to go to the training/evaluation section. Downloading and Extracting the data If you decided to index, retrieve and convert to the TFRecord format, you first need to download and extract the TREC CAR data: TRECCAR_DIR ./treccar/ wget P ${TRECCAR_DIR} wget P ${TRECCAR_DIR} wget P ${TRECCAR_DIR} tar xf ${TRECCAR_DIR}/paragraphCorpus.v2.0.tar.xz tar xf ${TRECCAR_DIR}/train.v2.0.tar.xz tar xf ${TRECCAR_DIR}/benchmarkY1 test.v2.0.tar.xz Indexing TREC CAR We need to index the corpus and retrieve documents using the BM25 algorithm for each query so we have query document pairs for training. We index the TREC CAR corpus using Anserini , an excelent toolkit for information retrieval research. 
First, we need to install Maven, and clone and compile Anserini's repository: sudo apt get install maven git clone cd Anserini mvn clean package appassembler:assemble tar xvfz eval/trec_eval.9.0.4.tar.gz C eval/ && cd eval/trec_eval.9.0.4 && make cd ../ndeval && make Now we can index the corpus (.cbor files): sh target/appassembler/bin/IndexCollection collection CarCollection \ generator LuceneDocumentGenerator threads 40 input ${TRECCAR_DIR}/paragraphCorpus.v2.0 index \ ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs storePositions storeDocvectors \ storeRawDocs You should see a message like this after it finishes: 2019 01 15 20:26:28,742 INFO main index.IndexCollection (IndexCollection.java:578) Total 29,794,689 documents indexed in 03:20:35 Retrieving pairs of query candidate document We now retrieve candidate documents for each query using the BM25 algorithm. But first, we need to convert the TREC CAR files to a format that Anserini can consume. First, we merge qrels folds 0, 1, 2, and 3 into a single file for training. Fold 4 will be the dev set. for f in ${TRECCAR_DIR}/train/fold 0 3 base.train.cbor hierarchical.qrels; do (cat ${f} ; echo); done >${TRECCAR_DIR}/train.qrels cp ${TRECCAR_DIR}/train/fold 4 base.train.cbor hierarchical.qrels ${TRECCAR_DIR}/dev.qrels cp ${TRECCAR_DIR}/benchmarkY1/benchmarkY1 test/test.pages.cbor hierarchical.qrels ${TRECCAR_DIR}/test.qrels We need to extract the queries (first column in the space separated files): cat ${TRECCAR_DIR}/train.qrels cut d' ' f1 > ${TRECCAR_DIR}/train.topics cat ${TRECCAR_DIR}/dev.qrels cut d' ' f1 > ${TRECCAR_DIR}/dev.topics cat ${TRECCAR_DIR}/test.qrels cut d' ' f1 > ${TRECCAR_DIR}/test.topics And remove all duplicated queries: sort u o ${TRECCAR_DIR}/train.topics ${TRECCAR_DIR}/train.topics sort u o ${TRECCAR_DIR}/dev.topics ${TRECCAR_DIR}/dev.topics sort u o ${TRECCAR_DIR}/test.topics ${TRECCAR_DIR}/test.topics We now retrieve the top 10 documents per query for training and development sets. nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/train.topics output ${TRECCAR_DIR}/train.run hits 10 bm25 & nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/dev.topics output ${TRECCAR_DIR}/dev.run hits 10 bm25 & And we retrieve top 1,000 documents per query for the test set. nohup target/appassembler/bin/SearchCollection topicreader Car index ${TRECCAR_DIR}/lucene index.car17.pos+docvectors+rawdocs topics ${TRECCAR_DIR}/test.topics output ${TRECCAR_DIR}/test.run hits 1000 bm25 & After it finishes, you should see an output message like this: (SearchCollection.java:166) Finished Ranking with similarity: BM25(k1 0.9,b 0.4) 2019 01 16 23:40:56,538 INFO pool 2 thread 1 search.SearchCollection$SearcherThread (SearchCollection.java:167) Run 2254 topics searched in 01:53:32 2019 01 16 23:40:56,922 INFO main search.SearchCollection (SearchCollection.java:499) Total run time: 01:53:36 This retrieval step takes 40 80 hours for the training set. We can speed it up by increasing the number of threads (ex: threads 6) and loading the index into memory ( inmem option). 
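The SearchCollection commands above write runs in the standard TREC format (query id, the literal Q0, document id, rank, score and run tag on each line), which trec_eval consumes directly in the next step. A hedged sketch of loading such a run into per query ranked lists, assuming that six column format:

```python
from collections import defaultdict

def load_trec_run(path):
    # Parse a TREC-format run file into {query_id: [doc_ids ordered by rank]}.
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            qid, _q0, docid, rank, _score, _tag = line.split()
            run[qid].append((int(rank), docid))
    return {qid: [d for _, d in sorted(docs)] for qid, docs in run.items()}

# e.g. run = load_trec_run('dev.run'); print(len(run), 'queries loaded')
```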
Measuring BM25 Performance (optional) To be sure that indexing and retrieval worked fine, we can measure the performance of this list of documents retrieved with BM25: eval/trec_eval.9.0.4/trec_eval m map m recip_rank c ${TRECCAR_DIR}/test.qrels ${TRECCAR_DIR}/test.run It is important to use the c option as it assigns a score of zero to queries that had no passage returned. The output should be like this: map all 0.1528 recip_rank all 0.2294 Converting TREC CAR to TFRecord We can now convert qrels (query relevant document pairs), run ( query candidate document pairs), and the corpus into training, dev, and test TFRecord files that will be consumed by BERT. (we need to install CBOR package: pip install cbor) python convert_treccar_to_tfrecord.py \ output_folder ${TRECCAR_DIR}/tfrecord \ vocab_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/vocab.txt \ corpus ${TRECCAR_DIR}/paragraphCorpus.v2.0/dedup.articles paragraphs.cbor \ qrels_train ${TRECCAR_DIR}/train.qrels \ qrels_dev ${TRECCAR_DIR}/dev.qrels \ qrels_test ${TRECCAR_DIR}/test.qrels \ run_train ${TRECCAR_DIR}/train.run \ run_dev ${TRECCAR_DIR}/dev.run \ run_test ${TRECCAR_DIR}/test.run \ max_query_length 64\ max_seq_length 512 \ num_train_docs 10 \ num_dev_docs 10 \ num_test_docs 1000 This step requires at least 64GB of RAM as we load the entire corpus onto memory. Training/Evaluating Before start training, you need to download a BERT Large model pretrained on the training set of TREC CAR . This pretraining was necessary because the official pre trained BERT models were pre trained on the full Wikipedia, and therefore they have seen, although in an unsupervised way, Wikipedia documents that are used in the test set of TREC CAR. Thus, to avoid this leak of test data into training, we pre trained the BERT re ranker only on the half of Wikipedia used by TREC CAR’s training set. Similar to MS MARCO training, we made available this Google Colab to train and evaluate on TREC CAR. In case you opt for not using the Colab, here is the command line to start training: python run_treccar.py \ data_dir ${TRECCAR_DIR}/tfrecord \ bert_config_file ${DATA_DIR}/uncased_L 24_H 1024_A 16/bert_config.json \ init_checkpoint /path_to_bert_pretrained_on_treccar/model.ckpt 1000000 \ output_dir ${TRECCAR_DIR}/output \ trec_output True \ do_train True \ do_eval True \ trec_output True \ num_train_steps 400000 \ num_warmup_steps 40000 \ train_batch_size 32 \ eval_batch_size 32 \ learning_rate 1e 6 \ max_dev_examples 3000 \ num_dev_docs 10 \ max_test_examples None \ num_test_docs 1000 Because trec_output is set to True, this script will produce a TREC formatted run file bert_predictions_test.run . We can evaluate the final performance of our BERT model using the official TREC eval tool, which is included in Anserini: eval/trec_eval.9.0.4/trec_eval m map m recip_rank c ${TRECCAR_DIR}/test.qrels ${TRECCAR_DIR}/output/bert_predictions_test.run And the output should be: map all 0.3356 recip_rank all 0.4787 We made available our run file here . Trained models You can download our BERT Large trained on TREC CAR here . How do I cite this work? 
@article{nogueira2019passage, title {Passage Re ranking with BERT}, author {Nogueira, Rodrigo and Cho, Kyunghyun}, journal {arXiv preprint arXiv:1901.04085}, year {2019} }",Passage Re-Ranking,NLP Other 2322,Natural Language Processing,Natural Language Processing,Natural Language Processing,"image2story Written by LIN Jinghong, WEI Qing, SHI Siyuan project introduction We set up a model for generating a story of some specific style from pictures and named it image2story . The model is based on the previous image caption model and we expand the image caption model to generate a complete paragraph instead of a single sentence. Our model can be roughly divided into three parts. First, the sentence description is extracted from the picture through the image caption network, at the same time, the skip thought vectors model is trained to realize the conversion between sentence and vector. Then we feed the sentences extracted by image caption into the encoder of skip thought vectors, and convert the image caption into a vector with its own text style. Then we use romantic novels as our data set and train a story generator, the principle of which is similar to that of skip thought vectors. Finally, the vector is fed into the story generator to generate romantic paragraphs. pretrained model The pretrained model merges the image caption model from and a sentence to story model extracted from First, download the needed model this , put the caption.npy into the caption_model folder, put the stv_model folder as well as romance_models folder into the story_model folder. Then put the image you would like to test into the test/images folder and run the demo.py the generated stories will be stored in the variable passages . Also, it will produce the image caption images each with 3 captions, which is stored in the test/results folder. dependencies Python 3 Theano tensorflow output samples ! Result ! Result ! Result ! Result ! Result ! Result ! Result ! Result ! Result",Grammatical Error Correction,NLP Other 2323,Natural Language Processing,Natural Language Processing,Natural Language Processing,"image2story Written by LIN Jinghong, WEI Qing, SHI Siyuan project introduction We set up a model for generating a story of some specific style from pictures and named it image2story . The model is based on the previous image caption model and we expand the image caption model to generate a complete paragraph instead of a single sentence. Our model can be roughly divided into three parts. First, the sentence description is extracted from the picture through the image caption network, at the same time, the skip thought vectors model is trained to realize the conversion between sentence and vector. Then we feed the sentences extracted by image caption into the encoder of skip thought vectors, and convert the image caption into a vector with its own text style. Then we use romantic novels as our data set and train a story generator, the principle of which is similar to that of skip thought vectors. Finally, the vector is fed into the story generator to generate romantic paragraphs. pretrained model The pretrained model merges the image caption model from and a sentence to story model extracted from First, download the needed model this , put the caption.npy into the caption_model folder, put the stv_model folder as well as romance_models folder into the story_model folder. 
Then put the image you would like to test into the test/images folder and run the demo.py the generated stories will be stored in the variable passages . Also, it will produce the image caption images each with 3 captions, which is stored in the test/results folder. dependencies Python 3 Theano tensorflow output samples ! Result ! Result ! Result ! Result ! Result ! Result ! Result ! Result ! Result",Grammatical Error Correction,NLP Other 2324,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Hello, this repository contains an implementation of the DRMM model. Original paper : A Deep Relevance Matching Model for Ad hoc Retrieval (Guo et al.) Some code is taken from MatchZoo (referenced in the respective files). Table of Contents Preprocessing : Contains all preprocessing code (starting from raw TREC data) to create all input files for the neural ranking model (this includes generating the histogram, idf informations). See the peprocessing Readme (preprocessing/README.md) for more. Neural Ranking : Contains the neural ranking model Results : Contains a description of experiments & their raw and trec_eval evaluated results. Dependencies TREC 8 corpus data (not preprocessed: fbis,fr94,ft,latimes) Python 3: Latest Keras and Tensorflow",Ad-Hoc Information Retrieval,NLP Other 2333,Natural Language Processing,Natural Language Processing,Natural Language Processing,"RumorEval2019 BiLSTM jmculnan SVM maxaalexeeva CNN seongjinpark 88 CRF yiyunzhao; baseline, flattened structure and FFNN Writing YY baseline, description of flattening & FFNN, CRF SJP conclusion, description of CNN, MA introduction, description of SVM, JC previous research, description of BiLSTM, final editing Contents saved_dataRumEval2019_npy_files files generated from running the preprocessing in the paper: Turing at SemEval 2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch LSTM",Stance Detection,NLP Other 2353,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Coreference Resolution PyTorch 0.4.1 Python 3.6.5 This repository consists of an efficient, annotated PyTorch reimplementation of the EMNLP paper End to end Neural Coreference Resolution by Lee et al., 2017. Main code can be found in this file . Data The source code assumes access to the English train, test, and development data of OntoNotes Release 5.0. This data should be located in a folder called 'data' inside the main directory. The data consists of 2,802 training documents, 343 development documents, and 348 testing documents. The average length of all documents is 454 words with a maximum length of 4,009 words. The number of mentions and coreferences in each document varies drastically, but is generally correlated with document length. Since the data require a license from the Linguistic Data Consortium to use, they are thus not supplied here. Information on how to download and preprocess them can be found here and here , respectively. Beyond the data, the source files also assume access to both Turian embeddings and GloVe embeddings . Problem Definition Coreference is defined as occurring when one or more expressions in a document refer back to the an entity that came before it/them. Coreference resolution, then, is the task of finding all expressions that are coreferent with any of the entities found in a given text. While this problem definition seems simple enough, oftentimes the nomenclature found in papers regarding coreference resolution is quite confusing. 
Visualizing them makes things a bit easier to understand: ! (/imgs/nomenclature.png) Words are colored according to whether they are entities or not. Different colored groups of words are members of the same coreference cluster. Entities that are the only member of their cluster are known as 'singleton' entities. Why Coreference Resolution is Hard Entities can be very long and coreferent entities can occur extremely far away from one another. A greedy system would compute every possible span (sequence) of tokens and then compare it to every possible span that came before it. This makes the complexity of the problem O(T^4), where T is the document length. For a 100 word document this would be 100 million possible options, and for the longest document in our dataset this equates to almost one quadrillion possible combinations. If this does not make it concrete, imagine that we had the sentence Arya Stark walks her direwolf, Nymeria. Here we have three entities: Arya Stark , her , and Nymeria . As a native speaker of English it should be trivial to tell that her refers to Arya Stark . But to a machine with no knowledge, how should it know that Arya and Stark should be a single entity rather than two separate ones, that Nymeria does not refer back to her even though they are arguably related, or even that Arya Stark walks her direwolf, Nymeria is not just one big entity in and of itself? For another example, consider the sentence Napoleon and all of his marvelously dressed, incredibly well trained, loyal troops marched all the way across Europe to enter Russia in an ultimately unsuccessful effort to conquer it for their country. The word their refers back to Napoleon and all of his marvelously dressed, incredibly well trained, loyal troops ; entities can span many, many tokens. Coreferent entities can also occur far away from one another. Model Architecture As a forewarning, this paper presents a beast of a model. The authors present the following series of images to provide clarity as to what the model is doing. ! (/imgs/architecture.png) 1. Token Representation Tokens are represented using 300 dimensional static GloVe embeddings, 50 dimensional static Turian embeddings, and 8 dimensional character embeddings passed through a CNN with 50 filters of widths 3, 4, and 5. Dropout with p 0.50 is applied to these embeddings. The token representations are passed into a 2 layer bidirectional LSTM with hidden state sizes of 200. Dropout with p 0.20 is applied to the output of the LSTM. 2. Span Representation Using the regularized output, span representations are computed by extracting the LSTM hidden states between the index of the first word and the last word. An attention mechanism over these hidden states is used to compute a weighted sum of them. Then, we concatenate the hidden states at the first and last indices with the weighted attention sum and a 20 dimensional feature representation for the total width (length) of the span under consideration. This is done for all spans up to length 10 in the document. 3. Pruning The span representations are passed into a 3 layer, 150 dimensional feedforward network with ReLU activations and p 0.20 dropout applied between each layer. The output of this feedforward network is 1 dimensional and represents the 'mention score' of each span in the document. Spans are then pruned in decreasing order of mention score unless, when considering a span i, there exists a previously accepted span j such that START(i) < START(j) < END(i) < END(j) or START(j) < START(i) < END(j) < END(i).
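A small helper makes the crossing condition above concrete. This is a sketch written for illustration, not code taken from the repository; spans are (start, end) token index pairs.

def crosses(span_i, span_j):
    # Partial overlap without nesting, following the condition stated above.
    s_i, e_i = span_i
    s_j, e_j = span_j
    return (s_i < s_j < e_i < e_j) or (s_j < s_i < e_j < e_i)

def keep_span(candidate, accepted):
    # A candidate span is rejected if it crosses any previously accepted span.
    return not any(crosses(candidate, prev) for prev in accepted)

For example, keep_span((3, 7), [(5, 9)]) returns False because the two spans partially overlap without either containing the other.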
Only LAMBDA T spans are kept at the end, where LAMBDA is set to 0.40 and T is the document length. 4. Pairwise Representation For these spans, pairwise representations are computed for a given span i and its antecedent j by concatenating the span representation for span i, the span representation for span j, the dot product between these representations, and 20 dimensional feature embeddings for genre, distance between the spans, and whether or not the two spans have the same speaker. 5. Final Score and Loss These representations are passed into a feedforward network similar to that of scoring the spans. Clusters are then formed for these coreferences by identifying chains of coreference links (e.g. span j and span k both refer to span i). The learning objective is to maximize the log likelihood of all correct antecedents that were not pruned. Results Originally from the paper, ! (/imgs/results.png) Recent Work The authors have since published another paper , which achieves an F1 score of 73.0.",Coreference Resolution,NLP Other 2443,Natural Language Processing,Natural Language Processing,Natural Language Processing,"! StellarGraph Machine Learning library logo StellarGraph Machine Learning Library Table of Contents Introduction ( introduction) Guiding Principles ( guiding principles) Getting Started ( getting started) Installation ( installation) Install StellarGraph using pip ( install stellargraph using pip) Install StellarGraph from Github source ( install stellargraph from github source) Docker Image ( docker image) Running the examples ( running the examples) Running the examples with docker ( Running the examples with docker) Algorithms ( algorithms) Getting Help ( getting help) Discourse Community ( discourse community) CI ( ci) Citing ( citing) References ( references) Introduction StellarGraph is a Python library for machine learning on graph structured (or equivalently, network structured) data. Graph structured data represent entities, e.g., people, as nodes (or equivalently, vertices), and relationships between entities, e.g., friendship, as links (or equivalently, edges). Nodes and links may have associated attributes such as age, income, and time when a friendship was established, etc. StellarGraph supports analysis of both homogeneous networks (with nodes and links of one type) and heterogeneous networks (with more than one type of nodes and/or links). The StellarGraph library implements several state of the art algorithms for applying machine learning methods to discover patterns and answer questions using graph structured data. The StellarGraph library can be used to solve tasks using graph structured data, such as: Representation learning for nodes and edges, to be used for visualisation and various downstream machine learning tasks; Classification and attribute inference of nodes or edges; Link prediction. We provide examples of using StellarGraph to solve such tasks using several real world datasets. Guiding Principles StellarGraph uses the Keras library and adheres to the same guiding principles as Keras: user friendliness, modularity, and easy extendability. Modules and layers of StellarGraph library are designed so that they can be used together with standard Keras layers and modules, if required. This enables flexibility in using existing or creating new models and workflows for machine learning on graphs. 
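As a concrete illustration of the kind of input described in the Getting Started section below, the graph structure can be held in a NetworkX graph and the node attributes in a Pandas DataFrame. The toy data here is made up; the demos show how to wrap these two objects in the library's graph class.

import networkx as nx
import pandas as pd

# Toy homogeneous graph: people as nodes, friendships as edges.
g = nx.Graph()
g.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")])

# Node attributes stored separately, indexed by node id.
node_features = pd.DataFrame(
    {"age": [34, 29, 41, 25], "income": [55000, 48000, 72000, 39000]},
    index=["a", "b", "c", "d"],
)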
Getting Started To get started with StellarGraph you'll need data structured as a homogeneous or heterogeneous graph, including attributes for the entities represented as graph nodes. NetworkX is used to represent the graph and Pandas or Numpy are used to store node attributes. Detailed and narrated examples of various machine learning workflows on network data, supported by StellarGraph, from data ingestion into graph structure to inference, are given in the demos directory of this repository. Installation StellarGraph is a Python 3 library and requires Python version 3.6 to function (note that the library uses Keras with the Tensorflow backend, and thus does not currently work in python 3.7). The required Python version can be downloaded and installed from python.org . Alternatively, use the Anaconda Python environment, available from anaconda.com . The StellarGraph library can be installed in one of two ways, described next. Install StellarGraph using pip: To install StellarGraph library from PyPi using pip , execute the following command: pip install stellargraph Some of the examples require installing additional dependencies as well as stellargraph . To install these dependencies using pip , execute the following command: pip install stellargraph demos Install StellarGraph from Github source: First, clone the StellarGraph repository using git : git clone Then, cd to the StellarGraph folder, and install the library by executing the following commands: cd stellargraph pip install r requirements.txt pip install . Docker Image stellargraph/stellargraph : Docker image with stellargraph installed. Images can be pulled via docker pull stellargraph/stellargraph Running the examples See the README in the demos directory for more information about the examples and how to run them. Algorithms The StellarGraph library currently includes the following algorithms for graph machine learning: GraphSAGE 1 Supports supervised as well as unsupervised representation learning, node classification/regression, and link prediction for homogeneous networks. The current implementation supports multiple aggregation methods, including mean, maxpool, meanpool, and attentional aggregators. HinSAGE Extension of GraphSAGE algorithm to heterogeneous networks. Supports representation learning, node classification/regression, and link prediction/regression for heterogeneous graphs. The current implementation supports mean aggregation of neighbour nodes, taking into account their types and the types of links between them. GAT Graph ATtention Network algorithm 4 for homogeneous graphs. The implementation supports representation learning and node classification for homogeneous graphs. GCN Graph Convolutional Network algorithm 5 for homogeneous graphs. The implementation supports representation learning and node classification for homogeneous graphs. Node2Vec 2 Unsupervised representation learning for homogeneous networks, taking into account network structure while ignoring node attributes. The node2vec algorithm is implemented by combining StellarGraph's random walk generator with the word2vec algorithm from Gensim . Learned node representations can be used in downstream machine learning models implemented using Scikit learn , Keras , Tensorflow or any other Python machine learning library. Metapath2Vec 3 Unsupervised, metapath guided representation learning for heterogeneous networks, taking into account network structure while ignoring node attributes. 
The implementation combines StellarGraph's metapath guided random walk generator and Gensim word2vec algorithm. As with node2vec, the learned node representations (node embeddings) can be used in downstream machine learning models to solve tasks such as node classification, link prediction, etc, for heterogeneous networks. Getting Help Documentation for StellarGraph can be found here . Discourse Community Feel free to ask questions and discuss problems on the StellarGraph Discourse forum . CI buildkite integration Pipeline is defined in .buildkite/pipeline.yml Docker images Tests: Uses the official python:3.6 image. Style: Uses black from the stellargraph docker hub organisation. Citing StellarGraph is designed, developed and supported by CSIRO's Data61 . If you use any part of this library in your research, please cite it using the following BibTex entry latex @misc{StellarGraph, author {CSIRO's Data61}, title {StellarGraph Machine Learning Library}, year {2018}, publisher {GitHub}, journal {GitHub Repository}, howpublished {\url{ } References 1. Inductive Representation Learning on Large Graphs. W.L. Hamilton, R. Ying, and J. Leskovec arXiv:1706.02216 cs.SI , 2017. ( link ) 2. Node2Vec: Scalable Feature Learning for Networks. A. Grover, J. Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016. ( link ) 3. Metapath2Vec: Scalable Representation Learning for Heterogeneous Networks. Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 135–144, 2017 ( link ) 4. Graph Attention Networks. P. Velickovic et al. ICLR 2018 ( link ) 5. Graph Convolutional Networks (GCN): Semi Supervised Classification with Graph Convolutional Networks. Thomas N. Kipf, Max Welling. International Conference on Learning Representations (ICLR), 2017 ( link )",Relation Extraction,NLP Other 2448,Natural Language Processing,Natural Language Processing,Natural Language Processing,"LISA: Linguistically Informed Self Attention ! (./lisa.jpg) This is a work in progress, but much improved, re implementation of the linguistically informed self attention (LISA) model described in the following paper: > Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. Linguistically Informed > Self Attention for Semantic Role Labeling . > Conference on Empirical Methods in Natural Language Processing (EMNLP) . > Brussels, Belgium. October 2018. To exactly replicate the results in the paper at the cost of an unpleasantly hacky codebase, you can use the original LISA code here . Requirements: \> Python 3.6 \> TensorFlow 1.9 (tested up to 1.12) Quick start: Data setup (CoNLL 2005): 1. Get pre trained word embeddings (GloVe): wget P embeddings unzip j embeddings/glove.6B.zip glove.6B.100d.txt d embeddings 2. Get CoNLL 2005 data in the right format using this repo . Follow the instructions all the way through further preprocessing . 3. 
Make sure the correct data paths are set in config/conll05.conf Train a model: To train a model with save directory model using the configuration conll05 lisa.conf : bin/train.sh config/conll05 lisa.conf save_dir model Evaluate a model: To evaluate the latest checkpoint saved in the directory model : bin/evaluate.sh config/conll05 lisa.conf save_dir model Evaluate an exported model: To evaluate the best 1 ( f1) checkpoint so far, saved in the directory model (with id 1554216594): bin/evaluate exported.sh config/conll05 lisa.conf save_dir model/export/best_exporter/1554216594 Training The bin/train.sh (bin/train.sh) script calls src/train.py (src/train.py) with parameters specified in top level configs ( custom configuration wip) (i.e. conll05 lisa.conf (config/conll05 lisa.conf)) which is the entry point for training. The following table describes the command line parameters that may be passed to src/train.py to configure training: Name Type Description Default value train files string Comma separated list of training data files. None dev files string Comma separated list of development data files. None save dir string Directory to save models, outputs, etc. If the directory already exists and contains a trained model, training will restart where it left off. Vocabularies will be re used. None transition_stats string File containing pre computed transition statistics between labels. Tab separated file with one label label probability triple per line. None hparams string Comma separated list of name value hyperparameter ( hyperparameters) settings. None debug string Whether to run in debug mode: a little faster and smaller. False data_config string Path to data configuration json. None model_configs string Comma separated list of paths to model configuration json. None task_configs string Comma separated list of paths to data configuration json. None layer_configs string Comma separated list of paths to data configuration json. None attention_configs string Comma separated list of paths to attention configuration json. None keep_k_best_models int Number of best models to keep. 1 best_eval_key string Key corresponding to the evaluation to be used for determining early stopping. The value must correspond to a named eval under the eval_fns entry in a task config ( task configs). None Hyperparameters The following table lists optimization/training hyperparameters that can be set through the hparams command line flag. Hyperparameters are initialized to the default values are defined in src/constants.py (src/constants.py). Then, these are overridden by hyperparameters set in the model config (e.g., glove_basic.json (config/model_configs/glove_basic.json)). Finally, these are overridden by hyperparameters specified at the command line. Hyperparameter loading is implemented in src/train_utils.py (src/train_utils.py 10). Name Type Description Default value learning_rate float Initial learning rate. 0.04 beta1 float Adam first moment decay rate. 0.9 beta2 float Adam second moment decay rate. 0.98 epsilon float Adam epsilon. 1e 12 decay_rate float Exponential rate of decay for learning rate. 1.5 use_nesterov boolean Whether to use Nesterov momentum in Adam. true decay_steps int If warmup_steps is not set, perform stepwise decay of learning rate every this many steps. 5000 warmup_steps int Number of training steps to linearly increase learning rate before exponential decay. 8000 batch size int Approximate number of sentences per batch. 
256 shuffle_buffer_multiplier int Value to multiply by batch size to determine buffer size for efficient shuffling of examples during training. Higher means better shuffles, lower means less initial time required to fill shuffle buffer. 100 eval_throttle_secs int Do not run evaluation unless at least this many seconds have passed since the last evaluation. 1000 eval_every_steps int Evaluate every this many steps. 1000 num_train_epochs int Iterate through the full training data this many times. 10000 gradient_clip_norm float Clip gradients to this maximum value. 5.0 label_smoothing float Amount of label corruption for smoothing. Smoothing not performed if this value is 0. 0.1 moving_average_decay float Rate of decay for moving average of model parameters. Averaging not performed if this value is 0. 0.999 average_norms boolean Whether to average variables representing norms in parameter averaging. false input_dropout float Dropout rate on input layer (src/model.py L132) (embeddings). 1.0 bilinear_dropout float Dropout rate used in bilinear classifier (src/nn_utils.py L219). 1.0 mlp_dropout float Dropout used in MLP layers (src/nn_utils.py L130) 1.0 attn_dropout float Dropout rate on attention (src/transformer.py L162) in transformer. 1.0 ff_dropout float Dropout rate in feed forward layer (src/transformer.py L127) in transformer. 1.0 prepost_dropout float Dropout rate applied before (src/transformer.py L255) and after (src/transformer.py L260) the feed forward part of transformer layer. 1.0 random_seed int Random seed to use for training. time.time() Model hyperparameters (e.g. layer size, number of self attention heads) are set in the model config ( model configs) json. Evaluation TODO Custom configuration WIP LISA model configuration is defined through a combination of configuration files. A top level config defines a specific model configuration and dataset by setting other configurations. Top level configs are written in bash, and bottom level configs are written in json. Here is an example top level config, conll05 lisa.conf (config/conll05 lisa.conf), which defines the basic LISA model and CoNLL 2005 data: use CoNLL 2005 data source config/conll05.conf take glove embeddings as input model_configs config/model_configs/glove_basic.json joint pos/predicate layer, parse heads and labels, and srl task_configs config/task_configs/joint_pos_predicate.json,config/task_configs/parse_heads.json,config/task_configs/parse_labels.json,config/task_configs/srl.json use parse in attention attention_configs config/attention_configs/parse_attention.json specify the layers layer_configs config/layer_configs/lisa_layers.json And the top level data config for the CoNLL 2005 dataset that it loads, conll05.conf (config/conll05.conf): data_config config/data_configs/conll05.json data_dir $DATA_DIR/conll05st release new train_files $data_dir/train set.gz.parse.sdeps.combined.bio dev_files $data_dir/dev set.gz.parse.sdeps.combined.bio test_files $data_dir/test.wsj.gz.parse.sdeps.combined.bio,$data_dir/test.brown.gz.parse.sdeps.combined.bio Note that $DATA_DIR is a bash global variable, but all the other variables are defined in these configs. There are five types of bottom level configurations, specifying different aspects of the model: data configs ( data configs): Data configs define a mapping from columns in a one word per line formatted file (e.g. the CoNLL X format) to named features and labels that will be provided to the model as batches. 
model configs ( model configs): Model configs define hyperparameters, both model hyperparameters , like various embedding dimensions, and optimization hyperparameters , like learning rate. Optimization hyperparameters can be reset at the command line using the hparams command line parameter, which takes a comma separated list of name value hyperparameter settings. Model hyperparameters cannot be redefined in this way, since this would invalidate a serialized model. task configs ( task configs): Task configs define a task: label, evaluation, and how predictions are formed from the model. Each task (e.g. SRL, parse edges, parse labels) should have its own task config. layer configs ( layer configs): Layer configs attach tasks to layers, defining which layer representations should be trained to predict named labels (from the data config). The number of layers in the model is determined by the maximum depth listed in layer configs. attention configs ( attention configs) (optional): Attention configs define special attention functions which replace attention heads, i.e. syntactically informed self attention. Omitting any attention configs results in a model performing simple single or multi task learning. How these different configuration files work is specified in more detail below. Data configs An full example data config can be seen here: conll05.json (config/data_configs/conll05.json). Each top level entry in the json defines a named feature or label that will be provided to the model. The following table describes the possible parameters for configuring how each input is interpreted. Field Type Description Default value conll_idx int or list Column in the data file corresponding to this input. N/A (required) vocab string Name of the vocabulary used to map this (string) input to int. None (output of converter is int) type string Type of conll_idx . Possible types are: range, other (int/list). range can be used to specify that a variable length range of columns should be read in at once and passed to the converter. Otherwise, the given single int or list of columns is read in and passed to the converter. other (int/list) feature boolean Whether this input should be used as a feature, i.e. provided to the model as input. false label boolean Whether this input should be used as a label, i.e. provided to the model as a label. false updatable boolean Whether this vocab should be updated after its initial creation (i.e. after creating a vocab based on the training data). false converter json A json object defining a function (name and, optionally, parameters) for converting the raw input. These functions are defined in src/data_converters.py (src/data_converters.py). idx_list_converter oov boolean Whether an OOV entry should be added to this input's vocabulary. false Converters The data config specifies a converter function and vocabulary for each desired column in the input data file. For each entry in the data config and each line in the input file, the column values specified by conll_idx are read in and provided to the given converter. Data generators, which take the data config and data file as input to perform this mapping, are defined in src/data_generator.py (src/data_generator.py). New converter functions can be defined in src/data_converters.py (src/data_converters.py). At a minimum, every converter function takes two parameters: split_line , the current line in the data file split by whitespace, and idx , the value of conll_idx . 
Converters may also take additional parameters, whose values are defined via the params field in the converter json object. The output of a converter is a list of strings. For example, the default converter, idx_list_converter , simply takes a single column index or list of indices and returns a list containing the corresponding column values in the input file: python def idx_list_converter(split_line, idx): if isinstance(idx, int): return split_line idx return split_line i for i in idx Vocabs When a vocab is specified for an entry in the data config, that vocab is used to map the string output of the converter to integer values suitable for features/labels in a TensorFlow model. 2 ( f2) This mapping occurs in the map_strings_to_ints function in src/dataset.py (src/dataset.py). TODO: vocab initialization TODO: pre trained word embeddings Model configs TODO Layer configs TODO Task configs TODO Attention configs TODO Footnotes 1 : Best is determined by best_eval_key , with default value for a given dataset in the top level data config, e.g. config/conll05.conf (config/conll05.conf). The value of best_eval_key must correspond to a named eval under the eval_fns entry in a task config ( task configs). ↩︎ ( f1) 2 : If no vocab is specified, then it's assumed that the output of the converter can be interpreted as an integer. ↩︎ ( f2)",Semantic Role Labeling,NLP Other 2497,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Positional Encoding to Control Output Sequence Length This repository contains source files we used in our paper >Positional Encoding to Control Output Sequence Length >Sho Takase, Naoaki Okazaki > Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Requirements Python 3.6 or later for training Python 2.7 for calculating rouge PyTorch 0.4 Test data Test data used in our paper for each length Each file contains SOURCE PART tab HEADLINE Acknowledgements A large portion of this repo is borrowed from the following repos: and",Text Summarization,NLP Other 2511,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Spider: A Large Scale Human Labeled Dataset for Complex and Cross Domain Semantic Parsing and Text to SQL Task Spider is a large human labeled dataset for complex and cross domain semantic parsing and text to SQL task (natural language interfaces for relational databases). It is released along with our EMNLP 2018 paper: Spider: A Large Scale Human Labeled Dataset for Complex and Cross Domain Semantic Parsing and Text to SQL Task . This repo contains all code for evaluation, preprocessing, and all baselines used in our paper. Please refer to the task site for more general introduction and the leaderboard. Changelog 1/14/2019 The submission toturial is ready! Please follow it to get your results on the unreleased test data. 12/17/2018 We updated 7 sqlite database files. Please download the Spider data from the official website again. Please refer to the issue 14 for more details. 10/25/2018 : evaluation script is updated so that the table in count( ) cases will be evaluated as well. Please check out the issue 5 for more info. Results of all baselines and syntaxSQL on the papers are updated as well. 10/25/2018 : to get the latest SQL parsing results (a few small bugs fixed), please use preprocess/parse_raw_json.py to update. Please refer to the issue 3 for more details. 
Citation The dataset is annotated by 11 college students. When you use the Spider dataset, we would appreciate it if you cite the following: @inproceedings{Yu&al.18c, year 2018, title {Spider: A Large Scale Human Labeled Dataset for Complex and Cross Domain Semantic Parsing and Text to SQL Task}, booktitle {EMNLP}, author {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev } } Installation evaluation.py and process_sql.py are written in Python 2.7. Enviroment setup for each baseline is in README under each baseline directory. Data Content and Format Question, SQL, and Parsed SQL Each file in train.json and dev.json contains the following fields: question : the natural language question question_toks : the natural language question tokens db_id : the database id to which this question is addressed. query : the SQL query corresponding to the question. query_toks : the SQL query tokens corresponding to the question. sql : parsed results of this SQL query using process_sql.py . Please refer to parsed_sql_examples.sql in the preprocess directory for the detailed documentation. { db_id : world_1 , query : SELECT avg(LifeExpectancy) FROM country WHERE Name NOT IN (SELECT T1.Name FROM country AS T1 JOIN countrylanguage AS T2 ON T1.Code T2.CountryCode WHERE T2.Language \ English\ AND T2.IsOfficial \ T\ ) , query_toks : SELECT , avg , ( , LifeExpectancy , ) , FROM , ... , question : What is average life expectancy in the countries where English is not the official language? , question_toks : What , is , average , life , ... , sql : { except : null, from : { conds : , table_units : ... }, groupBy : , having : , intersect : null, limit : null, orderBy : , select : ... , union : null, where : true, ... { except : null, from : { conds : false, 2, ... }, groupBy : , having : , intersect : null, limit : null, orderBy : , select : false, ... union : null, where : false, 2, 0, ... } }, Tables tables.json contains the following information for each database: db_id : database id table_names_original : original table names stored in the database. table_names : cleaned and normalized table names. We make sure the table names are meaningful. to be changed column_names_original : original column names stored in the database. Each column looks like: 0, id . 0 is the index of table names in table_names , which is city in this case. id is the column name. column_names : cleaned and normalized column names. We make sure the column names are meaningful. to be changed column_types : data type of each column foreign_keys : foreign keys in the database. 3, 8 means column indices in the column_names . These two columns are foreign keys of two different tables. primary_keys : primary keys in the database. Each number is the index of column_names . { column_names : 0, id , 0, name , 0, country code , 0, district , . . . , column_names_original : 0, ID , 0, Name , 0, CountryCode , 0, District , . . . , column_types : number , text , text , text , . . . , db_id : world_1 , foreign_keys : 3, 8 , 23, 8 , primary_keys : 1, 8, 23 , table_names : city , sqlite sequence , country , country language , table_names_original : city , sqlite_sequence , country , countrylanguage } Databases All table contents are contained in corresponding SQLite3 database files. Evaluation Our evaluation metrics include Component Matching, Exact Matching, and Execution Accuracy. 
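As a rough illustration of what execution based comparison involves (this is not the official evaluation.py, and the database path below is an assumption about the release layout, where each database sits in its own sub directory):

import sqlite3

def run_query(db_path, sql):
    # Execute a query against one of the released SQLite files and return
    # the result rows as a set for an order insensitive comparison.
    with sqlite3.connect(db_path) as conn:
        return set(map(tuple, conn.execute(sql).fetchall()))

db = "database/world_1/world_1.sqlite"  # assumed path; adjust to your copy
gold = run_query(db, "SELECT avg(LifeExpectancy) FROM country")
pred = run_query(db, "SELECT avg(LifeExpectancy) FROM country")
print(gold == pred)  # True when the two queries return the same rows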
For component and exact matching evaluation, instead of simply conducting string comparison between the predicted and gold SQL queries, we decompose each SQL into several clauses, and conduct set comparison in each SQL clause. For Execution Accuracy, our current models do not predict any value in SQL conditions so that we do not provide execution accuracies. However, we encourage you to provide it in the future submissions. For value prediction, you can assume that a list of gold values for each question is given. Your model has to fill them into the right slots in the SQL. Please refer to our paper () and this page for more details and examples. python evaluation.py gold gold file pred predicted file etype evaluation type db database dir table table file arguments: gold file gold.sql file where each line is a gold SQL \t db_id predicted file predicted sql file where each line is a predicted SQL evaluation type match for exact set matching score, exec for execution score, and all for both database dir directory which contains sub directories where each SQLite3 database is stored table file table.json file which includes foreign key info of each database FAQ",Semantic Parsing,NLP Other 2578,Natural Language Processing,Natural Language Processing,Natural Language Processing,"New Code in Tensorflow is available at Neural Relation Extraction (NRE) Neural relation extraction aims to extract relations from plain text with neural models, which has been the state of the art methods for relation extraction. In this project, we provide our implementations of CNN Zeng et al., 2014 and PCNN Zeng et al.,2015 and their extended version with sentence level attention scheme Lin et al., 2016 . Evaluation Results Precion/recall curves of CNN, CNN+ONE, CNN+AVE, CNN+ATT ! image Precion/recall curves of PCNN, PCNN+ONE, PCNN+AVE, PCNN+ATT ! image Data We provide NYT10 dataset we used for the task relation extraction in data/ directory. We preprocess the original data to make it satisfy the input format of our codes. The original data of NYT10 can be downloaded from: Relation Extraction: NYT10 is originally released by the paper Sebastian Riedel, Limin Yao, and Andrew McCallum. Modeling relations and their mentions without labeled text. Download ( Pre Trained Word Vectors are learned from New York Times Annotated Corpus (LDC Data LDC2008T19), which should be obtained from LDC . Our train set is generated by merging all training data of manual and held out datasets, deleted those data that have overlap with the test set, and used the remain one as our training data. To run our code, the dataset should be put in the folder data/ using the following format, containing six files + train.txt: training file, format (fb_mid_e1, fb_mid_e2, e1_name, e2_name, relation, sentence). + test.txt: test file, same format as train.txt. + entity2id.txt: all entities and corresponding ids, one per line. + relation2id.txt: all relations and corresponding ids, one per line. + vec.bin: the pre train word embedding file Codes The source codes of various methods are put in the folders CNN+ONE/, CNN+ATT/, PCNN+ONE/, PCNN+ATT/. Compile Just type make in the corresponding folders. Train For training, you need to type the following command in each model folder: ./train The training model file will be saved in folder out/ . Test For testing, you need to type the following command in each model folder: ./test The testing result which reports the precision/recall curve will be shown in pr.txt. 
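To visualise that curve, a short script along the following lines works, assuming each line of pr.txt holds a precision/recall pair (the exact file format is an assumption; adjust the parsing if your build writes it differently):

import matplotlib.pyplot as plt

precision, recall = [], []
with open("pr.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue
        precision.append(float(parts[0]))
        recall.append(float(parts[1]))

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.savefig("pr_curve.png")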
Cite If you use the code, please cite the following paper: Lin et al., 2016 Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. Neural Relation Extraction with Selective Attention over Instances. In Proceedings of ACL. pdf Reference Zeng et al., 2014 Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. Relation classification via convolutional deep neural network. In Proceedings of COLING. Zeng et al.,2015 Daojian Zeng,Kang Liu,Yubo Chen,and Jun Zhao. Distant supervision for relation extraction via piecewise convolutional neural networks. In Proceedings of EMNLP.",Relationship Extraction (Distant Supervised),NLP Other 2623,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NPRF NPRF: A Neural Pseudo Relevance Feedback Framework for Ad hoc Information Retrieval pdf If you use the code, please cite the following paper: @inproceedings{li2018nprf, title {NPRF: A Neural Pseudo Relevance Feedback Framework for Ad hoc Information Retrieval}, author {Li, Canjia and Sun, Yingfei and He, Ben and Wang, Le and Hui, Kai and Yates, Andrew and Sun, Le and Xu, Jungang}, booktitle {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, year {2018} } Requirement Tensorflow Keras gensim numpy Getting started Training data preparation To capture the top k terms from top n documents, one needs to extract term document frequency from index. Afterwards, you are required to generate the similarity matrix upon the query and document given the pre trained word embedding (e.g. word2vec). Related functions can be found in preprocess/prepare_d2d.py. Training meta data preparation We introduce two classes for the ease of training. The class Relevance incorporates the relevance information from the baseline and qrels file. The class Result simplify the write/read operation on standard TREC result file. Other information like query idf is dumped as a pickle file. Model training Configure the MODEL_config.py file, then run python MODEL.py fold fold_number temp_file_path You need to run 5 fold cross valiation, which can be automatically done by running the runfold.sh script. The temp file is a temporary file to write the result of the validation set in TREC format. A training log sample on the first fold of TREC 1 3 dataset is provided for reference, see sample_log . Evaluation After training, the evaluation result of each fold is retained in the result path as you specify in the MODEL_config.py file. One can simply run cat res >> merge_file to merge results from all folds. Thereafter, run the trec_eval script to evaluate your model. Reference Some snippets of the code follow the implementation of K NRM , MatchZoo .",Ad-Hoc Information Retrieval,NLP Other 2662,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Unsupervised Hypernymy Detection Data and code for the experiments in: Hypernyms under Siege: Linguistically motivated Artillery for Hypernymy Detection Vered Shwartz, Enrico Santus and Dominik Schlechtweg. EACL 2017. link Usage note: The scripts in source/measures/ should be run directly from their directory. 
If you wish to do otherwise, you may have to change the path you add to the path attribute in sys.path.append('../') in the respective measure script.",Hypernym Discovery,NLP Other 2811,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BioWordVec & BioSentVec: pre trained embeddings for biomedical words and sentences Text corpora We created biomedical word and sentence embeddings using PubMed and the clinical notes from MIMIC III Clinical Database . Both PubMed and MIMIC III texts were split and tokenized using NLTK . We also lowercased all the words. The statistics of the two corpora are shown below. Sources Documents Sentences Tokens : : : : PubMed 28,714,373 181,634,210 4,354,171,148 MIMIC III Clinical notes 2,083,180 41,674,775 539,006,967 BioWordVec: biomedical word embeddings with fastText We applied fastText to compute 200 dimensional word embeddings. We set the window size to be 20, learning rate 0.05, sampling threshold 1e 4, and negative examples 10. Both the word vectors and the model with hyperparameters are available for download below. The model file can be used to compute word vectors that are not in the dictionary (i.e. out of vocabulary terms). BioWordVec vector 13GB (200dim, trained on PubMed+MIMIC III, word2vec bin format) BioWordVec model 26GB (200dim, trained on PubMed+MIMIC III) We evaluated BioWordVec for medical word pair similarity. We used the MayoSRS (101 medical term pairs; download here ) and UMNSRS_similarity (566 UMLS concept pairs; download here ) datasets. Model MayoSRS UMNSRS_similarity : : : word2vec 0.513 0.626 BioWordVec model 0.552 0.660 BioSentVec 1 : biomedical sentence embeddings with sent2vec We applied sent2vec to compute the 700 dimensional sentence embeddings. We used the bigram model and set window size to be 20 and negative examples 10. BioSentVec model 21GB (700dim, trained on PubMed+MIMIC III) We evaluated BioSentVec for clinical sentence pair similarity tasks. We used the BIOSSES (100 sentence pairs; download here ) and the MedSTS (1068 sentence pairs; download here ) datasets. BIOSSES MEDSTS : : Unsupervised methods doc2vec 0.787 Levenshtein Distance 0.680 Averaged word embeddings 0.694 0.747 Universal Sentence Encoder 0.345 0.714 BioSentVec (PubMed) 0.817 0.750 BioSentVec (MIMIC III) 0.350 0.759 BioSentVec (PubMed + MIMIC III) 0.795 0.767 Supervised methods Linear Regression 0.836 Random Forest 0.818 Deep learning + Averaged word embeddings 0.703 0.784 Deep learning + Universal Sentence Encoder 0.401 0.774 Deep learning + BioSentVec (PubMed) 0.824 0.819 Deep learning + BioSentVec (MIMIC III) 0.353 0.805 Deep learning + BioSentVec (PubMed + MIMIC III) 0.848 0.836 References When using some of our pre trained models for your application, please cite the following paper: 1. Chen Q, Peng Y, Lu Z. BioSentVec: creating sentence embeddings for biomedical texts . 2018. arXiv:1810.09302 . Acknowledgments This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine. We are grateful to the authors of fastText, sent2vec, MayoSRS, UMNSRS, BIOSSES, and MedSTS for making their software and data publicly available.",Semantic Textual Similarity,NLP Other 2840,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Position aware Attention RNN Model for Relation Extraction This repo contains the PyTorch code for paper Position aware Attention and Supervised Data Improve Slot Filling . 
The TACRED dataset : Details on the TAC Relation Extraction Dataset can be found on this dataset website . Requirements Python 3 (tested on 3.6.2) PyTorch (tested on 1.0.0) unzip, wget (for downloading only) Preparation First, download and unzip GloVe vectors from the Stanford website, with: chmod +x download.sh; ./download.sh Then prepare vocabulary and initial word vectors with: python prepare_vocab.py dataset/tacred dataset/vocab glove_dir dataset/glove This will write vocabulary and word vectors as a numpy matrix into the dir dataset/vocab . Training Train a position aware attention RNN model with: python train.py data_dir dataset/tacred vocab_dir dataset/vocab id 00 info Position aware attention model Use topn N to finetune the top N word vectors only. The script will do the preprocessing automatically (word dropout, entity masking, etc.). Train an LSTM model with: python train.py data_dir dataset/tacred vocab_dir dataset/vocab no attn id 01 info LSTM model Model checkpoints and logs will be saved to ./saved_models/00 . Evaluation Run evaluation on the test set with: python eval.py saved_models/00 dataset test This will use the best_model.pt by default. Use model checkpoint_epoch_10.pt to specify a model checkpoint file. Add out saved_models/out/test1.pkl to write model probability output to files (for ensemble, etc.). Ensemble Please see the example script ensemble.sh . License All work contained in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.",Relation Extraction,NLP Other 2365,Computer Vision,Computer Vision,Computer Vision,"Convolutional Pose Machines Shih En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, Convolutional Pose Machines , CVPR 2016. This project is licensed under the terms of the GPL v2 license. By using the software, you are agreeing to the terms of the license agreement . Contact: Shih En Wei (weisteady@gmail.com) ! Teaser? Recent Updates Synced our fork of caffe with most recent version (Dec. 2016) so that Pascal GPUs can work (tested with CUDA 8.0 and CUDNN 5). Including a VGG pretrained model in matlab (and also python) code. This model was used in CVPR'16 demo. It scores 90.1% on MPI test set, and can be trained in much shorter time than previous models. We are working on releasing code of our new work in multi person pose estimation demonstrated in ECCV'16 (best demo award!). Before Everything Watch some videos . Install Caffe . If you are interested in training this model on your own machines, or realtime systems, please use our version (a submodule in this repo) with customized layers. Make sure you have compiled python and matlab interface. This repository at least runs on Ubuntu 14.04, OpenCV 2.4.10, CUDA 8.0, and CUDNN 5. The following assumes you use cmake to compile caffe in /caffe/build . // : ( Copy caffePath.cfg.example to caffePath.cfg and set your own path in it.) Include /caffe/build/install/lib in environment variable $LD_LIBRARY_PATH . Include /caffe/build/install/python in environment variable $PYTHONPATH . Testing First, run testing/get_model.sh to retreive trained models from our web server. Python This demo file shows how to detect multiple people's poses as we demonstrated in CVPR'16. For real time performance, please read it for further explanation. Matlab 1. CPM_demo.m : Put the testing image into sample_image then run it! You can select models (we provided 4) or other parameters in config.m . If you just want to try our best scoring model, leave them default. 2. 
CPM_benchmark.m : Run the model on test benchmark and see the scores. Prediction files will be saved in testing/predicts . Training Run get_data.sh to get datasets including FLIC Dataset , LEEDS Sport Dataset and its extended training set , and MPII Dataset . Run genJSON( ) to generate a json file in training/json/ folder (you'll have to create it). Dataset name can be MPI , LEEDS , or FLIC . The json files contain raw informations needed for training from each individual dataset. Run python genLMDB.py to generate LMDBs for CPM data layer in our caffe . Change the main function to select dataset, and note that you can generate a LMDB with multiple datasets. Run python genProto.py to get prototxt for caffe. Read further explanation for layer parameters. Train with generated prototxts and collect caffemodels. Related Repository Convolutional Pose Machines in Tensorflow Citation Please cite CPM in your publications if it helps your research: @inproceedings{wei2016cpm, author {Shih En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh}, booktitle {CVPR}, title {Convolutional pose machines}, year {2016} }",Pose Estimation,Pose Estimation 2384,Computer Vision,Computer Vision,Computer Vision,flowtrack.pytorch Pytorch implementation of FlowTrack . Simple Baselines for Human Pose Estimation and Tracking TO DO: x Human detection x Single person pose estimation x Optical flow estimation x Box propagation Pose tracking Requirements pytorch > 0.4.0 torchvision pycocotools tensorboardX Installation shell cd lib ./make.sh Disable cudnn for batch_norm: PYTORCH /path/to/pytorch for pytorch v0.4.0 sed i 1194s/torch\.backends\.cudnn\.enabled/False/g ${PYTORCH}/torch/nn/functional.py for pytorch v0.4.1 sed i 1254s/torch\.backends\.cudnn\.enabled/False/g ${PYTORCH}/torch/nn/functional.py Training Pose Estimation Download data folder as $ROOT/data . shell python ./tools/pose/main.py The official code is released on Microsoft/human pose estimation.pytorch . Demo Pose Estimation TODO Detection Download pretrained detection model into models/detection/ . Refer to pytorch faster rcnn for more information. shell python ./tools/detection/demo.py Optical Flow Estimation Download pretrained flownet into models/flownet/ . Refer to flownet2 pytorch for more information. shell python ./tools/flownet/demo.py model Update 2018.12.05: Add Pose Estimation Models Deconv DenseNet Stacked Hourglass Network FPN,Pose Estimation,Pose Estimation 2404,Computer Vision,Computer Vision,Computer Vision,"YogAI YogAI is a responsive virtual yoga instructor using pose estimation to guide and correct a yogi that runs on a raspberry pi smart mirror. Dependencies You'll need have the following installed: python3 tensorflow 1.11 pip wheel for 3.5 w/tflite working thanks to PINTO0309 opencv3 sci kit learn Hardware raspberry pi 3+ webcam speaker with aux monitor one way mirror + framing materials Install $ git clone $ cd YogAI $ ./install.sh Model We're using a tflite Convolutional Pose Machine (CPM) model we found here . The table below offers more information about the model we are running for labeling and inference. Model Input shape Output shape Model size Inference time (rpi3) CPM 1, 192, 192, 3 1, 96, 96, 14 2.6 MB 2.56 FPS Using this model and the label.py script on yoga sample poses will output 28 dim arrays of body part coordinates into a csv file. Training The Hackster post will show you how to obtain training samples for your desired poses. Use the label.py script to transform the images into 28 dim arrays with labels. 
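Once every sample has been reduced to a 28 dimensional keypoint vector, the classification step is a standard scikit-learn fit. Here is a minimal sketch; the CSV filename and column layout (28 coordinates followed by the pose label per row) are assumptions about label.py's output.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = np.genfromtxt("pose_samples.csv", delimiter=",", dtype=str)
X = data[:, :28].astype(float)  # keypoint coordinates
y = data[:, 28]                 # pose label, e.g. "plank" or "cow"

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("held-out accuracy:", knn.score(X_test, y_test))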
The knn.ipynb is a jupyter notebook to help you train a KNN to classify yoga poses. You want to make sure your samples follow this directory structure: ├── poses │ ├── plank │ │ ├── sample1.jpg │ │ ├── sample2.jpg │ │ ├── ... │ ├── cow │ │ ├── sample1.jpg │ │ ├── sample2.jpg │ │ ├── ... . . . . Run After you've trained the classifier on your samples, you should have a pickled model in the ./models directory. Simply run python3 app.py to get your YogAI instructor running! References 1 Convolutional Pose Machine : 2 Tensorflow wheels w/ tflite : 3 Pose estimation for mobile : 4 Pose estimation tensorflow implementation :",Pose Estimation,Pose Estimation 2415,Computer Vision,Computer Vision,Computer Vision,"Parsing R CNN Parsing R CNN for Instance Level Human Analysis Sorry, this repo is still in progress.",Pose Estimation,Pose Estimation 2447,Computer Vision,Computer Vision,Computer Vision,"c2f 3dhm human caffe This is the caffe reimplementation of Coarse to Fine Volumetric Prediction for Single Image 3D Human Pose You can find screenshots of eval on test set in figs/ ( d2 16, 32, 64 ). (Random or full test) External links (closely related projects) I1 I1 with group normalization, batch size 1 News under the headline Reaches 67.1 mm MPJPE on entire test set! Overture I write C++ faster than Python. I write faster C++ than Python. I know C++ / Caffe is not easy to understand. People don't like Netscape Browser or iPhone 4s any more. hmmmmmm. People tend to use Hourglass for human pose while powerful ResNet is enough. hmmmmmmmmmmmmmm. Your briefing 2 stage Hourglass ( d1 1, d2 16/32/64 ) w/ batch size 3 🌝 🌚 😈 Exquisite ResNet w/ integral coming up soon. 💪 Caffe Hourglass is imported from GNet pose . Many thanks! About comprehensive readme: code.pdf provides details about custom layers. data.pdf provides details about data format etc. prototxt.pdf provides training/testing pipeline about the configuration prototxt file. Environment Ubuntu / Windows For Ubuntu, I used two 12 GB TITAN Xp . For Windows, \emph{ TO DO } SSD You'll need SSD for online data loading. General Structure ${POSE_ROOT} + caffe_code + data + figs + models + training + testing + README.md Installation 1. install Caffe from GNet Caffe repository . 2. I have developed a myriad of layers. Code structure is shell ${POSE_ROOT} caffe_code include caffe deep_human_model_layers.hpp This includes operations about 2d/3d heatmap /integral / augmentation / local global transformation etc. h36m.h This includes definitions of joint / part / bone (h36m 32 joints / usable 16 joints / c2f 17 joints etc.) operations.hpp This includes operations w.r.t scalar / vector / fetch file / output data. src caffe layers DeepHumanModel deep_human_model_argmax_2d_hm_layer.cpp This takes argmax operation on 2d heatmap deep_human_model_convert_2d_layer.cpp h36m provides full 32 joints, of which we only care 16 joints. Conversion from 16x2 32x2 deep_human_model_convert_3d_layer.cpp Conversion from 16x3 32x3 deep_human_model_convert_depth_layer.cpp Conversion from root relative camera coordinate 1, 1 normalized depth deep_human_model_gen_3d_heatmap_in_more_detail_v3_layer.cpp Generate groud truth for 3d heatmap. Closely follows c2f Torch code. 
deep_human_model_h36m_cha_gen_joint_fr_xyz_heatmap_layer.cpp Argmax operation on 3d heatmap deep_human_model_h36m_gen_aug_3d_layer.cpp Generate augmented 3d ground truth according to augmented 2d gt and 3d gt deep_human_model_h36m_gen_pred_mono_3d_layer.cpp 2.5D > 3D camera frame coordinate deep_human_model_integral_vector_layer.cpp \sum_{i 0}^{D 1} probability position deep_human_model_integral_x_layer.cpp Integral along X axis deep_human_model_integral_y_layer.cpp Integral along Y axis deep_human_model_integral_z_layer.cpp Integral along Z axis deep_human_model_norm_3d_hm_layer.cpp Normalize 3D heatmap responses to make them sum up to 1.0 deep_human_model_normalization_response_v0_layer.cpp 2D heatmap normalization deep_human_model_numerical_coordinate_regression_layer.cpp Integral over normalized 2D heatmap > (x, y) deep_human_model_output_heatmap_sep_channel_layer.cpp Output heatmap of different joints to different folders deep_human_model_output_joint_on_skeleton_map_h36m_layer.cpp Plot predicted joints on raw image deep_human_model_softmax_3d_hm_layer.cpp Softmax normalization on 3d heatmap deep_human_model_softmax_hm_layer.cpp Softmax normalization on 2d heatmap Operations adaptive_weight_euc_loss_layer.cpp Adaptive weight controlling on different euclidean regression loss add_vector_by_constant_layer.cpp Add each element of vector by a scalar add_vector_by_single_vector_layer.cpp Add two vectors element wisely add_vector_by_constant_layer.cpp Add each element of vector by a scalar cross_validation_random_choose_index_layer.cpp Select an index from different training split sources gen_heatmap_all_channels_layer.cpp Generate 2d heatmap ground truth. Closely follows Yichen Wei simple baseline & CPM caffe CPMDataLayer gen_rand_index_layer.cpp Randomly generate a index for training/testing gen_sequential_index_layer.cpp Sequentially generate index for testing gen_unified_data_and_label_layer.cpp Generate augmentend training data and label (2D). Adapated from CPMDataLayer joint_3d_square_root_loss_layer.cpp Display average joint error MPJPE (mm) js_regularization_loss_layer.cpp Jenson Shannon regularization loss mul_rgb_layer.cpp Scale rgb image by a scalar output_blob_layer.cpp Output blob to files for debugging output_heatmap_one_channel_layer.cpp Output heatmap of one specific joint to file read_blob_from_file_indexing_layer.cpp Read data from disk w/ file index (id) read_blob_from_file_layer.cpp Read blob from a specific file read_image_from_file_name_layer.cpp Read image from file path read_image_from_image_path_file_layer.cpp Read image from a single file describing path for all images in the set read_image_layer.cpp See code read_index_layer.cpp Read image index from file scale_vector_layer.cpp Multiply vector by a constant scalar 3. Copy ${POSE_ROOT}/caffe_code/include/caffe/ to ${CAFFE_ROOT}/include/caffe/ 4. Copy ${POSE_ROOT}/caffe_code/src/caffe/layers/ to ${CAFFE_ROOT}/src/caffe/layers/ after running the following cd ${CAFFE_ROOT}src/caffe/layers mkdir DeepHumanModel mkdir Operations 5. 
Configure caffe.proto Add contents in LayerParameter of ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto to ${CAFFE_ROOT}/src/caffe/proto/caffe.proto Replace TransformationParameter in ${CAFFE_ROOT}/src/caffe/proto/caffe.proto with the one in mine ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto Add other layer parameter fields in ${POSE_ROOT}/caffe_code/src/caffe/proto/custom_layers_mine.proto to ${CAFFE_ROOT}/src/caffe/proto/caffe.proto Make sure ID of LayerParameter do not conflict with each other. 6. Compile sudo make all j128 Note 1: For ubuntu, you will have to modify header section of gen_unified_data_and_label_layer.cpp like this ifdef USE_OPENCV include // include // include include endif // USE_OPENCV Note 2: For windows, you will have to modify header section of gen_unified_data_and_label_layer.cpp like this ifdef USE_OPENCV include include include include endif // USE_OPENCV Still can't compile? Contact me. Data One thing I have realized over the years is that HDF5 , LMDB , JSON , tar.gz , pth.tar or whatever is totally redundant, and suffers from a major downside: it needs to be loaded into memory. For python based framework e.g. Keras, it is time consuming (sometimes 30 seconds ++) to load offline data. Even for caffe, it takes several seconds. I have thus far switched to a simple and naive data format i.e. txt . Each txt represents an annotation for a sample e.g. ground truth 3d, bbx. SSD is required. See data.pdf for a thorough discussion and joint definition. (full 32 joints vs usable 16 joints) Folder Name Download Link Description A Toy Example : : : : : : : : bbx_all_new bbx (bbx_x1, bbx_y1, bbx_x2, bbx_y2) bbx center_x center_x center_x (constant: 112.0) center_x center_y center_y center_y (constant: 112.0) center_y scale scale person image scale (constant: 224.0) scale gt_joint_2d_raw_new gt_2d 2d gt on 224x224 cropped image (32x2) gt_2d image_path_file image path for each sample img_path_file gt_joint_3d_mono_raw gt_3d monocular 3d gt in camera coordiante (32x3) gt_3d camera_all camera intrinsic & extrinsic camera parameters camera index_range ind_range index range per (subject, action) ind_range info_all basic_info video/action name/subaction/camera id/frame id basic_info images img all the cropped images (224x224) img Download data, place to ${POSE_ROOT} data full bbx_all_new center_x center_y scale gt_joint_2d_raw_new gt_joint_3d_mono_raw image_path_file camera_all index_range info_all images Train Index: 0 1559571 Test Index: 1559572 2108570 Trained models Method d2 MPJPE(mm) Caffe Model Solver State : : : : : : : : : : Mine 64 67.1 Google Drive (net_iter_720929.caffemodel) Google Drive (net_iter_720929.solverstate) Mine 32 68.6 Google Drive (net_iter_640000.caffemodel) Google Drive (net_iter_640000.solverstate) Mine 16 73.6 Google Drive (net_iter_560000.caffemodel) Google Drive (net_iter_560000.solverstate) C2F 64 69.8 None None Integral 64 68.0 None None C2F and Integral are Included for reference. Download models, place to ${POSE_ROOT} models net_iter_560000.caffemodel net_iter_560000.solverstate net_iter_640000.caffemodel net_iter_640000.solverstate net_iter_720929.caffemodel net_iter_720929.solverstate Kick off the testing As you know, evaluation on the entire dataset takes time. For testing on a random subset, I implemented a random index generation layer. See screenshot figs/test_d64_rand.png , figs/test_d32_rand.png , figs/test_d16_rand.png for details. 
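For reference, the MPJPE number these runs report is the mean Euclidean distance between predicted and ground truth joints in millimetres; a minimal numpy sketch (array shapes are illustrative, using the 16 usable joints):

import numpy as np

def mpjpe(pred, gt):
    # pred, gt: (num_samples, num_joints, 3) joint positions in millimetres.
    return np.mean(np.linalg.norm(pred - gt, axis=-1))

pred = 50.0 * np.random.randn(8, 16, 3)
gt = 50.0 * np.random.randn(8, 16, 3)
print(mpjpe(pred, gt))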
I should stress that this is just for fun; please do not take it seriously. You might get, say, 68.2 mm and 68.4 mm in two different runs. d2 64 cd testing $CAFFE_ROOT/build/tools/caffe test model test_d64_rand.prototxt weights models/net_iter_720929.caffemodel iterations 500 This will give you figs/rand_test_d64.png (unstable number around 68 mm due to the small number of samples) d2 32 $CAFFE_ROOT/build/tools/caffe test model test_d32_rand.prototxt weights models/net_iter_640000.caffemodel iterations 500 This will give you figs/rand_test_d32.png (unstable number around 71 mm ) d2 16 $CAFFE_ROOT/build/tools/caffe test model test_d16_rand.prototxt weights models/net_iter_560000.caffemodel iterations 500 This will give you figs/rand_test_d16.png (unstable number around 74 mm ) Full testing For full evaluation on the H36M test set d2 64 cd testing $CAFFE_ROOT/build/tools/caffe test model test_d64_statsfalse.prototxt weights models/net_iter_720929.caffemodel iterations 183000 This will give you 67.1 mm ( figs/test_d64_full.png ) d2 32 $CAFFE_ROOT/build/tools/caffe test model test_d32_statsfalse.prototxt weights models/net_iter_640000.caffemodel iterations 183000 This will give you 68.6 mm ( figs/test_d32_full.png ) d2 16 $CAFFE_ROOT/build/tools/caffe test model test_d16_statsfalse.prototxt weights models/net_iter_560000.caffemodel iterations 183000 This will give you 73.6 mm ( figs/test_d16_full.png ) Training Training is a bit tricky. For the code structure of the prototxts, see prototxt.pdf . Note that I started with the MPII pretrained caffemodel improved hourglass_iter_640000.caffemodel from the GNet repo. I started with d2 2 to warm up. Simply run cd training $CAFFE_ROOT/build/tools/caffe train solver solver_d2.prototxt weights improved hourglass_iter_640000.caffemodel I trained from the MPII 2D HM pretrained model, with 2.5e 5 as base_lr and RMSProp . 2 GPUs were used unless otherwise specified. Weight initialization is gaussian w/ 0.01 std . The loss ratio of 3d HM to 2d HM is 0.1:1 . d2 4 Finetune weights from d2 2 after convergence. $CAFFE_ROOT/build/tools/caffe train solver solver_d4.prototxt snapshot net_iter_XXX.solverstate You will get around 137 mm on train and 150 mm on test. For eval on the training set, simply uncomment index_lower_bound: 0 index_upper_bound: 1559571 of the GenRandIndex layer. Loss ratio is 0.3:1 . d2 8 Finetune weights from d2 4 after convergence. $CAFFE_ROOT/build/tools/caffe train solver solver_d8.prototxt snapshot net_iter_XXX.solverstate You will get around 72 mm on train and 86 mm on test. Loss ratio is 0.1:1 . d2 16 Finetune weights from d2 8 after convergence. $CAFFE_ROOT/build/tools/caffe train solver solver_d16.prototxt snapshot net_iter_XXX.solverstate You will get around 47 mm on train and 72 mm on test. Loss ratio is 0.03:1 . d2 32 Finetune weights from d2 16 after net_iter_560000.solverstate $CAFFE_ROOT/build/tools/caffe train solver solver_d32.prototxt snapshot net_iter_560000.solverstate You will get around 39 mm on train and 71 mm on test. Loss ratio is 0.03:1 . I changed the weight initialization of the 3D heatmap to a normal distribution with 0.001 std in place of the previous 0.01 as I found the MPJPE did not drop. d2 64 Finetune weights from d2 32 after net_iter_640000.solverstate $CAFFE_ROOT/build/tools/caffe train solver solver_d64.prototxt snapshot net_iter_640000.solverstate You will get around 37 mm on train and 68 mm on test. Loss ratio is 0.03:1 . I again changed the weight initialization of the 3D heatmap from 0.001 gaussian to 0.0003 .
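To keep the progressive depth schedule above in one place, here is a compact restatement as a plain Python structure; the values are copied from the text, while the structure itself and its field names are mine, not part of the repository.

# Progressive training schedule for the depth resolution d2, restated from the
# text above. loss_ratio is the 3D HM loss : 2D HM loss ratio, and init_std is
# the gaussian std used to initialize the 3D heatmap conv layer at that stage.
SCHEDULE = [
    {"d2": 2,  "loss_ratio": 0.1,  "init_std": 0.01},    # warm up from the MPII 2D model
    {"d2": 4,  "loss_ratio": 0.3,  "init_std": 0.01},
    {"d2": 8,  "loss_ratio": 0.1,  "init_std": 0.01},
    {"d2": 16, "loss_ratio": 0.03, "init_std": 0.01},
    {"d2": 32, "loss_ratio": 0.03, "init_std": 0.001},
    {"d2": 64, "loss_ratio": 0.03, "init_std": 0.0003},
]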
This sounds pretty sketchy, right? Another way to train this is to simply train d1 1, d2 64 from scratch. Details: missing, TO DO. Notes: I set use_global_stats to false during inference due to the small batch size, otherwise you would get a totally different MPJPE number. I cannot recall the paper that mentioned it. Let me find the paper. The major differences between the prototxts lie in: a) the depth dimension param (Use sublime or notepad++ to search for the keyword depth_dims ) b) the 3d heatmap slicing layer. (Simply search cube_ ) c) the 3d heatmap reshaping layer ( heatmap2_flat_scale ) d) the loss ratio of the 3d heatmap and 2d heatmap. The basic rule is that the magnitudes of these two losses should be the same. e) different weight initialization of the last conv layer for the 3d heatmap. I only used the L2 loss during training. Nevertheless I have the Jensen Shannon regularization loss , smooth L1 loss , adaptive loss , and integral loss in the prototxt, as can be seen in figs/ .png. The adaptive loss tries to automatically balance the weight magnitudes of different euclidean regression losses. See code.pdf for details about the integral loss. The MPJPE error of the argmax operation is error(mm)_3d_s2_max . Windows This line is just a test. Don't excoriate Windows. Mac, Ubuntu and Windows are all excellent operating systems. Start cmd.exe and run caffe train .... Should you have issues installing Windows Caffe, contact me. FAQ Feel free to contact me at strawberryfgalois@gmail.com if you have any problems or suggestions.",Pose Estimation,Pose Estimation 2459,Computer Vision,Computer Vision,Computer Vision,"Learning Feature Pyramids for Human Pose Estimation Training and testing code for the paper > Learning Feature Pyramids for Human Pose Estimation > Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang > ICCV, 2017 This code is based on stacked hourglass networks and fb.resnet.torch . Thanks to the authors. Install 1. Install Torch . 2. Install dependencies. luarocks install hdf5 luarocks install matio luarocks install optnet 3. (Optional) Install nccl for better performance when training with multiple GPUs. git clone cd nccl make make install luarocks install nccl set LD_LIBRARY_PATH in file ~/.bashrc if libnccl.so is not found. 4. Prepare dataset. Create a symbolic link to the images directory of the MPII dataset: ln s PATH_TO_MPII_IMAGES_DIR data/mpii/images Create a symbolic link to the images directory of the LSP dataset (images are stored in PATH_TO_LSP_DIR/images ): ln s PATH_TO_LSP_DIR data/lsp/lsp_dataset Create a symbolic link to the images directory of the LSP extension dataset (images are stored in PATH_TO_LSPEXT_DIR/images ): ln s PATH_TO_LSPEXT_DIR data/lsp/lspet_dataset Training and Testing Quick Start Testing from our pretrained model Download our pretrained model to the ./pretrained folder from Google Drive . Test on the MPII validation set by running the following command qlua main.lua batchSize 1 nGPU 1 nStack 8 minusMean true loadModel pretrained/model_250.t7 testOnly true debug true ! Example (data/example.png) For multi scale testing, run qlua evalPyra.lua batchSize 1 nGPU 1 nStack 8 minusMean true loadModel pretrained/model_250.t7 testOnly true debug true Note : If you DO NOT want to visualize the training results, set debug false and use th instead of qlua . You may set the number of scales in evalPyra.lua (Line 22 ). Use fewer scales or multiple GPUs if you run out of memory.
use loadModel MODEL_PATH to load a specific model for testing or training Train a two stack hourglass model Train an example two stack hourglass model on the MPII dataset with the proposed Pyramids Residual Modules (PRMs) sh ./experiments/mpii/hg prm stack2.sh Customize your own training and testing procedure A sample script for training on the MPII dataset with 8 stack hourglass model. bash !/usr/bin/env sh expID mpii/mpii_hg8 snapshots and log file will save in checkpoints/$expID dataset mpii mpii mpii lsp lsp gpuID 0,1 GPUs visible to program nGPU 2 how many GPUs will be used to train the model batchSize 16 LR 6.7e 4 netType hg prm network architecture nStack 2 nResidual 1 nThreads 4 how many threads will be used to load data minusMean true nClasses 16 nEpochs 200 snapshot 10 save models for every $snapshot OMP_NUM_THREADS 1 CUDA_VISIBLE_DEVICES $gpuID th main.lua \ dataset $dataset \ expID $expID \ batchSize $batchSize \ nGPU $nGPU \ LR $LR \ momentum 0.0 \ weightDecay 0.0 \ netType $netType \ nStack $nStack \ nResidual $nResidual \ nThreads $nThreads \ minusMean $minusMean \ nClasses $nClasses \ nEpochs $nEpochs \ snapshot $snapshot \ resume checkpoints/$expID \ uncomment this line to resume training testOnly true \ uncomment this line to test on validation data testRelease true \ uncomment this line to test on test data (MPII dataset) Evaluation You may evaluate the PCKh score of your model on the MPII validation set. To get start, download our prediction pred_multiscale_250.h5 to ./pretrained from Google Drive , and run the MATLAB script evaluation/eval_PCKh.m . You'll get the following results Head , Shoulder , Elbow , Wrist , Hip , Knee , Ankle , Mean , name , 97.41 , 96.16 , 91.10 , 86.88 , 90.05 , 86.00 , 83.89 , 90.27 Citation If you find this code useful in your research, please consider citing: @inproceedings{yang2017pyramid, Title {Learning Feature Pyramids for Human Pose Estimation}, Author {Yang, Wei and Li, Shuang and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang}, Booktitle {arXiv preprint arXiv:1708.01101}, Year {2017} }",Pose Estimation,Pose Estimation 2470,Computer Vision,Computer Vision,Computer Vision,"DensePose: Dense Human Pose Estimation In The Wild _Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos_ densepose.org arXiv BibTeX ( CitingDensePose) Dense human pose estimation aims at mapping all human pixels of an RGB image to the 3D surface of the human body. DensePose RCNN is implemented in the Detectron framework and is powered by Caffe2 . In this repository, we provide the code to train and evaluate DensePose RCNN. We also provide notebooks to visualize the collected DensePose COCO dataset and show the correspondences to the SMPL model. Visualization of DensePose COCO annotations: See notebooks/DensePose COCO Visualize.ipynb (notebooks/DensePose COCO Visualize.ipynb) to visualize the DensePose COCO annotations on the images: DensePose COCO in 3D: See notebooks/DensePose COCO on SMPL.ipynb (notebooks/DensePose COCO on SMPL.ipynb) to localize the DensePose COCO annotations on the 3D template ( SMPL ) model: Visualize DensePose RCNN Results: See notebooks/DensePose RCNN Visualize Results.ipynb (notebooks/DensePose RCNN Visualize Results.ipynb) to visualize the inferred DensePose RCNN Results. 
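For a rough idea of the data behind the DensePose COCO visualization notebook mentioned above, the sketch below loads the annotations with pycocotools and reads the per point fields; the annotation file name is an assumption, and the field names reflect my understanding of the DensePose COCO format rather than code from this repository.

from pycocotools.coco import COCO

# The annotation file name below is an assumption; use the DensePose-COCO
# JSON file you actually downloaded.
coco = COCO("annotations/densepose_coco_2014_minival.json")

img_id = coco.getImgIds()[0]
for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
    if "dp_x" not in ann:        # not every person instance carries DensePose points
        continue
    # dp_x / dp_y: collected points relative to the person box,
    # dp_I: body part index, dp_U / dp_V: surface (UV) coordinates.
    print(len(ann["dp_x"]), ann["dp_I"][:3], ann["dp_U"][:3], ann["dp_V"][:3])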
DensePose RCNN Texture Transfer: See notebooks/DensePose RCNN Texture Transfer.ipynb (notebooks/DensePose RCNN Texture Transfer.ipynb) to localize the DensePose COCO annotations on the 3D template ( SMPL ) model: If you use Densepose, please use the following BibTeX entry. @InProceedings{Guler2018DensePose, title {DensePose: Dense Human Pose Estimation In The Wild}, author {R\{i}za Alp G\ uler, Natalia Neverova, Iasonas Kokkinos}, journal {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2018} }",Pose Estimation,Pose Estimation 2606,Computer Vision,Computer Vision,Computer Vision,"Human Pose Estimation with TensorFlow ! (images/teaser.png) Here you can find the implementation of the Human Body Pose Estimation algorithm, presented in the ArtTrack and DeeperCut papers: Eldar Insafutdinov, Leonid Pishchulin, Bjoern Andres, Mykhaylo Andriluka and Bernt Schiele DeeperCut: A Deeper, Stronger, and Faster Multi Person Pose Estimation Model. In _European Conference on Computer Vision (ECCV)_, 2016 Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres and Bernt Schiele ArtTrack: Articulated Multi person Tracking in the Wild. In _Conference on Computer Vision and Pattern Recognition (CVPR)_, 2017 For more information visit Python 3 is required to run this code. First of all, you should install TensorFlow as described in the official documentation . We recommended to use virtualenv . You will also need to install the following Python packages: $ pip3 install scipy scikit image matplotlib pyyaml easydict cython munkres When running training or prediction scripts, please make sure to set the environment variable TF_CUDNN_USE_AUTOTUNE to 0 (see this ticket for explanation). If your machine has multiple GPUs, you can select which GPU you want to run on by setting the environment variable, eg. CUDA_VISIBLE_DEVICES 0 . Demo code Single Person (if there is only one person in the image) Download pre trained model files $ cd models/mpii $ ./download_models.sh $ cd Run demo of single person pose estimation $ TF_CUDNN_USE_AUTOTUNE 0 python3 demo/singleperson.py Multiple People Compile dependencies $ ./compile.sh Download pre trained model files $ cd models/coco $ ./download_models.sh $ cd Run demo of multi person pose estimation $ TF_CUDNN_USE_AUTOTUNE 0 python3 demo/demo_multiperson.py Training models Please follow these instructions (models/README.md) Citation Please cite ArtTrack and DeeperCut in your publications if it helps your research: @inproceedings{insafutdinov2017cvpr, title {ArtTrack: Articulated Multi person Tracking in the Wild}, booktitle {CVPR'17}, url { author {Eldar Insafutdinov and Mykhaylo Andriluka and Leonid Pishchulin and Siyu Tang and Evgeny Levinkov and Bjoern Andres and Bernt Schiele} } @article{insafutdinov2016eccv, title {DeeperCut: A Deeper, Stronger, and Faster Multi Person Pose Estimation Model}, booktitle {ECCV'16}, url { author {Eldar Insafutdinov and Leonid Pishchulin and Bjoern Andres and Mykhaylo Andriluka and Bernt Schiele} }",Pose Estimation,Pose Estimation 2633,Computer Vision,Computer Vision,Computer Vision,"Stacked Hourglass Networks for Human Pose Estimation (Demo Code) This repository includes Torch code for evaluation and visualization of the network presented in: Alejandro Newell, Kaiyu Yang, and Jia Deng, Stacked Hourglass Networks for Human Pose Estimation , arXiv:1603.06937 , 2016. A pretrained model is available on the project site . 
Include the model in the main directory of this repository to run the demo code. Check out the training and experimentation code now available at: In addition, if you download the full MPII Human Pose dataset and replace this repository's images directory you can generate full predictions on the validation and test sets. To run this code, the following must be installed: Torch7 hdf5 (and the torch hdf5 package) cudnn qlua (for displaying results) For displaying the demo images: qlua main.lua demo For generating predictions: th main.lua predict valid or test For evaluation on a set of validation predictions: th main.lua eval Testing your own images To use the network off the shelf, it is critical that the target person is centered in the input image. There is some robustness to scale, but for best performance the person should be sized such that their full height is roughly three quarters of the input height. Play around with different scale settings to see the impact it has on the network output. We offer a convenient function for generating an input image: inputImg crop(img, center, scale, rot, res) res should be set to 256 for our network. rot is offered if you wish to rotate the image (in degrees). You can run the input image through the network, and get the (x,y) coordinates with: outputHm m:forward(inputImg:view(1,3,256,256):cuda()) predsHm,predsImg getPreds(outputHm, center, scale) The two outputs of getPreds are coordinates with respect to either the heatmap or the original image (using center and scale to apply the appropriate transformation back to the image space). The MPII images come with center and scale annotations already. An important detail with regards to the annotations: we have modified their format slightly for ease of use with our code. In addition, we adjusted the original center and scale annotations uniformly across all images so as to reduce the chances of our function cropping out feet from the bottom of the image. This mostly involved moving the center down a fraction.",Pose Estimation,Pose Estimation 2634,Computer Vision,Computer Vision,Computer Vision,"Stacked Hourglass Networks for Human Pose Estimation (Training Code) This is the training pipeline used for: Alejandro Newell, Kaiyu Yang, and Jia Deng, Stacked Hourglass Networks for Human Pose Estimation , arXiv:1603.06937 , 2016. A pretrained model is available on the project site . You can use the option loadModel path/to/model to try fine tuning. To run this code, make sure the following are installed: Torch7 hdf5 cudnn Getting Started Download the full MPII Human Pose dataset , and place the images directory in data/mpii . From there, it is as simple as running th main.lua expID test run (the experiment ID is arbitrary). To run on FLIC , again place the images in a directory data/flic/images then call th main.lua dataset flic expID test run . Most of the command line options are pretty self explanatory, and can be found in src/opts.lua . The expID option will be used to save important information in a directory like pose hg train/exp/mpii/test run . This directory will include snapshots of the trained model, training/validations logs with loss and accuracy information, and details of the options set for that particular experiment. Running experiments There are a couple features to make experiments a bit easier: Experiment can be continued with th main.lua expID example exp continue it will pick up where the experiment left off with all of the same options set. 
But let's say you want to change an option like the learning rate, then you can do the same call as above but add the option LR 1e 5 for example and it will preserve all old options except for the new learning rate. In addition, the branch option allows for the initialization of a new experiment directory leaving the original experiment intact. For example, if you have trained for a while and want to drop the learning rate but don't know what to change it to, you can do something like the following: th main.lua branch old exp expID new exp 01 LR 1e 5 and then compare to a separate experiment th main.lua branch old exp expID new exp 02 LR 5e 5 . In src/misc there's a simple script for monitoring a set of experiments to visualize and compare training curves. Getting final predictions To generate final test set predictions for MPII, you can call: th main.lua branch your exp expID final preds finalPredictions nEpochs 0 This assumes there is an experiment that has already been run. If you just want to provide a pre trained model, that's fine too and you can call: th main.lua expID final preds finalPredictions nEpochs 0 loadModel /path/to/model Training accuracy metric For convenience during training, the accuracy function evaluates PCK by comparing the output heatmap of the network to the ground truth heatmap. The normalization in this case will be slightly different than the normalization done when officially evaluating on FLIC or MPII. So there will be some discrepancy between the numbers, but the heatmap based accuracy still provides a good picture of how well the network is learning during training. Final notes In the paper, the training time reported was with an older version of cuDNN, and after switching to cuDNN 4, training time was cut in half. Now, with a Titan X NVIDIA GPU, training time from scratch is under 3 days for MPII, and about 1 day for FLIC. pypose/ Included in this repository is a folder with a bunch of old python code that I used. It hasn't been updated in a while, and might not be totally functional at the moment. There are a number of useful functions for doing evaluation and analysis on pose predictions and it is worth digging into. It will be updated and cleaned up soon. Questions? I am sure there is a lot not covered in the README at the moment so please get in touch if you run into any issues or have any questions! Acknowledgements Thanks to Soumith Chintala, this pipeline is largely built on his example ImageNet training code available at:",Pose Estimation,Pose Estimation 2659,Computer Vision,Computer Vision,Computer Vision,"simple_pose_hourglass This is an implementation of an already existing work on Pose Estimation by Walid Benbihi which in turn is based on the hourglass model by Alejandro Newell et al This work was a bit of an experiment and the thing I did differently is with the ground truth inputs. The original implementation used a gaussian heatmap to mark the joints. I used the pose data to build an input where the torso was green, legs were red and the hands were blue, all with a black background. The idea was to infer an image and get a prediction in the same style as the ground truth input. There are 2 jupyter notebooks: one for training and the other for inference. The MPII dataset was used for training. Code used: Tensorflow 1.9 for GPU along with the requisite CUDA and CuDNN libraries on a Win10 machine with Ryzen 1600, 16GB RAM and NVIDIA 1070. Training over 25000 images (at 256x256x3) took 1000s for one epoch (batch size of 16). 
The network was trained for 60 epochs. The learning rate was set at 2.5e 4. 4 hourglass stacks were used. Here are the outputs. The gifs were made from frames captured and inferred in real time from a webcam feed. ! 1 ! 2 ! 3 ! 4 More details over here:",Pose Estimation,Pose Estimation 2665,Computer Vision,Computer Vision,Computer Vision,"Simple Baselines for Human Pose Estimation and Tracking (sample) This repository contains testing code for the paper . Original repository Introduction This demo was created for quickly testing the original models for the MPII dataset (other datasets and models are not tested) on your own images. My code draws the joints found by the model on your image and saves it as another image. This code doesn't use any detection model, therefore it searches for the joints of the person at the center of your image Requirements Python 3.6 PyTorch 0.4.1 (should also work with 1.0, but not tested) (Optional) Install dependencies from the original repository Testing 1. Download the required models from the original repository (step 8 of Installation) 2. Prepare the image that you want to use for testing 3. Run the script: python demo.py cfg \ image file \ save transform image use webcam use crop mode gpus \ min confidence threshold \ Description of args: cfg (only for demo.py) : You should choose the config with the same name as the model that you want to use. This file includes different configs for these models. Your model and config must be placed in the same directory model file (only for openvino demo.py) : You should set it to your .xml model save transform image: You can set it to save a temporary image after resizing and drawing the bounding box (it works for the webcam too) use webcam : Use the webcam to get images for prediction use crop mode : Use crop mode to crop the person that you want (after adding this parameter, you get a new window with your photo, where you should highlight the required zone) min confidence threshold : Minimal confidence threshold for joints that will be drawn on the image. Default: 0.5 Important! The person to estimate must be at the center of the image, otherwise it can work incorrectly! Example: Bad positions: ! Image of BadPosition ! Image of BadPosition Good position: ! Image of GoodPosition ! Image of GoodPosition Note: If you are not sure whether your person is at the center of the image, you can use the option save transform image. After this, you get an image transformed.jpg , where you can see a blue box. Your person must be fully inside this box, or at least most of the body Examples: ! Image of Good ! Image of Good ! Image of Bad ! Image of Bad",Pose Estimation,Pose Estimation 2693,Computer Vision,Computer Vision,Computer Vision,"Training ImageNet and PASCAL VOC2012 via Learning Feature Pyramids The code is provided by Guangrun Wang ( Rongcong Chen also contributed). Sun Yat sen University (SYSU) Table of Contents 0. Introduction ( introduction) 0. ImageNet ( imagenet) 0. PASCAL VOC2012 ( voc) 0. Citation ( citation) Introduction This repository contains the training & testing code on ImageNet and PASCAL VOC2012 via learning feature pyramids (LFP). LFP was originally used for human pose estimation, as described in the paper Learning Feature Pyramids for Human Pose Estimation . We extend it to semantic image segmentation. Results + Segmentation Visualization: 1. (a) input images; (b) segmentation results. ! segmentation visualization 2. (a) images & ground truths; (b) trimap of learning feature pyramids; (c) trimap of the original ResNet. ! trimaps 3.
It achieves 81.0% mIoU on PASCAL VOC2011 segmentation leaderboard , a significance improvement over its baseline DeepLabV2 (79.6%). ImageNet + Training script: cd pyramid/ImageNet/ python imagenet resnet.py gpu 0,1,2,3,4,5,6,7 data_format NHWC d 101 mode resnet data ROOT OF IMAGENET DATASET + Testing script: cd pyramid/ImageNet/ python imagenet resnet.py gpu 0,1,2,3,4,5,6,7 load ROOT TO LOAD MODEL data_format NHWC d 101 mode resnet data ROOT OF IMAGENET DATASET eval + Trained Models: ResNet101: Baidu Pan , code: 269o Google Drive ResNet50: Baidu Pan , code: zvgd Google Drive PASCAL VOC2012 + Training script: Use the ImageNet classification model as pretrained model. Because ImageNet has 1,000 categories while voc only has 21 categories, we must first fix all the parameters except the last layer including 21 channels. We only train the last layer for adaption by adding: with freeze_variables(stop_gradient True, skip_collection True): in Line 206 of resnet_model_voc_aspp.py Then we finetune all the parameters. For evaluation on voc val set, the model is first trained on COCO, then on train_aug of voc. For evaluation on voc leaderboard (test set), the above model is further trained on voc val. it achieves 81.0% on voc leaderboard. a training script example is as follows. cd pyramid/VOC/ python resnet msc voc aspp.py gpu 0,1,2,3,4,5,6,7 load ROOT TO LOAD MODEL data_format NHWC d 101 mode resnet log_dir ROOT TO SAVE MODEL data ROOT OF TRAINING DATA + Testing script: cd pyramid/VOC/ python gr_test_pad_crf_msc_flip.py + Trained Models: Model trained for evaluation on voc val set: Baidu Pan , code: 7dl0 Google Drive Model trained for evaluation on voc leaderboard (test set) Baidu Pan , code: 7dl0 Google Drive Citation If you use these models in your research, please cite: @inproceedings{yang2017learning, title {Learning feature pyramids for human pose estimation}, author {Yang, Wei and Li, Shuang and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang}, booktitle {The IEEE International Conference on Computer Vision (ICCV)}, volume {2}, year {2017} } Dependencies + Python 2.7 or 3 + TensorFlow > 1.3.0 + Tensorpack The code depends on Yuxin Wu's Tensorpack. For convenience, we provide a stable version 'tensorpack installed' in this repository. install tensorpack locally: cd tensorpack installed python setup.py install user",Pose Estimation,Pose Estimation 2694,Computer Vision,Computer Vision,Computer Vision,"Training COCO 2017 Object Detection and Segmentation via Learning Feature Pyramids The code is provided by Guangrun Wang . Sun Yat sen University (SYSU) Table of Contents 0. Introduction ( introduction) 0. Usage ( usage) 0. Citation ( citation) Introduction This repository contains the training & testing code on COCO 2017 object detection and instance segmentation via learning feature pyramids (LFP). LFP is originally used for human pose machine, described in the paper Learning Feature Pyramids for Human Pose Estimation . We extend it to the object detection and instance segmentation. Results These models are trained on COCO 2017 training set and evaluated on COCO 2017 validation set. MaskRCNN results contain both bbox and segm mAP. 
+ COCO Object Detection Method MASKRCNN_BATCH resolution schedule AP bbox AP bbox 50 AP bbox 75 ResNet50 512 (800, 1333) 360k 37.7 57.9 40.9 Ours 512 (800, 1333) 360k 39.8 60.2 43.4 + COCO Instance Segmentation Method MASKRCNN_BATCH resolution schedule AP mask AP mask 50 AP mask 75 ResNet50 512 (800, 1333) 360k 32.8 54.3 34.7 Ours 512 (800, 1333) 360k 34.6 56.7 36.8 The schemes have the same configuration __and mAP__ as the R50 C4 2x entries in Detectron Model Zoo . Usage + The model is first pretrained on the ImageNet 1K, where the training scripts can be found Guangrun Wang's github . We also provide the trained ImageNet models as follows. Baidu Pan , code: zvgd Google Drive + Training script for COCO object detection and instance segmentation: python3 train.py load /home/grwang/seg/train_log_resnet50/imagenet resnet d50/model 510000 gpu 0,1,2,3,4,5,6,7 logdir mask pyramid train + Testing script for COCO object detection and instance segmentation: python3 train.py evaluate output.json load /home/grwang/seg/train_log_resnet50/imagenet resnet d50/model 510000 gpu 0,1,2,3,4,5,6,7 logdir mask pyramid test + Trained Models of COCO: Model trained for evaluation on COCO 2017 object detection and instance segmentation task: Baidu Pan , code: w7o9 Google Drive Citation If you use these models in your research, please cite: @inproceedings{yang2017learning, title {Learning feature pyramids for human pose estimation}, author {Yang, Wei and Li, Shuang and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang}, booktitle {The IEEE International Conference on Computer Vision (ICCV)}, volume {2}, year {2017} } Dependencies + Python 3; TensorFlow > 1.4.0 (> 1.6.0 recommended due to a TF bug); + pycocotools , OpenCV. + Pre trained ImageNet model: google drive ; baidu pan (code: zvgd). + COCO data. It needs to have the following directory structure: DIR/ annotations/ instances_train2014.json instances_val2014.json instances_minival2014.json instances_valminusminival2014.json train2014/ COCO_train2014_ .jpg val2014/ COCO_val2014_ .jpg minival and valminusminival can be download from here . + Tensorpack The code depends on Yuxin Wu's Tensorpack. For convenience, we provide a stable version 'tensorpack installed' in this repository. install tensorpack locally: cd tensorpack installed python setup.py install user",Pose Estimation,Pose Estimation 2744,Computer Vision,Computer Vision,Computer Vision,"Default Config CUDA (+Python) CPU (+Python) OpenCL (+Python) Debug Unity : : : : : : : : : : : : : : Linux Status Status Status Status Status Status MacOS Status Status Status Status Status Status OpenPose represents the first real time multi person system to jointly detect human body, hand, facial, and foot keypoints (in total 135 keypoints) on single images . It is authored by Gines Hidalgo , Zhe Cao , Tomas Simon , Shih En Wei , Hanbyul Joo , and Yaser Sheikh . Currently, it is being maintained by Gines Hidalgo and Yaadhav Raaj . In addition, OpenPose would not be possible without the CMU Panoptic Studio dataset . We would also like to thank all the people who helped OpenPose in any way. The main contributors are listed in doc/contributors.md (doc/contributors.md). Authors Gines Hidalgo (left) and Hanbyul Joo (right) in front of the CMU Panoptic Studio Features Functionality : 2D real time multi person keypoint detection : 15 or 18 or 25 keypoint body/foot keypoint estimation . Running time invariant to number of detected people . 2x21 keypoint hand keypoint estimation . 
Currently, running time depends on number of detected people . 70 keypoint face keypoint estimation . Currently, running time depends on number of detected people . 3D real time single person keypoint detection : 3 D triangulation from multiple single views. Synchronization of Flir cameras handled. Compatible with Flir/Point Grey cameras, but provided C++ demos to add your custom input. Calibration toolbox : Easy estimation of distortion, intrinsic, and extrinsic camera parameters. Single person tracking for further speed up or visual smoothing. Input : Image, video, webcam, Flir/Point Grey and IP camera. Included C++ demos to add your custom input. Output : Basic image + keypoint display/saving (PNG, JPG, AVI, ...), keypoint saving (JSON, XML, YML, ...), and/or keypoints as array class. OS : Ubuntu (14, 16), Windows (8, 10), Mac OSX, Nvidia TX2. Others : Available: command line demo, C++ wrapper, and C++ API. Python API (doc/modules/python_module.md). Unity Plugin . CUDA (Nvidia GPU), OpenCL (AMD GPU), and CPU only (no GPU) versions. Training code included in the original CVPR 2017 GitHub repository . Latest Features Jan 2019: Unity plugin released ! Jan 2019: Improved Python API (doc/modules/python_module.md) released! Including body, face, hands, and all the functionality of the C++ API! Dec 2018: Foot dataset and new paper released ! Sep 2018: Experimental single person tracker (doc/quick_start.md tracking) for further speed up or visual smoothing! Jun 2018: Combined body foot model released! 40% faster and 5% more accurate (doc/installation.md)! Jun 2018: OpenCL/AMD graphic card version (doc/installation.md) released! Jun 2018: Calibration toolbox (doc/modules/calibration_module.md) released! For further details, check all released features (doc/released_features.md) and release notes (doc/release_notes.md). Results Body and Foot Estimation Testing the Crazy Uptown Funk flashmob in Sydney video sequence with OpenPose 3 D Reconstruction Module (Body, Foot, Face, and Hands) Testing the 3D Reconstruction Module of OpenPose Body, Foot, Face, and Hands Estimation Authors Gines Hidalgo (left image) and Tomas Simon (right image) testing OpenPose Unity Plugin Tianyi Zhao and Gines Hidalgo testing their OpenPose Unity Plugin Runtime Analysis Inference time comparison between the 3 available pose estimation libraries: OpenPose, Alpha Pose (fast Pytorch version), and Mask R CNN: This analysis was performed using the same images for each algorithm and a batch size of 1. Each analysis was repeated 1000 times and then averaged. This was all performed on a system with a Nvidia 1080 Ti and CUDA 8. Megvii (Face++) and MSRA GitHub repositories were excluded because they only provide pose estimation results given a cropped person. However, they suffer the same problem than Alpha Pose and Mask R CNN, their runtimes grow linearly with the number of people. Contents 1. Features ( features) 2. Latest Features ( latest features) 3. Results ( results) 4. Installation, Reinstallation and Uninstallation ( installation reinstallation and uninstallation) 5. Quick Start ( quick start) 6. Output ( output) 7. Speeding Up OpenPose and Benchmark ( speeding up openpose and benchmark) 8. Foot Dataset ( foot dataset) 9. Send Us Failure Cases and Feedback! ( send us failure cases and feedback) 10. Citation ( citation) 11. License ( license) Installation, Reinstallation and Uninstallation Windows portable version : Simply download and use the latest version from the Releases section. 
Otherwise, check doc/installation.md (doc/installation.md) for instructions on how to build OpenPose from source. Quick Start Most users do not need the OpenPose C++/Python API, but can simply use the OpenPose Demo: OpenPose Demo : To easily process images/video/webcam and display/save the results. See doc/demo_overview.md (doc/demo_overview.md). E.g., run OpenPose in a video with: Ubuntu ./build/examples/openpose/openpose.bin video examples/media/video.avi :: Windows Portable Demo bin\OpenPoseDemo.exe video examples\media\video.avi Calibration toolbox : To easily calibrate your cameras for 3 D OpenPose or any other stereo vision task. See doc/modules/calibration_module.md (doc/modules/calibration_module.md). OpenPose C++ API : If you want to read a specific input, and/or add your custom post processing function, and/or implement your own display/saving, check the C++ API tutorial on examples/tutorial_api_cpp/ (examples/tutorial_api_cpp/) and doc/library_introduction.md (doc/library_introduction.md). You can create your custom code on examples/user_code/ (examples/user_code/) and quickly compile it with CMake when compiling the whole OpenPose project. Quickly add your custom code : See examples/user_code/README.md (examples/user_code/README.md) for further details. OpenPose Python API : Analogously to the C++ API, find the tutorial for the Python API on examples/tutorial_api_python/ (examples/tutorial_api_python/). Adding an extra module : Check doc/library_add_new_module.md (./doc/library_add_new_module.md). Standalone face or hand detector : Face keypoint detection without body keypoint detection: If you want to speed it up (but also reduce amount of detected faces), check the OpenCV face detector approach in doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Use your own face/hand detector : You can use the hand and/or face keypoint detectors with your own face or hand detectors, rather than using the body detector. E.g., useful for camera views at which the hands are visible but not the body (OpenPose detector would fail). See doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Output Output (format, keypoint index ordering, etc.) in doc/output.md (doc/output.md). Speeding Up OpenPose and Benchmark Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on doc/speed_up_openpose.md (doc/speed_up_openpose.md). Foot Dataset Check the foot dataset website and new OpenPose paper for more information. Send Us Failure Cases and Feedback! Our library is open source for research purposes, and we want to continuously improve it! So please, let us know if... 1. ... you find videos or images where OpenPose does not seems to work well. Feel free to send them to openposecmu@gmail.com (email only for failure cases!), we will use them to improve the quality of the algorithm! 2. ... you find any bug (in functionality or speed). 3. ... you added some functionality to some class or some new Worker subclass which we might potentially incorporate. 4. ... you know how to speed up or improve any part of the library. 5. ... you have a request about possible functionality. 6. ... etc. Just comment on GitHub or make a pull request and we will answer as soon as possible! Send us an email if you use the library to make a cool demo or YouTube video! 
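As a complement to the Python API tutorial referenced in the Quick Start above, here is a minimal usage sketch modeled on the samples in examples/tutorial_api_python/ ; it assumes the pyopenpose bindings have been built and are importable, the model_folder and image paths are placeholders, and exact call signatures can differ slightly between OpenPose versions.

import cv2
import pyopenpose as op  # requires OpenPose built with the Python API

# Configure and start the wrapper (model_folder is a placeholder path).
params = {"model_folder": "models/"}
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

# Run pose estimation on a single image (placeholder path).
datum = op.Datum()
datum.cvInputData = cv2.imread("input.jpg")
opWrapper.emplaceAndPop([datum])

print("Body keypoints:", datum.poseKeypoints)   # people x 25 x 3 array for BODY_25
cv2.imwrite("result.png", datum.cvOutputData)   # rendered keypoints on the image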
Citation Please cite these papers in your publications if it helps your research (the face keypoint detector was trained using the procedure described in Simon et al. 2017 for hands): @inproceedings{cao2018openpose, author {Zhe Cao and Gines Hidalgo and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {arXiv preprint arXiv:1812.08008}, title {Open{P}ose: realtime multi person 2{D} pose estimation using {P}art {A}ffinity {F}ields}, year {2018} } @inproceedings{cao2017realtime, author {Zhe Cao and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {CVPR}, title {Realtime Multi Person 2D Pose Estimation using Part Affinity Fields}, year {2017} } @inproceedings{simon2017hand, author {Tomas Simon and Hanbyul Joo and Iain Matthews and Yaser Sheikh}, booktitle {CVPR}, title {Hand Keypoint Detection in Single Images using Multiview Bootstrapping}, year {2017} } @inproceedings{wei2016cpm, author {Shih En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh}, booktitle {CVPR}, title {Convolutional pose machines}, year {2016} } Links to the papers: OpenPose: Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Hand Keypoint Detection in Single Images using Multiview Bootstrapping Convolutional Pose Machines License OpenPose is freely available for free non commercial use, and may be redistributed under these conditions. Please, see the license (LICENSE) for further details. Interested in a commercial license? Check this FlintBox link . For commercial queries, use the Directly Contact Organization section from the FlintBox link and also send a copy of that message to Yaser Sheikh . rapJetson2",Pose Estimation,Pose Estimation 2820,Computer Vision,Computer Vision,Computer Vision,"Default Config CUDA (+Python) CPU (+Python) OpenCL (+Python) Debug Unity : : : : : : : : : : : : : : Linux Status Status Status Status Status Status MacOS Status Status Status Status Status Status OpenPose represents the first real time multi person system to jointly detect human body, hand, facial, and foot keypoints (in total 135 keypoints) on single images . It is authored by Gines Hidalgo , Zhe Cao , Tomas Simon , Shih En Wei , Hanbyul Joo , and Yaser Sheikh . Currently, it is being maintained by Gines Hidalgo and Yaadhav Raaj . In addition, OpenPose would not be possible without the CMU Panoptic Studio dataset . We would also like to thank all the people who helped OpenPose in any way. The main contributors are listed in doc/contributors.md (doc/contributors.md). Authors Gines Hidalgo (left) and Hanbyul Joo (right) in front of the CMU Panoptic Studio Features Functionality : 2D real time multi person keypoint detection : 15 or 18 or 25 keypoint body/foot keypoint estimation . Running time invariant to number of detected people . 2x21 keypoint hand keypoint estimation . Currently, running time depends on number of detected people . 70 keypoint face keypoint estimation . Currently, running time depends on number of detected people . 3D real time single person keypoint detection : 3 D triangulation from multiple single views. Synchronization of Flir cameras handled. Compatible with Flir/Point Grey cameras, but provided C++ demos to add your custom input. Calibration toolbox : Easy estimation of distortion, intrinsic, and extrinsic camera parameters. Single person tracking for further speed up or visual smoothing. Input : Image, video, webcam, Flir/Point Grey and IP camera. Included C++ demos to add your custom input. 
Output : Basic image + keypoint display/saving (PNG, JPG, AVI, ...), keypoint saving (JSON, XML, YML, ...), and/or keypoints as array class. OS : Ubuntu (14, 16), Windows (8, 10), Mac OSX, Nvidia TX2. Others : Available: command line demo, C++ wrapper, and C++ API. Python API (doc/modules/python_module.md). Unity Plugin . CUDA (Nvidia GPU), OpenCL (AMD GPU), and CPU only (no GPU) versions. Training code included in the original CVPR 2017 GitHub repository . Latest Features Jan 2019: Unity plugin released ! Jan 2019: Improved Python API (doc/modules/python_module.md) released! Including body, face, hands, and all the functionality of the C++ API! Dec 2018: Foot dataset and new paper released ! Sep 2018: Experimental single person tracker (doc/quick_start.md tracking) for further speed up or visual smoothing! Jun 2018: Combined body foot model released! 40% faster and 5% more accurate (doc/installation.md)! Jun 2018: OpenCL/AMD graphic card version (doc/installation.md) released! Jun 2018: Calibration toolbox (doc/modules/calibration_module.md) released! For further details, check all released features (doc/released_features.md) and release notes (doc/release_notes.md). Results Body and Foot Estimation Testing the Crazy Uptown Funk flashmob in Sydney video sequence with OpenPose 3 D Reconstruction Module (Body, Foot, Face, and Hands) Testing the 3D Reconstruction Module of OpenPose Body, Foot, Face, and Hands Estimation Authors Gines Hidalgo (left image) and Tomas Simon (right image) testing OpenPose Unity Plugin Tianyi Zhao and Gines Hidalgo testing their OpenPose Unity Plugin Runtime Analysis Inference time comparison between the 3 available pose estimation libraries: OpenPose, Alpha Pose (fast Pytorch version), and Mask R CNN: This analysis was performed using the same images for each algorithm and a batch size of 1. Each analysis was repeated 1000 times and then averaged. This was all performed on a system with a Nvidia 1080 Ti and CUDA 8. Megvii (Face++) and MSRA GitHub repositories were excluded because they only provide pose estimation results given a cropped person. However, they suffer the same problem than Alpha Pose and Mask R CNN, their runtimes grow linearly with the number of people. Contents 1. Features ( features) 2. Latest Features ( latest features) 3. Results ( results) 4. Installation, Reinstallation and Uninstallation ( installation reinstallation and uninstallation) 5. Quick Start ( quick start) 6. Output ( output) 7. Speeding Up OpenPose and Benchmark ( speeding up openpose and benchmark) 8. Foot Dataset ( foot dataset) 9. Send Us Failure Cases and Feedback! ( send us failure cases and feedback) 10. Citation ( citation) 11. License ( license) Installation, Reinstallation and Uninstallation Windows portable version : Simply download and use the latest version from the Releases section. Otherwise, check doc/installation.md (doc/installation.md) for instructions on how to build OpenPose from source. Quick Start Most users do not need the OpenPose C++/Python API, but can simply use the OpenPose Demo: OpenPose Demo : To easily process images/video/webcam and display/save the results. See doc/demo_overview.md (doc/demo_overview.md). E.g., run OpenPose in a video with: Ubuntu ./build/examples/openpose/openpose.bin video examples/media/video.avi :: Windows Portable Demo bin\OpenPoseDemo.exe video examples\media\video.avi Calibration toolbox : To easily calibrate your cameras for 3 D OpenPose or any other stereo vision task. 
See doc/modules/calibration_module.md (doc/modules/calibration_module.md). OpenPose C++ API : If you want to read a specific input, and/or add your custom post processing function, and/or implement your own display/saving, check the C++ API tutorial on examples/tutorial_api_cpp/ (examples/tutorial_api_cpp/) and doc/library_introduction.md (doc/library_introduction.md). You can create your custom code on examples/user_code/ (examples/user_code/) and quickly compile it with CMake when compiling the whole OpenPose project. Quickly add your custom code : See examples/user_code/README.md (examples/user_code/README.md) for further details. OpenPose Python API : Analogously to the C++ API, find the tutorial for the Python API on examples/tutorial_api_python/ (examples/tutorial_api_python/). Adding an extra module : Check doc/library_add_new_module.md (./doc/library_add_new_module.md). Standalone face or hand detector : Face keypoint detection without body keypoint detection: If you want to speed it up (but also reduce amount of detected faces), check the OpenCV face detector approach in doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Use your own face/hand detector : You can use the hand and/or face keypoint detectors with your own face or hand detectors, rather than using the body detector. E.g., useful for camera views at which the hands are visible but not the body (OpenPose detector would fail). See doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Output Output (format, keypoint index ordering, etc.) in doc/output.md (doc/output.md). Speeding Up OpenPose and Benchmark Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on doc/speed_up_openpose.md (doc/speed_up_openpose.md). Foot Dataset Check the foot dataset website and new OpenPose paper for more information. Send Us Failure Cases and Feedback! Our library is open source for research purposes, and we want to continuously improve it! So please, let us know if... 1. ... you find videos or images where OpenPose does not seems to work well. Feel free to send them to openposecmu@gmail.com (email only for failure cases!), we will use them to improve the quality of the algorithm! 2. ... you find any bug (in functionality or speed). 3. ... you added some functionality to some class or some new Worker subclass which we might potentially incorporate. 4. ... you know how to speed up or improve any part of the library. 5. ... you have a request about possible functionality. 6. ... etc. Just comment on GitHub or make a pull request and we will answer as soon as possible! Send us an email if you use the library to make a cool demo or YouTube video! Citation Please cite these papers in your publications if it helps your research (the face keypoint detector was trained using the procedure described in Simon et al. 
2017 for hands): @inproceedings{cao2018openpose, author {Zhe Cao and Gines Hidalgo and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {arXiv preprint arXiv:1812.08008}, title {Open{P}ose: realtime multi person 2{D} pose estimation using {P}art {A}ffinity {F}ields}, year {2018} } @inproceedings{cao2017realtime, author {Zhe Cao and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {CVPR}, title {Realtime Multi Person 2D Pose Estimation using Part Affinity Fields}, year {2017} } @inproceedings{simon2017hand, author {Tomas Simon and Hanbyul Joo and Iain Matthews and Yaser Sheikh}, booktitle {CVPR}, title {Hand Keypoint Detection in Single Images using Multiview Bootstrapping}, year {2017} } @inproceedings{wei2016cpm, author {Shih En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh}, booktitle {CVPR}, title {Convolutional pose machines}, year {2016} } Links to the papers: OpenPose: Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Hand Keypoint Detection in Single Images using Multiview Bootstrapping Convolutional Pose Machines License OpenPose is freely available for free non commercial use, and may be redistributed under these conditions. Please, see the license (LICENSE) for further details. Interested in a commercial license? Check this FlintBox link . For commercial queries, use the Directly Contact Organization section from the FlintBox link and also send a copy of that message to Yaser Sheikh .",Pose Estimation,Pose Estimation 2823,Computer Vision,Computer Vision,Computer Vision,"Linux Build Status OpenPose represents the first real time multi person system to jointly detect human body, hand, facial, and foot keypoints (in total 135 keypoints) on single images . Features Functionality : 2D real time multi person keypoint detection : 15 or 18 or 25 keypoint body/foot keypoint estimation . Running time invariant to number of detected people . 2x21 keypoint hand keypoint estimation . Currently, running time depends on number of detected people . 70 keypoint face keypoint estimation . Currently, running time depends on number of detected people . 3D real time single person keypoint detection : 3 D triangulation from multiple single views. Synchronization of Flir cameras handled. Compatible with Flir/Point Grey cameras, but provided C++ demos to add your custom input. Calibration toolbox : Easy estimation of distortion, intrinsic, and extrinsic camera parameters. Single person tracking for further speed up or visual smoothing. Input : Image, video, webcam, Flir/Point Grey and IP camera. Included C++ demos to add your custom input. Output : Basic image + keypoint display/saving (PNG, JPG, AVI, ...), keypoint saving (JSON, XML, YML, ...), and/or keypoints as array class. OS : Ubuntu (14, 16), Windows (8, 10), Mac OSX, Nvidia TX2. Others : Available: command line demo, C++ wrapper, and C++ API. CUDA (Nvidia GPU), OpenCL (AMD GPU), and CPU versions. Latest Features Dec 2018: Foot dataset and new paper released ! Sep 2018: Experimental single person tracker (doc/quick_start.md tracking) for further speed up or visual smoothing! Jun 2018: Combined body foot model released! 40% faster and 5% more accurate (doc/installation.md)! Jun 2018: Python API (doc/modules/python_module.md) released! Jun 2018: OpenCL/AMD graphic card version (doc/installation.md) released! Jun 2018: Calibration toolbox (doc/modules/calibration_module.md) released! 
For further details, check all released features (doc/released_features.md) and release notes (doc/release_notes.md). Results Body Foot Estimation Body, Face, and Hands Estimation 3 D Reconstruction Module Body and Hands Estimation Body Estimation Runtime Analysis Inference time comparison between the 3 available pose estimation libraries: OpenPose, Alpha Pose (fast Pytorch version), and Mask R CNN: This analysis was performed using the same images for each algorithm and a batch size of 1. Each analysis was repeated 1000 times and then averaged. This was all performed on a system with a Nvidia 1080 Ti and CUDA 8. Megvii (Face++) and MSRA GitHub repositories were excluded because they only provide pose estimation results given a cropped person. However, they suffer the same problem than Alpha Pose and Mask R CNN, their runtimes grow linearly with the number of people. Contents 1. Features ( features) 2. Latest Features ( latest features) 3. Results ( results) 4. Installation, Reinstallation and Uninstallation ( installation reinstallation and uninstallation) 5. Quick Start ( quick start) 6. Output ( output) 7. Speeding Up OpenPose and Benchmark ( speeding up openpose and benchmark) 8. Foot Dataset ( foot dataset) 9. Send Us Failure Cases and Feedback! ( send us failure cases and feedback) 10. Authors and Contributors ( authors and contributors) 11. Citation ( citation) 12. License ( license) Installation, Reinstallation and Uninstallation Windows portable version : Simply download and use the latest version from the Releases section. Otherwise, check doc/installation.md (doc/installation.md) for instructions on how to build OpenPose from source. Quick Start Most users do not need the OpenPose C++/Python API, but can simply use the OpenPose Demo: OpenPose Demo : To easily process images/video/webcam and display/save the results. See doc/demo_overview.md (doc/demo_overview.md). E.g., run OpenPose in a video with: Ubuntu ./build/examples/openpose/openpose.bin video examples/media/video.avi :: Windows Portable Demo bin\OpenPoseDemo.exe video examples\media\video.avi Calibration toolbox : To easily calibrate your cameras for 3 D OpenPose or any other stereo vision task. See doc/modules/calibration_module.md (doc/modules/calibration_module.md). OpenPose C++ API : If you want to read a specific input, and/or add your custom post processing function, and/or implement your own display/saving, check the C++ API tutorial on examples/tutorial_api_cpp/ (examples/tutorial_api_cpp/) and doc/library_introduction.md (doc/library_introduction.md). You can create your custom code on examples/user_code/ (examples/user_code/) and quickly compile it with CMake when compiling the whole OpenPose project. Quickly add your custom code : See examples/user_code/README.md (examples/user_code/README.md) for further details. OpenPose Python API : Analogously to the C++ API, find the tutorial for the Python API on examples/tutorial_api_python/ (examples/tutorial_api_python/). Adding an extra module : Check doc/library_add_new_module.md (./doc/library_add_new_module.md). Standalone face or hand detector : Face keypoint detection without body keypoint detection: If you want to speed it up (but also reduce amount of detected faces), check the OpenCV face detector approach in doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). 
Use your own face/hand detector : You can use the hand and/or face keypoint detectors with your own face or hand detectors, rather than using the body detector. E.g., useful for camera views at which the hands are visible but not the body (OpenPose detector would fail). See doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Output Output (format, keypoint index ordering, etc.) in doc/output.md (doc/output.md). Speeding Up OpenPose and Benchmark Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on doc/speed_up_preserving_accuracy.md (doc/speed_up_preserving_accuracy.md). Foot Dataset Check the foot dataset website and new OpenPose paper for more information. Send Us Failure Cases and Feedback! Our library is open source for research purposes, and we want to continuously improve it! So please, let us know if... 1. ... you find videos or images where OpenPose does not seems to work well. Feel free to send them to openposecmu@gmail.com (email only for failure cases!), we will use them to improve the quality of the algorithm! 2. ... you find any bug (in functionality or speed). 3. ... you added some functionality to some class or some new Worker subclass which we might potentially incorporate. 4. ... you know how to speed up or improve any part of the library. 5. ... you have a request about possible functionality. 6. ... etc. Just comment on GitHub or make a pull request and we will answer as soon as possible! Send us an email if you use the library to make a cool demo or YouTube video! Authors and Contributors OpenPose is authored by Gines Hidalgo , Zhe Cao , Tomas Simon , Shih En Wei , Hanbyul Joo , and Yaser Sheikh . Currently, it is being maintained by Gines Hidalgo and Yaadhav Raaj . The original CVPR 2017 repo includes Matlab and Python versions, as well as the training code. The body pose estimation work is based on the original ECCV 2016 demo . In addition, OpenPose would not be possible without the CMU Panoptic Studio dataset . We would also like to thank all the people who helped OpenPose in any way. The main contributors are listed in doc/contributors.md (doc/contributors.md). Citation Please cite these papers in your publications if it helps your research (the face keypoint detector was trained using the procedure described in Simon et al. 
2017 for hands): @inproceedings{cao2018openpose, author {Zhe Cao and Gines Hidalgo and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {arXiv preprint arXiv:1812.08008}, title {Open{P}ose: realtime multi person 2{D} pose estimation using {P}art {A}ffinity {F}ields}, year {2018} } @inproceedings{cao2017realtime, author {Zhe Cao and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {CVPR}, title {Realtime Multi Person 2D Pose Estimation using Part Affinity Fields}, year {2017} } @inproceedings{simon2017hand, author {Tomas Simon and Hanbyul Joo and Iain Matthews and Yaser Sheikh}, booktitle {CVPR}, title {Hand Keypoint Detection in Single Images using Multiview Bootstrapping}, year {2017} } @inproceedings{wei2016cpm, author {Shih En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh}, booktitle {CVPR}, title {Convolutional pose machines}, year {2016} } Links to the papers: OpenPose: Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Hand Keypoint Detection in Single Images using Multiview Bootstrapping Convolutional Pose Machines License OpenPose is freely available for free non commercial use, and may be redistributed under these conditions. Please, see the license (LICENSE) for further details. Interested in a commercial license? Check this FlintBox link . For commercial queries, use the Directly Contact Organization section from the FlintBox link and also send a copy of that message to Yaser Sheikh .",Pose Estimation,Pose Estimation 2833,Computer Vision,Computer Vision,Computer Vision,"Stacked_Hourglass_Network_Keras This is a Keras implementation for stacked hourglass network for single human pose estimation. The stacked hourglass network was proposed by Stacked Hourglass Networks for Human Pose Estimation . The official implementation built on top of torch is released under pose hg train , and pytorch version wrote by berapaw in repo pytorch pose . Most of code for image processing and evaluation come from above repos. Folder Structure data : data folder, mpii images : pictures for demo src : source code src/data_gen : data generator, augmentation and processnig code src/eval : evaluation code, eval callback src/net : net definition, hourglass network implementation src/tools : tool to draw accuracy curve and convert keras model to tf graph. top : top level entry to train/eval/demo network trained_models : folder to restore trained models. Demo Download pre trained model from shared drive and put them under trained_models BaiDu Pan: hg_s2_b1_mobile and hg_s2_b1 Google Drive: hg_s2_b1_mobile and hg_s2_b1 Run a quick demo to predict sample image python demo.py gpuID 0 model_json ../../trained_models/hg_s2_b1/net_arch.json model_weights ../../trained_models/hg_s2_b1/weights_epoch89.h5 conf_threshold 0.1 input_image ../../images/sample.jpg Train MPII Data Preparation Download MPII Dataset and put its images under data/mpii/images The json mpii_annotations.json contains all of images' annotations including train and validation. Train network Train from scratch, use python train.py help to check all the valid arguments. 
python train.py gpuID 0 epochs 100 batch_size 24 num_stack 2 model_path ../../trained_models/hg_s2_b1_m Arguments: gpuID gpu id, epochs number of epoch to train, batch_size batch size of samples to train, num_stack number of hourglass stack, model_path path to store trained model snapshot Note: When mobile set as True, SeparableConv2D() is used instead of standard convolution, which is much smaller and faster. Continue training from previous checkpoint python train.py gpuID 0 epochs 100 batch_size 24 num_stack 2 model_path ../../trained_models/hg_s2_b1_m resume True resume_model_json ../../trained_models/hg_s2_b1_m/net_arch.json resume_model ../../trained_models/hg_s2_b1_m/weights_epoch15.h5 init_epoch 16 Eval Run evaluation on MPII validation dataset by using PCKh 0.5. python eval.py gpuID 1 model_weights ../../trained_models/hg_s2_b1_mobile/weights_epoch70.h5 model_json ../../trained_models/hg_s2_b1_mobile/net_arch.json mat_file ../../trained_models/hg_s2_b1_mobile/preds.mat num_stack 2 The validation score curve for hg_s2_b1 and hg_s2_b1_mobile ! curve (./images/val_score.png) Issues Validation score drop significantly after 40 epochs. It is not stable as pytorch implementation. Did not root cause it yet.",Pose Estimation,Pose Estimation 2877,Computer Vision,Computer Vision,Computer Vision,"Quantized Densely Connected U Nets for Efficient Landmark Localization CU Net: Coupled U Nets Overview The follwoing figure gives an illustration of naive dense U Net, stacked U Nets and coupled U Nets (CU Net). The naive dense U Net and stacked U Nets have shortcut connections only inside each U Net. In contrast, the coupled U Nets also have connections for semantic blocks across U Nets. The CU Net is a hybrid of naive dense U Net and stacked U Net, integrating the merits of both dense connectivity, intermediate supervisions and multi stage top down and bottom up refinement. The resulted CU Net could save 70% parameters of the previous stacked U Nets but with comparable accuracy. If we couple each U Net pair in multiple U Nets, the coupling connections would have quadratic growth with respect to the U Net number. To make the model more parameter efficient, we propose the order K coupling to trim off the long distance coupling connections. For simplicity, each dot represents one U Net. The red and blue lines are the shortcut connections of inside semantic blocks and outside inputs. Order 0 connectivity (Top) strings U Nets together only by their inputs and outputs, i.e. stacked U Nets. Order 1 connectivity (Middle) has shortcut connections for adjacent U Nets. Similarly, order 2 connectivity (Bottom) has shortcut connections for 3 nearby U Nets. Prerequisites This package has the following requirements: Python 2.7 Pytorch v0.4.0 or Pytorch v0.1.12 Note that the script name with string prev version requires Pytorch v0.1.12 . Training python cu net.py gpu_id 0 exp_id cu net 2 layer_num 2 order 1 loss_num 2 is_train true bs 24 Validation python cu net.py gpu_id 0 exp_id cu net 2 layer_num 2 order 1 loss_num 2 resume_prefix your_pretrained_model.pth.tar is_train false bs 24 Model Options layer_num number of coupled U Nets order the order of coupling loss_num number of losses. Losses are uniformly distributed along the CU Net. Each U Net at most has one loss. (loss_num < layer_num) Project Page For more details, please refer to our project page . 
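To make the order K coupling idea described above concrete, here is a deliberately simplified, hypothetical PyTorch sketch. It is not the authors' implementation: the real CU Net couples matching semantic blocks inside each U Net, whereas this toy version only couples whole U Net outputs, and every class and variable name below is illustrative.

import torch
import torch.nn as nn

class TinyBlock(nn.Module):
    # Stand-in for one U-Net; each block in the real CU-Net is far richer.
    def __init__(self, in_ch, out_ch):
        super(TinyBlock, self).__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

class OrderKStack(nn.Module):
    # Toy order-K coupling: U-Net i additionally receives the outputs of its
    # (at most) K most recent predecessors, so coupling connections stay sparse.
    def __init__(self, num_unets=2, order=1, feat_ch=64):
        super(OrderKStack, self).__init__()
        self.order = order
        self.unets = nn.ModuleList(
            [TinyBlock((1 + min(i, order)) * feat_ch, feat_ch) for i in range(num_unets)])

    def forward(self, feats):
        outputs = []
        for i, unet in enumerate(self.unets):
            inputs = [feats] + outputs[max(0, i - self.order):i]
            outputs.append(unet(torch.cat(inputs, dim=1)))
        return outputs  # one tensor per U-Net, usable for intermediate supervision

# e.g. four "U-Nets" with order-2 coupling on 64-channel features
preds = OrderKStack(num_unets=4, order=2, feat_ch=64)(torch.randn(1, 64, 32, 32))

With order K, each U Net only concatenates up to K earlier outputs instead of all of them, which is the trimming of long distance coupling connections that keeps the parameter count from growing quadratically with the number of U Nets.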
Citation If you find this code useful in your research, please consider citing: @inproceedings{tang2018quantized, title {Quantized densely connected U Nets for efficient landmark localization}, author {Tang, Zhiqiang and Peng, Xi and Geng, Shijie and Wu, Lingfei and Zhang, Shaoting and Metaxas, Dimitris}, booktitle {ECCV}, year {2018} } @inproceedings{tang2018cu, title {CU Net: Coupled U Nets}, author {Tang, Zhiqiang and Peng, Xi and Geng, Shijie and Zhu, Yizhe and Metaxas, Dimitris}, booktitle {BMVC}, year {2018} }",Pose Estimation,Pose Estimation 2913,Computer Vision,Computer Vision,Computer Vision,"Python (CUDA GPU) Python (CPU) CUDA GPU CPU Debug mode : : : : : : : : : : : : Linux Status Status Status Status Status MacOS Status Status Status OpenPose represents the first real time multi person system to jointly detect human body, hand, facial, and foot keypoints (in total 135 keypoints) on single images . It is authored by Gines Hidalgo , Zhe Cao , Tomas Simon , Shih En Wei , Hanbyul Joo , and Yaser Sheikh . Currently, it is being maintained by Gines Hidalgo and Yaadhav Raaj . In addition, OpenPose would not be possible without the CMU Panoptic Studio dataset . We would also like to thank all the people who helped OpenPose in any way. The main contributors are listed in doc/contributors.md (doc/contributors.md). Authors Gines Hidalgo (left) and Hanbyul Joo (right) in front of the CMU Panoptic Studio Features Functionality : 2D real time multi person keypoint detection : 15 or 18 or 25 keypoint body/foot keypoint estimation . Running time invariant to number of detected people . 2x21 keypoint hand keypoint estimation . Currently, running time depends on number of detected people . 70 keypoint face keypoint estimation . Currently, running time depends on number of detected people . 3D real time single person keypoint detection : 3 D triangulation from multiple single views. Synchronization of Flir cameras handled. Compatible with Flir/Point Grey cameras, but provided C++ demos to add your custom input. Calibration toolbox : Easy estimation of distortion, intrinsic, and extrinsic camera parameters. Single person tracking for further speed up or visual smoothing. Input : Image, video, webcam, Flir/Point Grey and IP camera. Included C++ demos to add your custom input. Output : Basic image + keypoint display/saving (PNG, JPG, AVI, ...), keypoint saving (JSON, XML, YML, ...), and/or keypoints as array class. OS : Ubuntu (14, 16), Windows (8, 10), Mac OSX, Nvidia TX2. Others : Available: command line demo, C++ wrapper, and C++ API. Python API (doc/modules/python_module.md). Unity Plugin . CUDA (Nvidia GPU), OpenCL (AMD GPU), and CPU versions. Training code included in the original CVPR 2017 GitHub repository . Latest Features Jan 2018: Unity plugin released ! Jan 2018: Improved Python API (doc/modules/python_module.md) released! Including body, face, hands, and all the functionality of the C++ API! Dec 2018: Foot dataset and new paper released ! Sep 2018: Experimental single person tracker (doc/quick_start.md tracking) for further speed up or visual smoothing! Jun 2018: Combined body foot model released! 40% faster and 5% more accurate (doc/installation.md)! Jun 2018: OpenCL/AMD graphic card version (doc/installation.md) released! Jun 2018: Calibration toolbox (doc/modules/calibration_module.md) released! For further details, check all released features (doc/released_features.md) and release notes (doc/release_notes.md). 
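As a rough illustration of the Python API mentioned in the feature list above, the snippet below follows the pattern of the shipped examples/tutorial_api_python samples. The import path, the model_folder value, and the exact wrapper calls vary between OpenPose versions and build configurations, so treat them as assumptions and check the tutorials that come with your build.

import sys
import cv2

sys.path.append('/path/to/openpose/build/python')  # wherever your build placed the Python bindings
from openpose import pyopenpose as op

params = {'model_folder': '/path/to/openpose/models/'}  # add further OpenPose flags to this dict

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

datum = op.Datum()
datum.cvInputData = cv2.imread('/path/to/image.jpg')  # any BGR image loaded with OpenCV
opWrapper.emplaceAndPop([datum])

print(datum.poseKeypoints)  # typically a (num_people x 25 x 3) array with the default body model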
Results Body and Foot Estimation Testing the Crazy Uptown Funk flashmob in Sydney video sequence with OpenPose 3 D Reconstruction Module (Body, Foot, Face, and Hands) Testing the 3D Reconstruction Module of OpenPose Body, Foot, Face, and Hands Estimation Authors Gines Hidalgo (left image) and Tomas Simon (right image) testing OpenPose Unity Plugin Tianyi Zhao and Gines Hidalgo testing their OpenPose Unity Plugin Runtime Analysis Inference time comparison between the 3 available pose estimation libraries: OpenPose, Alpha Pose (fast Pytorch version), and Mask R CNN: This analysis was performed using the same images for each algorithm and a batch size of 1. Each analysis was repeated 1000 times and then averaged. This was all performed on a system with a Nvidia 1080 Ti and CUDA 8. Megvii (Face++) and MSRA GitHub repositories were excluded because they only provide pose estimation results given a cropped person. However, they suffer the same problem than Alpha Pose and Mask R CNN, their runtimes grow linearly with the number of people. Contents 1. Features ( features) 2. Latest Features ( latest features) 3. Results ( results) 4. Installation, Reinstallation and Uninstallation ( installation reinstallation and uninstallation) 5. Quick Start ( quick start) 6. Output ( output) 7. Speeding Up OpenPose and Benchmark ( speeding up openpose and benchmark) 8. Foot Dataset ( foot dataset) 9. Send Us Failure Cases and Feedback! ( send us failure cases and feedback) 10. Citation ( citation) 11. License ( license) Installation, Reinstallation and Uninstallation Windows portable version : Simply download and use the latest version from the Releases section. Otherwise, check doc/installation.md (doc/installation.md) for instructions on how to build OpenPose from source. Quick Start Most users do not need the OpenPose C++/Python API, but can simply use the OpenPose Demo: OpenPose Demo : To easily process images/video/webcam and display/save the results. See doc/demo_overview.md (doc/demo_overview.md). E.g., run OpenPose in a video with: Ubuntu ./build/examples/openpose/openpose.bin video examples/media/video.avi :: Windows Portable Demo bin\OpenPoseDemo.exe video examples\media\video.avi Calibration toolbox : To easily calibrate your cameras for 3 D OpenPose or any other stereo vision task. See doc/modules/calibration_module.md (doc/modules/calibration_module.md). OpenPose C++ API : If you want to read a specific input, and/or add your custom post processing function, and/or implement your own display/saving, check the C++ API tutorial on examples/tutorial_api_cpp/ (examples/tutorial_api_cpp/) and doc/library_introduction.md (doc/library_introduction.md). You can create your custom code on examples/user_code/ (examples/user_code/) and quickly compile it with CMake when compiling the whole OpenPose project. Quickly add your custom code : See examples/user_code/README.md (examples/user_code/README.md) for further details. OpenPose Python API : Analogously to the C++ API, find the tutorial for the Python API on examples/tutorial_api_python/ (examples/tutorial_api_python/). Adding an extra module : Check doc/library_add_new_module.md (./doc/library_add_new_module.md). Standalone face or hand detector : Face keypoint detection without body keypoint detection: If you want to speed it up (but also reduce amount of detected faces), check the OpenCV face detector approach in doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). 
Use your own face/hand detector : You can use the hand and/or face keypoint detectors with your own face or hand detectors, rather than using the body detector. E.g., useful for camera views at which the hands are visible but not the body (OpenPose detector would fail). See doc/standalone_face_or_hand_keypoint_detector.md (doc/standalone_face_or_hand_keypoint_detector.md). Output Output (format, keypoint index ordering, etc.) in doc/output.md (doc/output.md). Speeding Up OpenPose and Benchmark Check the OpenPose Benchmark as well as some hints to speed up and/or reduce the memory requirements for OpenPose on doc/speed_up_preserving_accuracy.md (doc/speed_up_preserving_accuracy.md). Foot Dataset Check the foot dataset website and new OpenPose paper for more information. Send Us Failure Cases and Feedback! Our library is open source for research purposes, and we want to continuously improve it! So please, let us know if... 1. ... you find videos or images where OpenPose does not seems to work well. Feel free to send them to openposecmu@gmail.com (email only for failure cases!), we will use them to improve the quality of the algorithm! 2. ... you find any bug (in functionality or speed). 3. ... you added some functionality to some class or some new Worker subclass which we might potentially incorporate. 4. ... you know how to speed up or improve any part of the library. 5. ... you have a request about possible functionality. 6. ... etc. Just comment on GitHub or make a pull request and we will answer as soon as possible! Send us an email if you use the library to make a cool demo or YouTube video! Citation Please cite these papers in your publications if it helps your research (the face keypoint detector was trained using the procedure described in Simon et al. 2017 for hands): @inproceedings{cao2018openpose, author {Zhe Cao and Gines Hidalgo and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {arXiv preprint arXiv:1812.08008}, title {Open{P}ose: realtime multi person 2{D} pose estimation using {P}art {A}ffinity {F}ields}, year {2018} } @inproceedings{cao2017realtime, author {Zhe Cao and Tomas Simon and Shih En Wei and Yaser Sheikh}, booktitle {CVPR}, title {Realtime Multi Person 2D Pose Estimation using Part Affinity Fields}, year {2017} } @inproceedings{simon2017hand, author {Tomas Simon and Hanbyul Joo and Iain Matthews and Yaser Sheikh}, booktitle {CVPR}, title {Hand Keypoint Detection in Single Images using Multiview Bootstrapping}, year {2017} } @inproceedings{wei2016cpm, author {Shih En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh}, booktitle {CVPR}, title {Convolutional pose machines}, year {2016} } Links to the papers: OpenPose: Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Realtime Multi Person 2D Pose Estimation using Part Affinity Fields Hand Keypoint Detection in Single Images using Multiview Bootstrapping Convolutional Pose Machines License OpenPose is freely available for free non commercial use, and may be redistributed under these conditions. Please, see the license (LICENSE) for further details. Interested in a commercial license? Check this FlintBox link . 
For commercial queries, use the Directly Contact Organization section from the FlintBox link and also send a copy of that message to Yaser Sheikh .",Pose Estimation,Pose Estimation 2106,Computer Vision,Computer Vision,Computer Vision,"Optical Flow Prediction with Tensorflow This repo provides a TensorFlow based implementation of the wonderful paper PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, by Deqing Sun et al. (CVPR 2018). There are already a few attempts at implementing PWC Net using TensorFlow out there. However, they either use outdated architectures of the paper's CNN networks, only provide TF inference (no TF training), only work on Linux platforms, and do not support multi GPU training. This implementation provides both TF based training and inference . It is portable : because it doesn't use any dynamically loaded CUDA based TensorFlow user ops, it works on Linux and Windows . It also supports multi GPU training (the notebooks and results shown here were collected on a GTX 1080 Ti paired with a Titan X). The code also allows for mixed precision training . Finally, as shown in the Links to pre trained models ( links) section, we achieve better results than the ones reported in the official paper on the challenging MPI Sintel 'final' dataset. Table of Contents Background ( background) Environment Setup ( environment setup) Links to pre trained models ( links) PWC Net ( pwc net) + Basic Idea ( pwc net basic idea) + Network ( pwc net network) + Jupyter Notebooks ( pwc net jupyter notebooks) + Training ( pwc net training) Multisteps learning rate schedule ( pwc net training multisteps) Cyclic learning rate schedule ( pwc net training cyclic) Mixed precision training ( pwc net training mixed precision) + Evaluation ( pwc net eval) + Inference ( pwc net predict) Running inference on the test split of a dataset ( pwc net predict dataset) Running inference on image pairs ( pwc net predict img pairs) Datasets ( datasets) References ( references) Acknowledgments ( acknowledgments) Background The purpose of optical flow estimation is to generate a dense 2D real valued (u,v vector) map of the motion occurring from one video frame to the next. This information can be very useful when trying to solve computer vision problems such as object tracking, action recognition, video object segmentation , etc. Figure 2017a ( 2017a) (a) below shows training pairs (black and white frames 0 and 1) from the Middlebury Optical Flow dataset as well as the their color coded optical flow ground truth. Figure (b) indicates the color coding used for easy visualization of the (u,v) flow fields. Usually, vector orientation is represented by color hue while vector length is encoded by color saturation: ! (img/optical flow.png) The most common measures used to evaluate the quality of optical flow estimation are angular error (AE) and endpoint error (EPE) . The angular error between two optical flow vectors (u 0 , v 0 ) and (u 1 , v 1 ) is defined as arccos((u 0 , v 0 ) . (u 1 , v 1 )) . The endpoint error measures the distance between the endpoints of two optical flow vectors (u 0 , v 0 ) and (u 1 , v 1 ) and is defined as sqrt((u 0 u 1 ) 2 + (v 0 v 1 ) 2 ) . Environment Setup The code in this repo was developed and tested using Anaconda3 v.5.2.0. 
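As a small aside to the Background section above, the AEPE numbers quoted throughout this README are simply the endpoint error averaged over all pixels. A minimal NumPy sketch of that metric (array and function names here are purely illustrative, not taken from this repo):

import numpy as np

def avg_endpoint_error(flow_pred, flow_gt):
    # Both inputs are (H, W, 2) arrays holding the (u, v) flow components.
    diff = flow_pred - flow_gt
    epe_map = np.sqrt(np.square(diff[..., 0]) + np.square(diff[..., 1]))
    return epe_map.mean()

# e.g. two flow fields at MPI-Sintel resolution
print(avg_endpoint_error(np.random.randn(436, 1024, 2), np.random.randn(436, 1024, 2)))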
To reproduce our conda environment, please refer to the following files: On Ubuntu: conda list (tfoptflow/setup/dlubu36.txt) and conda env export (tfoptflow/setup/dlubu36.yml) On Windows: conda list (tfoptflow/setup/dlwin36.txt) and conda env export (tfoptflow/setup/dlwin36.yml) Links to pre trained models Pre trained models can be found here . They come in two flavors: small ( sm , with 4,705,064 learned parameters) models don't use dense connections or residual connections, large ( lg , with 14,079,050 learned parameters) models do. They are all built with a 6 level pyramid, upsampling level 2 by 4 in each dimension to generate the final prediction, and construct an 81 channel cost volume at each level from a search range (maximum displacement) of 4. Please note that we trained these models using slightly different dataset and learning rate schedules. The official multistep schedule discussed in 2018a ( 2018a) is as follows: S long 1.2M iters training, batch size 8 + S fine 500k iters finetuning, batch size 4). Ours is S long only, 1.2M iters, batch size 8, on a mix of FlyingChairs and FlyingThings3DHalfRes . FlyingThings3DHalfRes is our own version of FlyingThings3D where every input image pair and groundtruth flow has been downsampled by two in each dimension. We also use a different set of augmentation techniques . Model performance Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet lg 6 2 multisteps chairsthingsmix train (tfoptflow/pwcnet_train_lg 6 2 multisteps chairsthingsmix.ipynb) 1.44 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_flyingchairs.ipynb)) 2.60 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb)) 3.70 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb)) pwcnet sm 6 2 multisteps chairsthingsmix train (tfoptflow/pwcnet_train_sm 6 2 multisteps chairsthingsmix.ipynb) 1.71 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_flyingchairs.ipynb)) 2.96 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb)) 3.83 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb)) As a reference, here are the official, reported results: ! (img/pwc net results.png) Model inference times We also measured the following MPI Sintel (436 x 1024) inference times on a few GPUs: Model name Titan X GTX 1080 GTX 1080 Ti : : : : : : : : pwcnet lg 6 2 cyclic chairsthingsmix 90ms 81ms 68ms pwcnet sm 6 2 cyclic chairsthingsmix 68.5ms 64.4ms 53.8ms A few clarifications about the numbers above... First, please note that this implementation is, by design, portable, i.e., it doesn't use any user defined CUDA kernels whereas the official NVidia implementation does. Ours will work on any OS and any hardware configuration (even one without a GPU) that can run TensorFlow. Second, the timing numbers we report are the inference times of the models trained on FlyingChairs and FlyingThings3DHalfRes . These are models that you can train longer if you want to, or finetune using an additional dataset, should you want to do so. In other words, these graphs haven't been frozen yet . In a typical production environment, you would freeze the model after final training/finetuning and optimize the graph to whatever platform(s) you need to distribute them on using TensorFlow XLA or TensorRT. 
In that important context, the inference numbers we report on unoptimized graphs are rather meaningless . PWC Net Basic Idea Per 2018a ( 2018a), PWC Net improves on FlowNet2 2016a ( 2016a) by adding domain knowledge into the design of the network. The basic idea behind optical flow estimation it that a pixel will retain most of its brightness over time despite a positional change from one frame to the next ( brightness constancy). We can grab a small patch around a pixel in video frame 1 and find another small patch in video frame 2 that will maximize some function (e.g., normalized cross correlation) of the two patches. Sliding that patch over the entire frame 1, looking for a peak, generates what's called a cost volume (the C in PWC). This techniques is fairly robust (invariant to color change) but is expensive to compute. In some cases, you may need a fairly large patch to reduce the number of false positives in frame1, raising the complexity even more. To alleviate the cost of generating the cost volume, the first optimization is to use pyramidal processing (the P in PWC). Using a lower resolution image lets you perform the search sliding a smaller patch from frame 1 over a smaller version of frame 2, yielding a smaller motion vector, then use that information as a hint to perform a more targeted search at the next level of resolution in the pyramid. That multiscale motion estimation can be performed in the image domain or in the feature domain (i.e., using the downscaled feature maps generated by a convnet). In practice, PWC warps (the W in PWC) frame 1 using an upsampled version of the motion flow estimated at a lower resolution because this will lead to searching for a smaller motion increment in the next higher resolution level of the pyramid (hence, allowing for a smaller search range). Here's a screenshot of a talk given by Deqing Sun that illustrates this process using a 2 level pyramid: ! (img/pwc net 2 level.png) Note that none of the three optimizations used here (P/W/C) are unique to PWC Net. These are techniques that were also used in SpyNet 2016b ( 2016b) and FlowNet2 2016a ( 2016a). However, here, they are used on the CNN features , rather than on an image pyramid: ! (img/pwc net vs others.png) The authors also acknowledge the fact that careful data augmentation (e.g., adding horizontal flipping) was necessary to reach best performance. To improve robustness, the authors also recommend training on multiple datasets (Sintel+KITTI+HD1K, for example) with careful class imbalance rebalancing. Since this algorithm only works on two continuous frames at a time, it has the same limitations as methods that only use image pairs (instead of n frames with n>2). Namely, if an object moves out of frame, the predicted flow will likely have a large EPE. As the authors remark, techniques that use a larger number of frames can accommodate for this limitation by propagating motion information over time. The model also sometimes fails for small, fast moving objects. Network Here's a picture of the network architecture described in 2018a ( 2018a): ! (img/pwc net.png) Jupyter Notebooks The recommended way to test this implementation is to use the following Jupyter notebooks: Optical flow datasets (prep and inspect) (tfoptflow/dataset_prep.ipynb): In this notebook, we: + Load the optical flow datasets and (automatically) create the additional data necessary to train our models (one time operation on first time load). + Show sample images/flows from each dataset. 
Note that you must have downloaded and unpacked the master data files already. See Datasets ( datasets) for download links to each dataset. PWC Net large model training (with multisteps learning rate schedule) (tfoptflow/pwcnet_train_lg 6 2 multisteps chairsthingsmix.ipynb): In this notebook, we: + Use a PWC Net large model (with dense and residual connections), 6 level pyramid, upsample level 2 by 4 as the final flow prediction + Train the model on a mix of the FlyingChairs and FlyingThings3DHalfRes dataset using the S long schedule described in 2016a ( 2016a) + In PWC Net small model training (with multisteps learning rate schedule) (tfoptflow/pwcnet_train_sm 6 2 multisteps chairsthingsmix.ipynb), we train the small version of the model (no dense or residual connections) + In PWC Net large model training (with cyclical learning rate schedule) (tfoptflow/pwcnet_train_lg 6 2 cyclic chairsthingsmix.ipynb), we train the large version of the model using the Cyclic short schedule + In PWC Net small model training (with cyclical learning rate schedule) (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix.ipynb), we train the small version of the model (no dense or residual connections) using the Cyclic short schedule PWC Net large model evaluation (on FlyingChairs validation split) (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_flyingchairs.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the validation split of the FlyingChairs dataset, yielding an average EPE of 1.44 + Perform basic error analysis PWC Net large model evaluation (on MPI Sintel 'clean') (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the 'clean' version of the MPI Sintel dataset, yielding an average EPE of 2.60 + Perform basic error analysis PWC Net large model evaluation (on MPI Sintel 'final') (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the 'final' version of the MPI Sintel dataset, yielding an average EPE of 3.70 + Perform basic error analysis Training Multisteps learning rate schedule Differently from the original paper, we do not train on FlyingChairs and FlyingThings3D sequentially (i.e, pre train on FlyingChairs then finetune on FlyingThings3D ). This is because the average flow magnitude on the MPI Sintel dataset is only 13.5, while the average flow magnitudes on FlyingChairs and FlyingThings3D are 11.1 and 38, respectively. In our experiments, finetuning on FlyingThings3D would only yield worse results on MPI Sintel . We got more stable results by using a half resolution version of the FlyingThings3D dataset with an average flow magnitude of 19, much closer to FlyingChairs and MPI Sintel in that respect. We then trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets. This mix, of course, could be extended with additional datasets. Here are the training curves for the S long training notebooks listed above: ! (img/loss_multisteps.png) ! (img/epe_multisteps.png) ! 
(img/lr_multisteps.png) Note that, if you click on the IMAGE tab in Tensorboard while running the training notebooks above, you will be able to visualize the progress of the training on a few validation samples (including the predicted flows at each pyramid level), as demonstrated here: ! (img/val2.png) ! (img/val4.png) Cyclic learning rate schedule If you don't want to use the long training schedule, but still would like to play with this code, try our very short cyclic learning rate schedule (100k iters, batch size 8). The results are nowhere near as good, but they allow for quick experimentation : Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet lg 6 2 cyclic chairsthingsmix train (tfoptflow/pwcnet_train_lg 6 2 cyclic chairsthingsmix.ipynb) 2.67 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_flyingchairs.ipynb)) 3.99 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_mpisintelclean.ipynb)) 5.08 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_mpisintelfinal.ipynb)) pwcnet sm 6 2 cyclic chairsthingsmix train (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix.ipynb) 2.79 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_flyingchairs.ipynb)) 4.34 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_mpisintelclean.ipynb)) 5.3 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_mpisintelfinal.ipynb)) Below are the training curves for the Cyclic short training notebooks: ! (img/loss_cyclic.png) ! (img/epe_cyclic.png) ! (img/lr_cyclic.png) Mixed precision training You can speed up training even further by using mixed precision training. But, again, don't expect the same level of accuracy: Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet sm 6 2 cyclic chairsthingsmix fp16 train (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix fp16.ipynb) 2.47 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) 3.77 ( notebook (pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) 4.90 ( notebook (pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) Evaluation As shown in the evaluation notebooks, and as expected, it becomes harder for the PWC Net models to deliver accurate flow predictions if the average flow magnitude from one frame to the next is high: ! (img/error_analysis_epe_vs_avgflowmag.png) It is especially hard for this and any other 2 frame based motion estimator! model to generate accurate predictions when picture elements simply disappear out of frame or suddenly fly in: ! (img/error_analysis_10_worst.png) Still, when the average motion is moderate, both the small and large models generate remarkable results: ! 
(img/error_analysis_10_best.png) Inference There are two ways you can call the code provided here to generate flow predictions for your own dataset: Pass a list of image pairs to a ModelPWCNet object using its predict_from_img_pairs() method Pass an OpticalFlowDataset object to a ModelPWCNet object and call its predict() method Running inference on image pairs If you want to use a pre trained PWC Net model on your own set of images, you can pass a list of image pairs to a ModelPWCNet object using its predict_from_img_pairs() method, as demonstrated here: python from __future__ import absolute_import, division, print_function from copy import deepcopy from skimage.io import imread from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TEST_OPTIONS from visualize import display_img_pairs_w_flows Build a list of image pairs to process img_pairs for pair in range(1, 4): image_path1 f'./samples/mpisintel_test_clean_ambush_1_frame_00{pair:02d}.png' image_path2 f'./samples/mpisintel_test_clean_ambush_1_frame_00{pair+1:02d}.png' image1, image2 imread(image_path1), imread(image_path2) img_pairs.append((image1, image2)) TODO: Set device to use for inference Here, we're using a GPU (use '/device:CPU:0' to run inference on the CPU) gpu_devices '/device:GPU:0' controller '/device:GPU:0' TODO: Set the path to the trained model (make sure you've downloaded it first from ckpt_path './models/pwcnet lg 6 2 multisteps chairsthingsmix/pwcnet.ckpt 595000' Configure the model for inference, starting with the default options nn_opts deepcopy(_DEFAULT_PWCNET_TEST_OPTIONS) nn_opts 'verbose' True nn_opts 'ckpt_path' ckpt_path nn_opts 'batch_size' 1 nn_opts 'gpu_devices' gpu_devices nn_opts 'controller' controller We're running the PWC Net large model in quarter resolution mode That is, with a 6 level pyramid, and upsampling of level 2 by 4 in each dimension as the final flow prediction nn_opts 'use_dense_cx' True nn_opts 'use_res_cx' True nn_opts 'pyr_lvls' 6 nn_opts 'flow_pred_lvl' 2 The size of the images in this dataset are not multiples of 64, while the model generates flows padded to multiples of 64. Hence, we need to crop the predicted flows to their original size nn_opts 'adapt_info' (1, 436, 1024, 2) Instantiate the model in inference mode and display the model configuration nn ModelPWCNet(mode 'test', options nn_opts) nn.print_config() Generate the predictions and display them pred_labels nn.predict_from_img_pairs(img_pairs, batch_size 1, verbose False) display_img_pairs_w_flows(img_pairs, pred_labels) The code above can be found in the pwcnet_predict_from_img_pairs.ipynb (tfoptflow/pwcnet_predict_from_img_pairs.ipynb) notebook and the pwcnet_predict_from_img_pairs.py (tfoptflow/pwcnet_predict_from_img_pairs.py) script. Running inference on the test split of a dataset If you want to train a PWC Net model from scratch, or finetune a pre trained PWC Net model using your own dataset, you will need to implement a dataset handler that derives from the OpticalFlowDataset base class in dataset_base.py (tfoptflow/dataset_base.py). We provide several dataset handlers for well known datasets, such as MPI Sintel ( dataset_mpisintel.py (tfoptflow/dataset_mpisintel.py)), FlyingChairs ( dataset_flyingchairs.py (tfoptflow/dataset_flyingchairs.py)), FlyingThings3D ( dataset_flyingthings3d.py (tfoptflow/dataset_flyingthings3d.py)), and KITTI ( dataset_kitti.py (tfoptflow/dataset_kitti.py)). Anyone of them is a good starting point to figure out how to implement your own. 
Please note that that this is not complicated work; the derived class does little beyond telling the base class which list of files are to be used for training, validation, and testing, leaving the heavy lifting to the base class. Once you have a data handler, you can pass it to a ModelPWCNet object and call its predict() method to generate flow predictions for its test split, as shown in the pwcnet_predict.ipynb (tfoptflow/pwcnet_predict.ipynb) notebook and the pwcnet_predict.py (tfoptflow/pwcnet_predict.py) script. Datasets Datasets most commonly used for optical flow estimation include: FlyingThings3D image pairs + flows + all_unused_files.txt FlyingChairs images pairs + flows + FlyingChairs_train_val split MPI Sintel zip KITTI Flow 2012 zip and/or KITTI Flow 2015 zip Additional optical flow datasets (not used here): Middlebury Optical Flow web Heidelberg HD1K Flow web Per 2018a ( 2018a), KITTI and Sintel are currently the most challenging and widely used benchmarks for optical flow. The KITTI benchmark is targeted at autonomous driving applications and its semi dense ground truth is collected using LIDAR. The 2012 set only consists of static scenes. The 2015 set is extended to dynamic scenes via human annotations and more challenging to existing methods because of the large motion, severe illumination changes, and occlusions. The Sintel benchmark is created using the open source graphics movie Sintel with two passes, clean and final. The final pass contains strong atmospheric effects, motion blur, and camera noise, which cause severe problems to existing methods. References 2018 2018a Sun et al. 2018. PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. arXiv web PyTorch (Official) PyTorch PyTorch Caffe (Official) TensorFlow TensorFlow Video Video 2017 2017a Baghaie et al. 2017. Dense Descriptors for Optical Flow Estimation: A Comparative Study. web 2016 2016a Ilg et al. 2016. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. arXiv PyTorch (Official) TensorFlow 2016b Ranjan et al. 2016. SpyNet: Optical Flow Estimation using a Spatial Pyramid Network. arXiv Torch (Official) PyTorch 2015 2015a Fischer et al. 2015. FlowNet: Learning Optical Flow with Convolutional Networks. arXiv Tensorflow (FlowNet S) Acknowledgments Other TensorFlow implementations we are indebted to: by daigo0927 by djl11 by PatWie @InProceedings{Sun2018PWC Net, author {Deqing Sun and Xiaodong Yang and Ming Yu Liu and Jan Kautz}, title {{PWC Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume}, booktitle CVPR, year {2018}, } @InProceedings\{DFIB15, author A. Dosovitskiy and P. Fischer and E. Ilg and P. H{\ a}usser and C. Hazirbas and V. Golkov and P. v.d. Smagt and D. Cremers and T. Brox , title FlowNet: Learning Optical Flow with Convolutional Networks , booktitle IEEE International Conference on Computer Vision (ICCV) , month Dec , year 2015 , url } Contact Info If you have any questions about this work, please feel free to contact us here:",Optical Flow Estimation,Vision Other 2116,Computer Vision,Computer Vision,Computer Vision,"I3D models trained on Kinetics Overview This repository contains trained models reported in the paper Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Joao Carreira and Andrew Zisserman. This code is based on Deepmind's Kinetics I3D . Including PyTorch versions of their models. Note This code was written for PyTorch 0.3. Version 0.4 and newer may cause issues. 
Fine tuning and Feature Extraction We provide code to extract I3D features and fine tune I3D for charades. Our fine tuned models on charades are also available in the models director (in addition to Deepmind's trained models). The deepmind pre trained models were converted to PyTorch and give identical results (flow_imagenet.pt and rgb_imagenet.pt). These models were pretrained on imagenet and kinetics (see Kinetics I3D for details). Fine tuning I3D train_i3d.py (train_i3d.py) contains the code to fine tune I3D based on the details in the paper and obtained from the authors. Specifically, this version follows the settings to fine tune on the Charades (allenai.org/plato/charades/) dataset based on the author's implementation that won the Charades 2017 challenge. Our fine tuned RGB and Flow I3D models are available in the model directory (rgb_charades.pt and flow_charades.pt). This relied on having the optical flow and RGB frames extracted and saved as images on dist. charades_dataset.py (charades_dataset.py) contains our code to load video segments for training. Feature Extraction extract_features.py (extract_features.py) contains the code to load a pre trained I3D model and extract the features and save the features as numpy arrays. The charades_dataset_full.py (charades_dataset_full.py) script loads an entire video to extract per segment features.",Action Recognition,Vision Other 2233,Computer Vision,Computer Vision,Computer Vision,"Optical Flow Prediction with Tensorflow This repo provides a TensorFlow based implementation of the wonderful paper PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, by Deqing Sun et al. (CVPR 2018). There are already a few attempts at implementing PWC Net using TensorFlow out there. However, they either use outdated architectures of the paper's CNN networks, only provide TF inference (no TF training), only work on Linux platforms, and do not support multi GPU training. This implementation provides both TF based training and inference . It is portable : because it doesn't use any dynamically loaded CUDA based TensorFlow user ops, it works on Linux and Windows . It also supports multi GPU training (the notebooks and results shown here were collected on a GTX 1080 Ti paired with a Titan X). The code also allows for mixed precision training . Finally, as shown in the Links to pre trained models ( links) section, we achieve better results than the ones reported in the official paper on the challenging MPI Sintel 'final' dataset. Table of Contents Background ( background) Environment Setup ( environment setup) Links to pre trained models ( links) PWC Net ( pwc net) + Basic Idea ( pwc net basic idea) + Network ( pwc net network) + Jupyter Notebooks ( pwc net jupyter notebooks) + Training ( pwc net training) Multisteps learning rate schedule ( pwc net training multisteps) Cyclic learning rate schedule ( pwc net training cyclic) Mixed precision training ( pwc net training mixed precision) + Evaluation ( pwc net eval) + Inference ( pwc net predict) Running inference on the test split of a dataset ( pwc net predict dataset) Running inference on image pairs ( pwc net predict img pairs) Datasets ( datasets) References ( references) Acknowledgments ( acknowledgments) Background The purpose of optical flow estimation is to generate a dense 2D real valued (u,v vector) map of the motion occurring from one video frame to the next. 
This information can be very useful when trying to solve computer vision problems such as object tracking, action recognition, video object segmentation , etc. Figure 2017a ( 2017a) (a) below shows training pairs (black and white frames 0 and 1) from the Middlebury Optical Flow dataset as well as the their color coded optical flow ground truth. Figure (b) indicates the color coding used for easy visualization of the (u,v) flow fields. Usually, vector orientation is represented by color hue while vector length is encoded by color saturation: ! (img/optical flow.png) The most common measures used to evaluate the quality of optical flow estimation are angular error (AE) and endpoint error (EPE) . The angular error between two optical flow vectors (u 0 , v 0 ) and (u 1 , v 1 ) is defined as arccos((u 0 , v 0 ) . (u 1 , v 1 )) . The endpoint error measures the distance between the endpoints of two optical flow vectors (u 0 , v 0 ) and (u 1 , v 1 ) and is defined as sqrt((u 0 u 1 ) 2 + (v 0 v 1 ) 2 ) . Environment Setup The code in this repo was developed and tested using Anaconda3 v.5.2.0. To reproduce our conda environment, please refer to the following files: On Ubuntu: conda list (tfoptflow/setup/dlubu36.txt) and conda env export (tfoptflow/setup/dlubu36.yml) On Windows: conda list (tfoptflow/setup/dlwin36.txt) and conda env export (tfoptflow/setup/dlwin36.yml) Links to pre trained models Pre trained models can be found here . They come in two flavors: small ( sm , with 4,705,064 learned parameters) models don't use dense connections or residual connections, large ( lg , with 14,079,050 learned parameters) models do. They are all built with a 6 level pyramid, upsampling level 2 by 4 in each dimension to generate the final prediction, and construct an 81 channel cost volume at each level from a search range (maximum displacement) of 4. Please note that we trained these models using slightly different dataset and learning rate schedules. The official multistep schedule discussed in 2018a ( 2018a) is as follows: S long 1.2M iters training, batch size 8 + S fine 500k iters finetuning, batch size 4). Ours is S long only, 1.2M iters, batch size 8, on a mix of FlyingChairs and FlyingThings3DHalfRes . FlyingThings3DHalfRes is our own version of FlyingThings3D where every input image pair and groundtruth flow has been downsampled by two in each dimension. We also use a different set of augmentation techniques . Model performance Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet lg 6 2 multisteps chairsthingsmix train (tfoptflow/pwcnet_train_lg 6 2 multisteps chairsthingsmix.ipynb) 1.44 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_flyingchairs.ipynb)) 2.60 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb)) 3.70 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb)) pwcnet sm 6 2 multisteps chairsthingsmix train (tfoptflow/pwcnet_train_sm 6 2 multisteps chairsthingsmix.ipynb) 1.71 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_flyingchairs.ipynb)) 2.96 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb)) 3.83 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb)) As a reference, here are the official, reported results: ! 
(img/pwc net results.png) Model inference times We also measured the following MPI Sintel (436 x 1024) inference times on a few GPUs: Model name Titan X GTX 1080 GTX 1080 Ti : : : : : : : : pwcnet lg 6 2 cyclic chairsthingsmix 90ms 81ms 68ms pwcnet sm 6 2 cyclic chairsthingsmix 68.5ms 64.4ms 53.8ms A few clarifications about the numbers above... First, please note that this implementation is, by design, portable, i.e., it doesn't use any user defined CUDA kernels whereas the official NVidia implementation does. Ours will work on any OS and any hardware configuration (even one without a GPU) that can run TensorFlow. Second, the timing numbers we report are the inference times of the models trained on FlyingChairs and FlyingThings3DHalfRes . These are models that you can train longer if you want to, or finetune using an additional dataset, should you want to do so. In other words, these graphs haven't been frozen yet . In a typical production environment, you would freeze the model after final training/finetuning and optimize the graph to whatever platform(s) you need to distribute them on using TensorFlow XLA or TensorRT. In that important context, the inference numbers we report on unoptimized graphs are rather meaningless . PWC Net Basic Idea Per 2018a ( 2018a), PWC Net improves on FlowNet2 2016a ( 2016a) by adding domain knowledge into the design of the network. The basic idea behind optical flow estimation it that a pixel will retain most of its brightness over time despite a positional change from one frame to the next ( brightness constancy). We can grab a small patch around a pixel in video frame 1 and find another small patch in video frame 2 that will maximize some function (e.g., normalized cross correlation) of the two patches. Sliding that patch over the entire frame 1, looking for a peak, generates what's called a cost volume (the C in PWC). This techniques is fairly robust (invariant to color change) but is expensive to compute. In some cases, you may need a fairly large patch to reduce the number of false positives in frame1, raising the complexity even more. To alleviate the cost of generating the cost volume, the first optimization is to use pyramidal processing (the P in PWC). Using a lower resolution image lets you perform the search sliding a smaller patch from frame 1 over a smaller version of frame 2, yielding a smaller motion vector, then use that information as a hint to perform a more targeted search at the next level of resolution in the pyramid. That multiscale motion estimation can be performed in the image domain or in the feature domain (i.e., using the downscaled feature maps generated by a convnet). In practice, PWC warps (the W in PWC) frame 1 using an upsampled version of the motion flow estimated at a lower resolution because this will lead to searching for a smaller motion increment in the next higher resolution level of the pyramid (hence, allowing for a smaller search range). Here's a screenshot of a talk given by Deqing Sun that illustrates this process using a 2 level pyramid: ! (img/pwc net 2 level.png) Note that none of the three optimizations used here (P/W/C) are unique to PWC Net. These are techniques that were also used in SpyNet 2016b ( 2016b) and FlowNet2 2016a ( 2016a). However, here, they are used on the CNN features , rather than on an image pyramid: ! (img/pwc net vs others.png) The authors also acknowledge the fact that careful data augmentation (e.g., adding horizontal flipping) was necessary to reach best performance. 
To improve robustness, the authors also recommend training on multiple datasets (Sintel+KITTI+HD1K, for example) with careful class imbalance rebalancing. Since this algorithm only works on two continuous frames at a time, it has the same limitations as methods that only use image pairs (instead of n frames with n>2). Namely, if an object moves out of frame, the predicted flow will likely have a large EPE. As the authors remark, techniques that use a larger number of frames can accommodate for this limitation by propagating motion information over time. The model also sometimes fails for small, fast moving objects. Network Here's a picture of the network architecture described in 2018a ( 2018a): ! (img/pwc net.png) Jupyter Notebooks The recommended way to test this implementation is to use the following Jupyter notebooks: Optical flow datasets (prep and inspect) (tfoptflow/dataset_prep.ipynb): In this notebook, we: + Load the optical flow datasets and (automatically) create the additional data necessary to train our models (one time operation on first time load). + Show sample images/flows from each dataset. Note that you must have downloaded and unpacked the master data files already. See Datasets ( datasets) for download links to each dataset. PWC Net large model training (with multisteps learning rate schedule) (tfoptflow/pwcnet_train_lg 6 2 multisteps chairsthingsmix.ipynb): In this notebook, we: + Use a PWC Net large model (with dense and residual connections), 6 level pyramid, upsample level 2 by 4 as the final flow prediction + Train the model on a mix of the FlyingChairs and FlyingThings3DHalfRes dataset using the S long schedule described in 2016a ( 2016a) + In PWC Net small model training (with multisteps learning rate schedule) (tfoptflow/pwcnet_train_sm 6 2 multisteps chairsthingsmix.ipynb), we train the small version of the model (no dense or residual connections) + In PWC Net large model training (with cyclical learning rate schedule) (tfoptflow/pwcnet_train_lg 6 2 cyclic chairsthingsmix.ipynb), we train the large version of the model using the Cyclic short schedule + In PWC Net small model training (with cyclical learning rate schedule) (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix.ipynb), we train the small version of the model (no dense or residual connections) using the Cyclic short schedule PWC Net large model evaluation (on FlyingChairs validation split) (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_flyingchairs.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the validation split of the FlyingChairs dataset, yielding an average EPE of 1.44 + Perform basic error analysis PWC Net large model evaluation (on MPI Sintel 'clean') (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelclean.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the 'clean' version of the MPI Sintel dataset, yielding an average EPE of 2.60 + Perform basic error analysis PWC Net large model evaluation (on MPI Sintel 'final') (tfoptflow/pwcnet_eval_lg 6 2 multisteps chairsthingsmix_mpisintelfinal.ipynb): In this notebook, we: + Evaluate the PWC Net large model trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets using the S long schedule + Run the evaluation on the 'final' version 
of the MPI Sintel dataset, yielding an average EPE of 3.70 + Perform basic error analysis Training Multisteps learning rate schedule Differently from the original paper, we do not train on FlyingChairs and FlyingThings3D sequentially (i.e, pre train on FlyingChairs then finetune on FlyingThings3D ). This is because the average flow magnitude on the MPI Sintel dataset is only 13.5, while the average flow magnitudes on FlyingChairs and FlyingThings3D are 11.1 and 38, respectively. In our experiments, finetuning on FlyingThings3D would only yield worse results on MPI Sintel . We got more stable results by using a half resolution version of the FlyingThings3D dataset with an average flow magnitude of 19, much closer to FlyingChairs and MPI Sintel in that respect. We then trained on a mix of the FlyingChairs and FlyingThings3DHalfRes datasets. This mix, of course, could be extended with additional datasets. Here are the training curves for the S long training notebooks listed above: ! (img/loss_multisteps.png) ! (img/epe_multisteps.png) ! (img/lr_multisteps.png) Note that, if you click on the IMAGE tab in Tensorboard while running the training notebooks above, you will be able to visualize the progress of the training on a few validation samples (including the predicted flows at each pyramid level), as demonstrated here: ! (img/val2.png) ! (img/val4.png) Cyclic learning rate schedule If you don't want to use the long training schedule, but still would like to play with this code, try our very short cyclic learning rate schedule (100k iters, batch size 8). The results are nowhere near as good, but they allow for quick experimentation : Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet lg 6 2 cyclic chairsthingsmix train (tfoptflow/pwcnet_train_lg 6 2 cyclic chairsthingsmix.ipynb) 2.67 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_flyingchairs.ipynb)) 3.99 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_mpisintelclean.ipynb)) 5.08 ( notebook (tfoptflow/pwcnet_eval_lg 6 2 cyclic chairsthingsmix_mpisintelfinal.ipynb)) pwcnet sm 6 2 cyclic chairsthingsmix train (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix.ipynb) 2.79 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_flyingchairs.ipynb)) 4.34 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_mpisintelclean.ipynb)) 5.3 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix_mpisintelfinal.ipynb)) Below are the training curves for the Cyclic short training notebooks: ! (img/loss_cyclic.png) ! (img/epe_cyclic.png) ! (img/lr_cyclic.png) Mixed precision training You can speed up training even further by using mixed precision training. But, again, don't expect the same level of accuracy: Model name Notebooks FlyingChairs (384x512) AEPE Sintel clean (436x1024) AEPE Sintel final (436x1024) AEPE : : : : : : : : : : pwcnet sm 6 2 cyclic chairsthingsmix fp16 train (tfoptflow/pwcnet_train_sm 6 2 cyclic chairsthingsmix fp16.ipynb) 2.47 ( notebook (tfoptflow/pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) 3.77 ( notebook (pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) 4.90 ( notebook (pwcnet_eval_sm 6 2 cyclic chairsthingsmix fp16.ipynb)) Evaluation As shown in the evaluation notebooks, and as expected, it becomes harder for the PWC Net models to deliver accurate flow predictions if the average flow magnitude from one frame to the next is high: ! 
(img/error_analysis_epe_vs_avgflowmag.png) It is especially hard for this and any other 2 frame based motion estimator! model to generate accurate predictions when picture elements simply disappear out of frame or suddenly fly in: ! (img/error_analysis_10_worst.png) Still, when the average motion is moderate, both the small and large models generate remarkable results: ! (img/error_analysis_10_best.png) Inference There are two ways you can call the code provided here to generate flow predictions for your own dataset: Pass a list of image pairs to a ModelPWCNet object using its predict_from_img_pairs() method Pass an OpticalFlowDataset object to a ModelPWCNet object and call its predict() method Running inference on image pairs If you want to use a pre trained PWC Net model on your own set of images, you can pass a list of image pairs to a ModelPWCNet object using its predict_from_img_pairs() method, as demonstrated here: python from __future__ import absolute_import, division, print_function from copy import deepcopy from skimage.io import imread from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_TEST_OPTIONS from visualize import display_img_pairs_w_flows Build a list of image pairs to process img_pairs for pair in range(1, 4): image_path1 f'./samples/mpisintel_test_clean_ambush_1_frame_00{pair:02d}.png' image_path2 f'./samples/mpisintel_test_clean_ambush_1_frame_00{pair+1:02d}.png' image1, image2 imread(image_path1), imread(image_path2) img_pairs.append((image1, image2)) TODO: Set device to use for inference Here, we're using a GPU (use '/device:CPU:0' to run inference on the CPU) gpu_devices '/device:GPU:0' controller '/device:GPU:0' TODO: Set the path to the trained model (make sure you've downloaded it first from ckpt_path './models/pwcnet lg 6 2 multisteps chairsthingsmix/pwcnet.ckpt 595000' Configure the model for inference, starting with the default options nn_opts deepcopy(_DEFAULT_PWCNET_TEST_OPTIONS) nn_opts 'verbose' True nn_opts 'ckpt_path' ckpt_path nn_opts 'batch_size' 1 nn_opts 'gpu_devices' gpu_devices nn_opts 'controller' controller We're running the PWC Net large model in quarter resolution mode That is, with a 6 level pyramid, and upsampling of level 2 by 4 in each dimension as the final flow prediction nn_opts 'use_dense_cx' True nn_opts 'use_res_cx' True nn_opts 'pyr_lvls' 6 nn_opts 'flow_pred_lvl' 2 The size of the images in this dataset are not multiples of 64, while the model generates flows padded to multiples of 64. Hence, we need to crop the predicted flows to their original size nn_opts 'adapt_info' (1, 436, 1024, 2) Instantiate the model in inference mode and display the model configuration nn ModelPWCNet(mode 'test', options nn_opts) nn.print_config() Generate the predictions and display them pred_labels nn.predict_from_img_pairs(img_pairs, batch_size 1, verbose False) display_img_pairs_w_flows(img_pairs, pred_labels) The code above can be found in the pwcnet_predict_from_img_pairs.ipynb (tfoptflow/pwcnet_predict_from_img_pairs.ipynb) notebook and the pwcnet_predict_from_img_pairs.py (tfoptflow/pwcnet_predict_from_img_pairs.py) script. Running inference on the test split of a dataset If you want to train a PWC Net model from scratch, or finetune a pre trained PWC Net model using your own dataset, you will need to implement a dataset handler that derives from the OpticalFlowDataset base class in dataset_base.py (tfoptflow/dataset_base.py). 
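For orientation, here is a rough sketch of what such a derived handler could look like. This is not code from the repository: the constructor signature, the _DEFAULT_DS_TRAIN_OPTIONS import, the set_IDs_filenames hook, and the _trn_IDs/_val_IDs/_tst_IDs attribute names are illustrative assumptions, so check dataset_base.py for the actual methods the base class expects you to override.

```python
# Hypothetical sketch of a custom dataset handler; the hook and attribute names
# below are assumptions for illustration only -- consult dataset_base.py for the
# real API of OpticalFlowDataset.
import os
from glob import glob

from dataset_base import OpticalFlowDataset, _DEFAULT_DS_TRAIN_OPTIONS


class MyFlowDataset(OpticalFlowDataset):
    """Serves (frame1, frame2, flow) samples stored under ds_root."""

    def __init__(self, mode='train_with_val', ds_root=None, options=_DEFAULT_DS_TRAIN_OPTIONS):
        self._my_root = ds_root
        super().__init__(mode=mode, ds_root=ds_root, options=options)

    def set_IDs_filenames(self):
        # The derived class only tells the base class which files belong to the
        # train/validation/test splits; loading, augmentation, and batching stay
        # in the base class.
        img1s = sorted(glob(os.path.join(self._my_root, 'frames', '*_img1.png')))
        img2s = sorted(glob(os.path.join(self._my_root, 'frames', '*_img2.png')))
        flows = sorted(glob(os.path.join(self._my_root, 'flows', '*.flo')))
        samples = list(zip(img1s, img2s, flows))
        n_trn, n_val = int(0.9 * len(samples)), int(0.05 * len(samples))
        self._trn_IDs = samples[:n_trn]                 # training split
        self._val_IDs = samples[n_trn:n_trn + n_val]    # validation split
        self._tst_IDs = samples[n_trn + n_val:]         # test split
```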
We provide several dataset handlers for well known datasets, such as MPI Sintel ( dataset_mpisintel.py (tfoptflow/dataset_mpisintel.py)), FlyingChairs ( dataset_flyingchairs.py (tfoptflow/dataset_flyingchairs.py)), FlyingThings3D ( dataset_flyingthings3d.py (tfoptflow/dataset_flyingthings3d.py)), and KITTI ( dataset_kitti.py (tfoptflow/dataset_kitti.py)). Anyone of them is a good starting point to figure out how to implement your own. Please note that that this is not complicated work; the derived class does little beyond telling the base class which list of files are to be used for training, validation, and testing, leaving the heavy lifting to the base class. Once you have a data handler, you can pass it to a ModelPWCNet object and call its predict() method to generate flow predictions for its test split, as shown in the pwcnet_predict.ipynb (tfoptflow/pwcnet_predict.ipynb) notebook and the pwcnet_predict.py (tfoptflow/pwcnet_predict.py) script. Datasets Datasets most commonly used for optical flow estimation include: FlyingThings3D image pairs + flows + all_unused_files.txt FlyingChairs images pairs + flows + FlyingChairs_train_val split MPI Sintel zip KITTI Flow 2012 zip and/or KITTI Flow 2015 zip Additional optical flow datasets (not used here): Middlebury Optical Flow web Heidelberg HD1K Flow web Per 2018a ( 2018a), KITTI and Sintel are currently the most challenging and widely used benchmarks for optical flow. The KITTI benchmark is targeted at autonomous driving applications and its semi dense ground truth is collected using LIDAR. The 2012 set only consists of static scenes. The 2015 set is extended to dynamic scenes via human annotations and more challenging to existing methods because of the large motion, severe illumination changes, and occlusions. The Sintel benchmark is created using the open source graphics movie Sintel with two passes, clean and final. The final pass contains strong atmospheric effects, motion blur, and camera noise, which cause severe problems to existing methods. References 2018 2018a Sun et al. 2018. PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. arXiv web PyTorch (Official) PyTorch PyTorch Caffe (Official) TensorFlow TensorFlow Video Video 2017 2017a Baghaie et al. 2017. Dense Descriptors for Optical Flow Estimation: A Comparative Study. web 2016 2016a Ilg et al. 2016. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. arXiv PyTorch (Official) TensorFlow 2016b Ranjan et al. 2016. SpyNet: Optical Flow Estimation using a Spatial Pyramid Network. arXiv Torch (Official) PyTorch 2015 2015a Fischer et al. 2015. FlowNet: Learning Optical Flow with Convolutional Networks. arXiv Tensorflow (FlowNet S) Acknowledgments Other TensorFlow implementations we are indebted to: by daigo0927 by djl11 by PatWie @InProceedings{Sun2018PWC Net, author {Deqing Sun and Xiaodong Yang and Ming Yu Liu and Jan Kautz}, title {{PWC Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume}, booktitle CVPR, year {2018}, } @InProceedings\{DFIB15, author A. Dosovitskiy and P. Fischer and E. Ilg and P. H{\ a}usser and C. Hazirbas and V. Golkov and P. v.d. Smagt and D. Cremers and T. 
Brox , title FlowNet: Learning Optical Flow with Convolutional Networks , booktitle IEEE International Conference on Computer Vision (ICCV) , month Dec , year 2015 , url } Contact Info If you have any questions about this work, please feel free to contact us here:",Optical Flow Estimation,Vision Other 2326,Computer Vision,Computer Vision,Computer Vision,"3D ResNets for Action Recognition Update (2018/2/21) Our paper Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? is accepted to CVPR2018! We update the paper information. Update (2018/01/16) We uploaded some of fine tuned models on UCF 101 and HMDB 51. ResNeXt 101 fine tuned on UCF 101 (split1) ResNeXt 101 (64 frame inputs) fine tuned on UCF 101 (split1) ResNeXt 101 fine tuned on HMDB 51 (split1) ResNeXt 101 (64 frame inputs) fine tuned on HMDB 51 (split1) Update (2017/11/27) We published a new paper on arXiv. We also added the following new models and their Kinetics pretrained models in this repository. ResNet 50, 101, 152, 200 Pre activation ResNet 200 Wide ResNet 50 ResNeXt 101 DenseNet 121, 201 In addition, we supported new datasets (UCF 101 and HDMB 51) and fine tuning functions. Some minor changes are included. Outputs are normalized by softmax in test. If you do not want to perform the normalization, please use no_softmax_in_test option. Summary This is the PyTorch code for the following papers: Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? , Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546 6555, 2018. Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh, Learning Spatio Temporal Features with 3D Residual Networks for Action Recognition , Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017. This code includes training, fine tuning and testing on Kinetics, ActivityNet, UCF 101, and HMDB 51. If you want to classify your videos or extract video features of them using our pretrained models, use this code . The Torch (Lua) version of this code is available here . Note that the Torch version only includes ResNet 18, 34, 50, 101, and 152. Citation If you use this code or pre trained models, please cite the following: bibtex @inproceedings{hara3dcnns, author {Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh}, title {Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?}, booktitle {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, pages {6546 6555}, year {2018}, } Pre trained models Pre trained models are available here . All models are trained on Kinetics. ResNeXt 101 achieved the best performance in our experiments. (See paper in details.) misc resnet 18 kinetics.pth: model resnet model_depth 18 resnet_shortcut A resnet 34 kinetics.pth: model resnet model_depth 34 resnet_shortcut A resnet 34 kinetics cpu.pth: CPU ver. 
of resnet 34 kinetics.pth resnet 50 kinetics.pth: model resnet model_depth 50 resnet_shortcut B resnet 101 kinetics.pth: model resnet model_depth 101 resnet_shortcut B resnet 152 kinetics.pth: model resnet model_depth 152 resnet_shortcut B resnet 200 kinetics.pth: model resnet model_depth 200 resnet_shortcut B preresnet 200 kinetics.pth: model preresnet model_depth 200 resnet_shortcut B wideresnet 50 kinetics.pth: model wideresnet model_depth 50 resnet_shortcut B wide_resnet_k 2 resnext 101 kinetics.pth: model resnext model_depth 101 resnet_shortcut B resnext_cardinality 32 densenet 121 kinetics.pth: model densenet model_depth 121 densenet 201 kinetics.pth: model densenet model_depth 201 Some of fine tuned models on UCF 101 and HMDB 51 (split 1) are also available. misc resnext 101 kinetics ucf101_split1.pth: model resnext model_depth 101 resnet_shortcut B resnext_cardinality 32 resnext 101 64f kinetics ucf101_split1.pth: model resnext model_depth 101 resnet_shortcut B resnext_cardinality 32 sample_duration 64 resnext 101 kinetics hmdb51_split1.pth: model resnext model_depth 101 resnet_shortcut B resnext_cardinality 32 resnext 101 64f kinetics hmdb51_split1.pth: model resnext model_depth 101 resnet_shortcut B resnext_cardinality 32 sample_duration 64 Performance of the models on Kinetics This table shows the averaged accuracies over top 1 and top 5 on Kinetics. Method Accuracies : : : ResNet 18 66.1 ResNet 34 71.0 ResNet 50 72.2 ResNet 101 73.3 ResNet 152 73.7 ResNet 200 73.7 ResNet 200 (pre act) 73.4 Wide ResNet 50 74.7 ResNeXt 101 75.4 DenseNet 121 70.8 DenseNet 201 72.3 Requirements PyTorch bash conda install pytorch torchvision cuda80 c soumith FFmpeg, FFprobe bash wget tar xvf ffmpeg release 64bit static.tar.xz cd ./ffmpeg 3.3.3 64bit static/; sudo cp ffmpeg ffprobe /usr/local/bin; Python 3 Preparation ActivityNet Download videos using the official crawler . Convert from avi to jpg files using utils/video_jpg.py bash python utils/video_jpg.py avi_video_directory jpg_video_directory Generate fps files using utils/fps.py bash python utils/fps.py avi_video_directory jpg_video_directory Kinetics Download videos using the official crawler . Locate test set in video_directory/test . Convert from avi to jpg files using utils/video_jpg_kinetics.py bash python utils/video_jpg_kinetics.py avi_video_directory jpg_video_directory Generate n_frames files using utils/n_frames_kinetics.py bash python utils/n_frames_kinetics.py jpg_video_directory Generate annotation file in json format similar to ActivityNet using utils/kinetics_json.py The CSV files (kinetics_{train, val, test}.csv) are included in the crawler. bash python utils/kinetics_json.py train_csv_path val_csv_path test_csv_path dst_json_path UCF 101 Download videos and train/test splits here . Convert from avi to jpg files using utils/video_jpg_ucf101_hmdb51.py bash python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory Generate n_frames files using utils/n_frames_ucf101_hmdb51.py bash python utils/n_frames_ucf101_hmdb51.py jpg_video_directory Generate annotation file in json format similar to ActivityNet using utils/ucf101_json.py annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt bash python utils/ucf101_json.py annotation_dir_path HMDB 51 Download videos and train/test splits here . 
Convert from avi to jpg files using utils/video_jpg_ucf101_hmdb51.py bash python utils/video_jpg_ucf101_hmdb51.py avi_video_directory jpg_video_directory Generate n_frames files using utils/n_frames_ucf101_hmdb51.py bash python utils/n_frames_ucf101_hmdb51.py jpg_video_directory Generate annotation file in json format similar to ActivityNet using utils/hmdb51_json.py annotation_dir_path includes brush_hair_test_split1.txt, ... bash python utils/hmdb51_json.py annotation_dir_path Running the code Assume the structure of data directories is the following: misc / data/ kinetics_videos/ jpg/ .../ (directories of class names) .../ (directories of video names) ... (jpg files) results/ save_100.pth kinetics.json Confirm all options. bash python main.lua h Train ResNets 34 on the Kinetics dataset (400 classes) with 4 CPU threads (for data loading). Batch size is 128. Save models at every 5 epochs. All GPUs is used for the training. If you want a part of GPUs, use CUDA_VISIBLE_DEVICES ... . bash python main.py root_path /data video_path kinetics_videos/jpg annotation_path kinetics.json \ result_path results dataset kinetics model resnet \ model_depth 34 n_classes 400 batch_size 128 n_threads 4 checkpoint 5 Continue Training from epoch 101. (/data/results/save_100.pth is loaded.) bash python main.py root_path /data video_path kinetics_videos/jpg annotation_path kinetics.json \ result_path results dataset kinetics resume_path results/save_100.pth \ model_depth 34 n_classes 400 batch_size 128 n_threads 4 checkpoint 5 Fine tuning conv5_x and fc layers of a pretrained model (/data/models/resnet 34 kinetics.pth) on UCF 101. bash python main.py root_path /data video_path ucf101_videos/jpg annotation_path ucf101_01.json \ result_path results dataset ucf101 n_classes 400 n_finetune_classes 101 \ pretrain_path models/resnet 34 kinetics.pth ft_begin_index 4 \ model resnet model_depth 34 resnet_shortcut A batch_size 128 n_threads 4 checkpoint 5",Action Recognition,Vision Other 2335,Computer Vision,Computer Vision,Computer Vision,"Urban Safety Perception Bogota 2018 Image set imgset1 11.zip 5505 street images of the Chapinero locality Indexed actual vote image pair annotations descriptorIndexer_Jul_0518.txt Indexing starts at 1 not at 0 !! 18959 annotations Visual survey published image name list imgnames.txt Visual survey published image feature vector files cielabA, gist and hog features.zip 1. cielabA2.txt 2. cielabB2.txt 3. cielabL2.txt 4. gistSet.txt 5. hogSet.txt VGG19_features.zip 1. VGG19Chapinero_Ftrs.csv notebooks transfer4uspVGG16FtrExtr.ipynb Used for image VGG19 based feature extraction randomVoteSchemeIII.ipynb and randomVoteSchemeIV.ipynb Used to generate synthetic vote image pairs. transfer4uspVGG16SoftMaxNonEQU.ipynb Training notebook transfer4uspVGG16NonEQUVerify_Jul_0518.zip Tensor Flow model TrueskillImgScoreShmIII.ipynb Top 40 image rating visualization True Skill based predictors VGG19ChapineroTSkillPredictor.py VGG19MartiresTSkillPredictor.py VGG19UsaquenTSkillPredictor.py OpenMPI C Code for SVM parameter grid exploration This works along with NFS and MPI multi core machine cluster mpi_svmprtrs.c Papers Acosta, S., Camargo, J. City safety perception model based on visual content of street images . IEEE IV International Smart Cities Conference (ISC2), 2018. Acosta, S., Camargo, J. Predicting city safety perception based on visual image content . 23rd Iberoamerican Congress on Pattern Recognition (CIARP), 2018. Acosta S.F., Camargo J.E. 
(2019) Predicting City Safety Perception Based on Visual Image Content. In: Vera Rodriguez R., Fierrez J., Morales A. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2018. Lecture Notes in Computer Science, vol 11401. Springer, Cham.",Safety Perception Recognition,Vision Other 2347,Computer Vision,Computer Vision,Computer Vision,"PWC PWC PWC PWC Multi view to Novel view: Synthesizing Novel Views with Self Learned Confidence Descriptions This project is a TensorFlow implementation of Multi view to Novel view: Synthesizing Novel Views with Self Learned Confidence , which is published in ECCV 2018 . We provide codes, datasets , and checkpoints . In this work, we address the task of multi view novel view synthesis , where we are interested in synthesizing a target image with an arbitrary camera pose from given source images. An illustration of the task is as follows. We propose an end to end trainable framework that learns to exploit multiple viewpoints to synthesize a novel view without any 3D supervision. Specifically, our model consists of a flow prediction module ( flow predictor) and a pixel generation module ( recurrent pixel generator ) to directly leverage information presented in source views as well as hallucinate missing pixels from statistical priors. To merge the predictions produced by the two modules given multi view source images, we introduce a self learned confidence aggregation mechanism . An illustration of the proposed framework is as follows. We evaluate our model on images rendered from 3D object models ( ShapeNet ) as well as real and synthesized scenes ( KITTI and Synthia ). We demonstrate that our model is able to achieve state of the art results as well as progressively improve its predictions when more source images are available. A simpler novel view synthesis codebase can be found at Novel View Synthesis in TensorFlow , where all the data loaders, as well as training/testing scripts, are well configured, and you can just play with models. Prerequisites Python 2.7 Tensorflow 1.3.0 NumPy colorlog h5py imageio six Datasets All datasets are stored as HDF5 files, and the links are as follows. Each data point (HDF5 group) contains an image and its camera pose. ShapeNet Download from car (150GB) chair (14GB) Put the file to this directory ./datasets/shapenet . KITTI Download from here (4.3GB) Put the file to this directory ./datasets/kitti . Synthia Download from here (3.3GB) Put the file to this directory ./datasets/synthia . Usage After downloading the datasets, we can start to train models with the following command: Train bash $ python trainer.py batch_size 8 dataset car num_input 4 Selected arguments (see the trainer.py for more details) prefix: a nickname for the training dataset: choose among car , chair , kitti , and synthia . You can also add your own datasets. Checkpoints: specify the path to a pre trained checkpoint checkpoint: load all the parameters including the flow and pixel modules and the discriminator. 
Logging log\_step: the frequency of outputting log info ( train step 681 Loss: 0.51319 (1.896 sec/batch, 16.878 instances/sec) ) ckpt\_save\_step: the frequency of saving a checkpoint test\_sample\_step: the frequency of performing testing inference during training (default 100) write\_summary\_step: the frequency of writing TensorBoard summaries (default 100) Hyperparameters num\_input: the number of source images batch\_size: the mini batch size (default 8) max\_steps: the max training iterations GAN gan\_type: the type of GAN losses such as LS GAN, WGAN, etc Interpret TensorBoard Launch Tensorboard and go to the specified port; you can see the different losses in the scalars tab and plotted images in the images tab. The plotted images can be interpreted as follows. Test We can also evaluate trained models or the checkpoints provided by the authors with the following command: bash $ python evaler.py dataset car data_id_list ./testing_tuple_lists/id_car_random_elevation.txt train_dir /path/to/the/training/dir/ OR checkpoint /path/to/the/trained/model loss True write_summary True summary_file log_car.txt plot_image True output_dir img_car Selected arguments (see evaler.py for more details) Id list data_id_list: specify a list of data points that you want to evaluate Task loss: report the loss write_summary: write the summary of this evaluation as a text file plot_image: render synthesized images Output quiet: only display the final report summary_file: the path to the summary file output_dir: the output dir of plotted images Result ShapeNet Cars More results for ShapeNet cars (1k randomly sampled results from all 10k testing data) ShapeNet Chairs More results for ShapeNet chairs (1k randomly sampled results from all 10k testing data) Scenes: KITTI and Synthia Checkpoints We provide checkpoints and evaluation report files of our models for all experiments. ShapeNet Cars ShapeNet Chairs KITTI Synthia Related work \ L_1\ Multi view 3D Models from Single Images with a Convolutional Network in CVPR 2016 \ Appearance Flow\ View Synthesis by Appearance Flow in ECCV 2016 \ TVSN\ Transformation Grounded Image Generation Network for Novel 3D View Synthesis in CVPR 2017 Neural scene representation and rendering in Science 2018 Weakly supervised Disentangling with Recurrent Transformations for 3D View Synthesis in NIPS 2015 DeepStereo: Learning to Predict New Views From the World's Imagery in CVPR 2016 Learning Based View Synthesis for Light Field Cameras in SIGGRAPH Asia 2016 Cite the paper If you find this useful, please cite @inproceedings{sun2018multiview, title {Multi view to Novel View: Synthesizing Novel Views with Self Learned Confidence}, author {Sun, Shao Hua and Huh, Minyoung and Liao, Yuan Hong and Zhang, Ning and Lim, Joseph J}, booktitle {European Conference on Computer Vision}, year {2018}, } Authors Shao Hua Sun , Minyoung Huh , Yuan Hong Liao , Ning Zhang , and Joseph J. Lim",Novel View Synthesis,Vision Other 2396,Computer Vision,Computer Vision,Computer Vision,"Scan2CAD (CVPR 2019 Oral) We present Scan2CAD , a novel data driven method that learns to align 3D CAD models from a shape database to 3D scans.
Download Paper (.pdf) See Youtube Video Link to the annotation webapp source code Demo samples Scan2CAD Alignments Orientated Bounding Boxes for Objects Description Dataset used in the research project: Scan2CAD: Learning CAD Model Alignment in RGB D Scans For the public dataset, we provide annotations with: 97607 keypoint correspondences between Scan and CAD models 14225 objects between Scan and CAD 1506 scans An additional annotated hidden testset, that is used for our Scan2CAD benchmark contains: 7557 keypoint correspondences between Scan and CAD models 1160 objects between Scan and CAD 97 scans Benchmark We published a new benchmark for CAD model alignment in 3D scans (and more tasks to come) here . Get started 1. Clone repo: git clone 2. Ask for dataset: (see sections below. You will need ScanNet , ShapeNet and Scan2CAD ). 3. Copy dataset content into ./Routines/Script/ . 4. Visualize data: python3 ./Routines/Script/Annotation2Mesh.py 5. Compile c++ programs cd {Vox2Mesh, DFGen, CropCentered} make 6. Voxelize CADs (shapenet): python3 ./Routines/Script/CADVoxelization.py 7. Generate data (correspondences): python3 ./Routines/Script/GenerateCorrespondences.py 8. Start pytorch training for heatmap prediction: comming soon Download Scan2CAD Dataset (Annotation Data) If you would like to download the Scan2CAD dataset, please fill out this google form . A download link will be provided to download a .zip file (approx. 8MB) that contains the dataset. Format of the Datasets Format of full_annotions.json The file contains 1506 entries, where the field of one entry is described as: javascript { id_scan : scannet scene id , trs : { // Data Generation for Scan2CAD Alignment Scan and CAD Repository In this work we used 3D scans from the ScanNet dataset and CAD models from ShapeNetCore (version 2.0) . If you want to use it too, then you have to send an email and ask for the data they usually do it very quickly. Here is a sample (see in ./Assets/scannet sample/ and ./Assets/shapenet sample/ ): Voxelization of Data as Signed Distance Function (sdf) and unsigned Distance Function (df) files The data must be processed such that scans are represented as sdf and CADs as df voxel grids as illustrated here (see in ./Assets/scannet voxelized sdf sample/ and ./Assets/shapenet voxelized df sample/ ): In order to create sdf voxel grids from the scans, volumetric fusion is performed to fuse depth maps into a voxel grid containing the entire scene. For the sdf grid we used a voxel resolution of 3cm and a truncation distance of 15cm . In order to generate the df voxel grids for the CADs we used a modification (see CADVoxelization.py ) of this repo (thanks to @christopherbatty). Creating Training Samples In order to generate training samples for your CNN, you can run ./Routines/Script/GenerateCorrespondences.py . From the Scan2CAD dataset this will generate following: 1. Centered crops of the scan 2. Heatmaps on the CAD ( correspondence to the scan) 3. Scale (x,y,z) for the CAD 4. Match (0/1) indicates whether both inputs match semantically The generated data totals to approximately 500GB . 
Here is an example of the data generation (see in ./Assets/training data/scan centers sample/ and ./Assets/training data/CAD heatmaps sample/ ) Citation If you use this dataset or code please cite: @article{avetisyan2018scan2cad, title {Scan2CAD: Learning CAD Model Alignment in RGB D Scans}, author {Avetisyan, Armen and Dahnert, Manuel and Dai, Angela and Savva, Manolis and Chang, Angel X and Nie{\ss}ner, Matthias}, journal {arXiv preprint arXiv:1811.11187}, year {2018} }",3D Reconstruction,Vision Other 2397,Computer Vision,Computer Vision,Computer Vision,"Scan2CAD Annotation Webapp Description: Annotation webapp used in the research project Scan2CAD: Learning CAD Model Alignment in RGB D Scans : Download Paper (.pdf) See Youtube Video Demo 1. Step: Select Suitable CAD from a Pool 2. Step: Align Scan Object with CAD How To Use Get started 1. Clone this repo 2. cd repo name (enter downloaded repository folder) 3. Install nodejs ( npm ) from 4. Run npm install for client side and cd ./server/ && npm install for server side. This will install all dependencies specified in package.json 5. Run ./build.sh to compile Run ./watch.sh to develop with javascript (compiles with every change). 7. Edit ./server/config.js to specify your scan and CAD repository. Also edit the mongodb database to save the results. That means enter credentials to access your mongodb server (e.g. guest:guest ). 8. Run ./server/run.sh to start the server. 9. Go to localhost:8080/Scan2CAD/menu Create MongoDB database to store the results mongodb is a really nice app and works with javascript, python, c++, etc. . So very convinient to use, that's why we will store the annotation result in a mongo database: 1. Install mongodb 2. Login with mongo admin 3. In the mongo shell create a db and a collection: use scan2cad db.createCollection( correspondences ) 4. Now create a user to login: use admin db.createUser({user : guest , pwd : guest , roles : {role : readWrite , db : scan2cad } }) show users Hook your own scan and CAD repository We used ScanNet as scan dataset and and ShapeNet as CAD dataset to do the annotations. If you want to use it too, then you have to send an email and ask for the data they usually do it very quickly). However, you can your own datasets with the following steps: 1. All the routing to the datasets is done in ./server/routing/ . Type a name in ./server/config.js for dataset_scan and dataset_cad . Example: dataset_scan scannet and dataset_cad shapenet . Probably you will spend most of your time in the ./server/routing folder because that is where all the data to the webapp is served. 2. Create a scannet.js and a shapenet.js file in ./server/routing . Those files will provide the webapp with the approriate meshes, textures, labels, thumbnails etc. 3. Create a scannet and shapenet folder in ./server./static . In here you will symlink to your actual dataset. Notes about the CAD data structure You will notice some things that this app asks from you. Of course you can hack the source code and comment out the parts when it wants something from you (it's ok to do it). Thumbnails For instance CAD models should have thumbnails. You can just comment out the loading of the thumbnails but the annotation process is much easier with thumbnails (see the video). Category Also this webapp wants every CAD models to have two things: An id_cad : An unique id per CAD model A catid_cad : a category id. For instance chairs 001 , tables 002 , etc. T Internally it juggles around with both ids. 
Notes about the Scan data structure The webapp wants also something extra except the geometry mesh. That is, it asks for a semantically labeled mesh. This is needed because when you hover over the scan and click on a surface point, then the class name is looked up from the labeled mesh: such that you don't have to type it. See following image, (left) class labelled mesh (right) raw mesh If you cannot provide a labelled mesh or don't know how to do it. Then just comment out some parts of the source code or just provide a fake labelled mesh. The webapp will always then say Maybe: Unknown . And you will have to type in the category yourself everytime you search for a CAD model. Citation If you use this code please cite: @article{avetisyan2018scan2cad, title {Scan2CAD: Learning CAD Model Alignment in RGB D Scans}, author {Avetisyan, Armen and Dahnert, Manuel and Dai, Angela and Savva, Manolis and Chang, Angel X and Nie{\ss}ner, Matthias}, journal {arXiv preprint arXiv:1811.11187}, year {2018} }",3D Reconstruction,Vision Other 2499,Computer Vision,Computer Vision,Computer Vision,"Deep Image Matching Graduate Course Project: Deep Learning for Biometrics This is a Keras implementation of a deep image matching scheme, where the image descriptors(features) are computed using the concatenation of a VGG16 network (without the fully connected layers), a pooling layer, normalization layer, PCA layer, and another normalization layer. During the training phase, the VGG16 weights and PCA layer weights are fine tuned using a Siamese neural network: the Siamese neural network is composed of three identical copies of the deep network described above. The training (triplet) loss defined with the Siamese network allows for similar images to be closer and dissimilar images to be further apart in the descriptor/feature space. This training architecture uses ideas from the following papers: Training The fine tuning of the Siamese neural network weights can be performed by running the demo FineTuning_main.ipynb . For generating training data, importing the network architecture, and fine tuning the network weights, the following modules are called by the FineTuning_main.ipynb demo file, sequentially. modules/modules_split_data splits input data into train, test and validation data modules/load_model.py loads the Siamese network by calling modules/Deep_Retrieval_Siamese_Architecture.py modules/modules_generating_triplets.py generates triplets of training images, ranked in order of decreasing hardness. Each triplet consists of an anchor image , relevant image that is similar to the anchor image, and an irrelevant image that is dissimilar to the anchor image. The closer the irrelevant image to the anchor image , or the further the relevant image to the anchor image , the more the harness of a given triplet of images, and vice versa. modules/modules_custom_callbacks.py contains the custom callback modules that are called at the end of every epoch of training. This modules within are called to resample the training triplets, reset the training data generator, etc, after every epoch. All the modules called within the demo FineTuning_main.ipynb are placed in the /modules directory. Testing Run Test_gen_features_main.ipynb to obtain the test results. The demo file (i) loads the descriptor weights, (ii) computes and saves the features for all database images, (iii) finds the closest matching image in the database for each test query image, and (iv) reports the results in terms of mean average precision (mAP). 
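To make the descriptor described above more concrete, here is a minimal, hypothetical tf.keras sketch of the VGG16 + pooling + normalization + projection + normalization pipeline. It is not the project's code: the 512 dimensional output, the choice of GlobalMaxPooling2D, and the Dense layer standing in for the trained PCA weights are assumptions made purely for illustration.

```python
# Illustrative sketch only -- not the repository's implementation. The output
# size (512) and the Dense stand-in for the PCA layer are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False)              # VGG16 without the fully connected layers
x = layers.GlobalMaxPooling2D()(base.output)                     # pooling layer over the last conv activations
x = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)  # first normalization layer
x = layers.Dense(512, use_bias=False)(x)                         # PCA-like linear projection (fine-tuned during training)
x = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)  # second normalization layer
descriptor = Model(base.input, x)
```

Since the final descriptors are L2 normalized, finding the closest database image for a query amounts to a dot product (cosine similarity) between query and database descriptors followed by an argmax over the database.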
All the modules called within Test_gen_features_main.ipynb are in the /modules directory. In this architecture, essentially, three identical copies of the NN architecture presented above are created. Specifically, in each of the three identical networks, (i) the image is passed as input to the VGG16 network, (ii) the activations from the last layer of the VGG16 network are either max pooled or sum pooled, (iii) the pooled activations are passed through a PCA layer, (iv) the activations of the PCA layer are used as image descriptors after further normalization. The three identical networks share the VGG16 weights as well as the PCA layer weights. Triplets of images are used for training the triplet network, such that there is a (i) base image, (ii) similar image, (iii) dissimilar image in each triplet of images. Correspondingly, a notion of triplet loss is formulated on top of these three identical networks, such that the network weights are updated to pull similar images closer to the base image and push dissimilar images further apart in the descriptor space. All the modules called within the demo Test_gen_features_main.ipynb are placed in the /modules/modules_generating_results.py file.",Image Retrieval,Vision Other 2539,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend that newer TensorFlow users start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Image Retrieval,Vision Other 2619,Computer Vision,Computer Vision,Computer Vision,"LiteFlowNet This repository is the official release of LiteFlowNet for my paper LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation in CVPR18 (Spotlight). The up to date version of the paper is available on arXiv . LiteFlowNet is a lightweight, fast, and accurate optical flow CNN. We develop several specialized modules including cascaded flow inference, feature warping (f warp) layer, and flow regularization by feature driven local convolution (f lcon) layer. LiteFlowNet outperforms PWC Net (CVPR18) on KITTI and has a smaller model size. For more details about LiteFlowNet, you may visit my project page . KITTI12 Testing Set (Out Noc) KITTI15 Testing Set (Fl all) Model Size (M) FlowNet2 (CVPR17) 4.82% 11.48% 162.49 PWC Net (CVPR18) 4.22% 9.60% 8.75 LiteFlowNet (CVPR18) 3.27% 9.38% 5.37 NEW! Our extended work (LiteFlowNet2) is now available at License and Citation All code and other materials (including but not limited to the paper, figures, and tables) are provided for research purposes only and without any warranty.
Any commercial use requires our consent. When using any parts of the code package or the paper ( LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation ) in your work, please cite the following paper: @InProceedings{hui18liteflownet, author {Tak Wai Hui and Xiaoou Tang and Chen Change Loy}, title {LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation}, booktitle {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2018}, url { } Datasets 1. FlyingChairs dataset (31GB) and train validation split . 2. RGB image pairs (clean pass) (37GB) and flow fields (311GB) for Things3D dataset. 3. Sintel dataset (clean + final passes) (5.3GB). 4. KITTI12 dataset (2GB) and KITTI15 dataset (2GB) (Simple registration is required). FlyingChairs FlyingThings3D Sintel KITTI Crop size 448 x 320 768 x 384 768 x 384 896 x 320 Batch size 8 4 4 4 PyTorch Reimplementation A PyTorch based reimplementation of LiteFlowNet is now available at Caffe Official The code package comes as the modified Caffe from DispFlowNet and FlowNet2 with our new layers, scripts, and trained models. Installation was tested under Ubuntu 14.04.5/16.04.2 with CUDA 8.0, cuDNN 5.1 and openCV 2.4.8/3.1.0. Edit Makefile.config (and Makefile) if necessary in order to fit your machine's settings. For openCV 3+, you may need to change opencv2/gpu/gpu.hpp to opencv2/cudaarithm.hpp in /src/caffe/layers/resample_layer.cu . If your machine installed a newer version of cuDNN, you do not need to downgrade it. You can do the following trick: 1. Download cudnn 8.0 linux x64 v5.1.tgz and untar it to a temp folder, say cuda 8 cudnn 5.1 2. Rename cudnn.h to cudnn 5.1.h in the folder /cuda 8 cudnn 5.1/include 3. $ sudo cp cuda 8 cudnn 5.1/include/cudnn 5.1.h /usr/local/cuda/include/ $ sudo cp cuda 8 cudnn 5.1/lib64/lib /usr/local/cuda/lib64/ 4. Replace include to include in /include/caffe/util/cudnn.hpp . Compiling $ cd LiteFlowNet $ make j 8 tools pycaffe Feature warping (f warp) layer The source files include /src/caffe/layers/warp_layer.cpp , /src/caffe/layers/warp_layer.cu , and /include/caffe/layers/warp_layer.hpp . The grid pattern that is used by f warp layer is generated by a grid layer. The source files include /src/caffe/layers/grid_layer.cpp and /include/caffe/layers/grid_layer.hpp . Feature driven local convolution (f lcon) layer It is implemented using off the shelf components. More details can be found in /models/testing/depoly.prototxt or /models/training_template/train.prototxt.template by locating the code segment NetE R . Other layers Two custom layers ( ExpMax and NegSquare ) are optimized in speed for forward pass. Training 1. Prepare the training set. In /data/make lmdbs train.sh , change YOUR_TRAINING_SET and YOUR_TESTING_SET to your favourite dataset. $ cd LiteFlowNet/data $ ./make lmdbs train.sh 2. Copy files from /models/training_template to a new model folder (e.g. NEW ). Edit all the files and make sure the settings are correct for your application. Model for the complete network is provided. LiteFlowNet uses stage wise training to boost the performance. Please refer to my paper for more details. $ mkdir LiteFlowNet/models/NEW $ cd LiteFlowNet/models/NEW $ cp ../training_template/solver.prototxt.template solver.prototxt $ cp ../training_template/train.prototxt.template train.prototxt $ cp ../training_template/train.py.template train.py 3. Create a soft link in your new model folder $ ln s ../../build/tools bin 4. 
Run the training script $ ./train.py gpu 0 2>&1 tee ./log.txt Trained models The trained models ( liteflownet , liteflownet ft sintel , liteflownet ft kitti ) are available in the folder /models/trained . Untar the files to the same folder before you use it. liteflownet : Trained on Chairs and then fine tuned on Things3D. liteflownet ft sintel : Model used for Sintel benchmark. liteflownet ft kitti : Model used for KITTI benchmark. Testing 1. Open the testing folder $ cd LiteFlowNet/models/testing 2. Create a soft link in the folder /testing $ ln s ../../build/tools bin 3. Replace MODE in ./test_MODE.py to batch if all the images has the same resolution (e.g. Sintel dataset), otherwise replace it to iter (e.g. KITTI dataset). 4. Replace MODEL in line 10 ( cnn_model 'MODEL' ) of test_MODE.py to one of the trained models (e.g. liteflownet ft sintel ). 5. Run the testing script. Flow fields ( MODEL 0000000.flo, MODEL 0000001.flo, ... etc) are stored in the folder /testing/results having the same order as the image pair sequence. $ test_MODE.py img1_pathList.txt img2_pathList.txt results Evaluation 1. End point error (EPE) per image can be calculated using the provided script /models/testing/util/endPointErr.m 2. Average end point error (AEE) is simply computed by taking the average of all EPE.",Optical Flow Estimation,Vision Other 2669,Computer Vision,Computer Vision,Computer Vision,"I3D models trained on Kinetics Overview For this project, I am building an end to end trainable behavior recognition system for mice using deep convolutional networks. These networks are inspired from Inception 3d, the current state of the art in video action recognition. Please find detailed information about this architecture in the paper Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Joao Carreira and Andrew Zisserman. The paper was posted on arXiv in May 2017, and was published as a CVPR 2017 conference paper. Below is an architecture diagram of Inception 3D. ! Alt text (imgs/acbm.png) Running the code Setup 1. Follow the instructions for installing Sonnet . 2. clone this repository using $ git clone 3. Add the cloned repository's parent path to $PYTHONPATH as follows cd /behavior_recognition; export PYTHONPATH $PYTHONPATH: Acknowledgments The Kinetics dataset Inception v1 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset",Action Recognition,Vision Other 2670,Computer Vision,Computer Vision,Computer Vision,"I3D models trained on Kinetics Overview For this project, I am building an end to end trainable behavior recognition system for mice using deep convolutional networks. These networks are inspired from Inception 3d, the current state of the art in video action recognition. Please find detailed information about this architecture in the paper Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Joao Carreira and Andrew Zisserman. The paper was posted on arXiv in May 2017, and was published as a CVPR 2017 conference paper. Below is an architecture diagram of Inception 3D. ! Alt text (imgs/acbm.png) Running the code Setup 1. Follow the instructions for installing Sonnet . 2. clone this repository using $ git clone 3. Add the cloned repository's parent path to $PYTHONPATH as follows cd /behavior_recognition; export PYTHONPATH $PYTHONPATH: Acknowledgments The Kinetics dataset Inception v1 Quo Vadis, Action Recognition? 
A New Model and the Kinetics Dataset",Action Recognition,Vision Other 2672,Computer Vision,Computer Vision,Computer Vision,"Pytorch Adversarial Domain Adaptation A collection of implementations of adversarial unsupervised domain adaptation algorithms. Domain adaptation ! (task.png) The goal of domain adaptation is to transfer the knowledge of a model to a different but related data distribution. The model is trained on a source dataset and applied to a target dataset (usually unlabeled). In this case, the model is trained on regular MNIST images, but we want to get good performance on MNIST with random color (without any labels). In adversarial domain adaptation, this problem is usually solved by training an auxiliary model called the domain discriminator. The goal of this model is to classify examples as coming from the source or target distribution. The original classifier will then try to maximize the loss of the domain discriminator, comparable to the GAN training procedure. Implemented papers Paper : Unsupervised Domain Adaptation by Backpropagation, Ganin & Lemptsky (2014) Link : Description : Negates the gradient of the discriminator for the feature extractor to train both networks simultaneously. Implementation : revgrad.py Paper : Adversarial Discriminative Domain Adaptation, Tzeng et al. (2017) Link : Description : Adapts the weights of a classifier pretrained on source data to produce similar features on the target data. Implementation : adda.py Paper : Wasserstein Distance Guided Representation Learning, Shen et al. (2017) Link : Description : Uses a domain critic to minimize the Wasserstein Distance (with Gradient Penalty) between domains. Implementation : wdgrl.py Results Method Accuracy on MNIST M Parameters Source only 0.33 RevGrad 0.74 default ADDA 0.76 default WDGRL 0.78 k clf 10 wd clf 0.1 Instructions 1. Download the BSDS500 dataset and extract it somewhere. Point the DATA_DIR variable in config.py to this location. 2. In a Python 3.6 environment, run: $ conda install pytorch torchvision numpy c pytorch $ pip install tqdm opencv python 3. Train a model on the source dataset with $ python train_source.py 4. Choose an algorithm and pass it the pretrained network, for example: $ python adda.py trained_models/source.pt",Unsupervised Image-To-Image Translation,Vision Other 2762,Computer Vision,Computer Vision,Computer Vision,"Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation to appear in Neurocomputing By Xinghao Chen, Guijin Wang, Hengkai Guo, Cairong Zhang, Tsinghua University. \ arXiv\ \ Project Page\ ! demo1 (doc/demo_wjn.gif) ! demo2 (doc/demo_zmm.gif) \ Demos above are realtime results from Intel Realsense SR300 using models trained on Hands17 dataset. \ See more demos using pre trained models on ICVL, NYU and MSRA in src/demo (./src/demo). Introduction This repository contains the demo code for Pose REN , an accurate and fast method for depth based 3D hand pose estimation. ! framework (./doc/teaser.png) Figure 1: Framework of Pose REN. 
Citation If you find our work useful in your research, please consider citing: @article{chen2018pose, title {Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation}, author {Chen, Xinghao and Wang, Guijin and Guo, Hengkai and Zhang, Cairong}, journal {Neurocomputing}, year {2018} } Requirements caffe pose OpenCV (with python interface) Optional: librealsense (for live demo only) Installation Clone caffe pose : git clone Install caffe: cd caffe pose cp Makefile.config.example Makefile.config uncomment WITH_PYTHON_LAYER : 1 change other settings accordingly make j16 make pycaffe j16 Add path/to/caffe pose/python to PYTHONPATH. We use a new layer called GenerateROILayer in Pose REN and the python and c++ implementations are located in src/libs . If you prefer using python layer, add path/to/src/libs to PYTHONPATH, otherwise copy generate_roi_layer.hpp/cpp to caffe pose , update caffe.proto with the provided patch caffe.patch.proto and build caffe again. Results & Models The tables below show the predicted labels and pretrained models on ICVL, NYU and MSRA dataset. All labels are in the format of (u, v, d) where u and v are pixel coordinates. Dataset Predicted Labels Models ICVL Download (./results/NEUCOM18_ICVL_Pose_REN.txt) \ Google Drive\ or \ Baidu Cloud\ NYU Download (./results/NEUCOM18_NYU_Pose_REN.txt) \ Google Drive\ or \ Baidu Cloud\ MSRA Download (./results/NEUCOM18_MSRA_Pose_REN.txt) \ Google Drive\ or \ Baidu Cloud\ HANDS17 \ Google Drive\ or \ Baidu Cloud\ Visualization Please use the Python script src/show_result.py to visualize the predicted results: bash $ python src/show_result.py icvl your/path/to/ICVL/test/Depth in_file results/NEUCOM18_ICVL_Pose_REN.txt You can see all the testing results on the images. Press 'q' to exit. Inference & Evaluation First copy and modify the example config.py for your setup. Please change data_dir and anno_dir accordingly. bash $ cp config.py.example config.py Use the Python script src/testing/predict.py for prediction with predefined centers in labels directory: bash $ python src/testing/predict.py icvl your/path/to/output/file.txt The script depends on pycaffe. Please see here for how to evaluate performance of hand pose estimation. Realsense Realtime Demo We provide a realtime hand pose estimation demo using Intel Realsense device. Note that we just use a naive depth thresholding method to detect the hand. Therefore, the hand should be in the range of 0, 650mm to run this demo. We tested this realtime demo with an Intel Realsense SR300 . Please use your right hand for this demo and try to avoid clustered foreground and redundant arm around the hand. Python demo with librealsense recommended First compile and install the librealsense and its python wrapper . After everything is working properly, just run the following python script for demo: bash python src/demo/realsense_realtime_demo_librealsense2.py By default this script uses pre trained weights on ICVL dataset. You can change the pre trained model by specifying the dataset. bash python src/demo/realsense_realtime_demo_librealsense2.py nyu/msra/icvl/hands17 Notes: The speed of this python demo is not optimal and it runs slightly slower than the c++ demo. C++ demo First compile and build: cd src/demo/pose ren demo cpp mkdir build cd build cmake .. make j16 Run the demo by: cd .. redirect to src/demo/pose ren demo cpp ./build/src/PoseREN run By default it uses pre trained weights on Hands17 dataset. You can change the pre trained model by specifying the dataset. 
bash ./build/src/PoseREN nyu/msra/icvl/hands17 Notes: This C++ demo is not fully developed and you may have to deal with some dependency problems to make it works. It serves as a preliminary project to demonstrate how to use Pose REN in C++. License Our code and pre trained models are available for non commercial research purposes only. Contact chenxinghaothu at gmail.com",Hand Pose Estimation,Vision Other 2788,Computer Vision,Computer Vision,Computer Vision,"License CC BY NC SA 4.0 ! Python 2.7 PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume License Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY NC SA 4.0 license . Usage For Caffe users, please refer to Caffe/README.md (Caffe/README.md). For PyTorch users, please refer to PyTorch/README.md (PyTorch/README.md) Note that, currently, the PyTorch implementation is inferior to the Caffe implementation (3% performance drop on Sintel). These are due to differences in implementation between Caffe and PyTorch, such as image resizing and I/O. Network Architecture PWC Net fuses several classic optical flow estimation techniques, including image pyramid, warping, and cost volume, in an end to end trainable deep neural networks for achieving state of the art results. ! (network.png) Paper & Citation Deqing Sun, Xiaodong Yang, Ming Yu Liu, and Jan Kautz. PWC Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. CVPR 2018 or arXiv:1709.02371 Project page link Talk at robust vision challenge workshop Talk at CVPR 2018 conference (starts 7:00) If you use PWC Net, please cite the following paper: @InProceedings{Sun2018PWC Net, author {Deqing Sun and Xiaodong Yang and Ming Yu Liu and Jan Kautz}, title {{PWC Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume}, booktitle CVPR, year {2018}, } or the arXiv paper @article{sun2017pwc, author {Sun, Deqing and Yang, Xiaodong and Liu, Ming Yu and Kautz, Jan}, title {{PWC Net}: {CNNs} for Optical Flow Using Pyramid, Warping, and Cost Volume}, journal {arXiv preprint arXiv:1709.02371}, year {2017} } Related Work from NVIDIA flownet2 pytorch Contact Deqing Sun (deqings@nvidia.com)",Dense Pixel Correspondence Estimation,Vision Other 2794,Computer Vision,Computer Vision,Computer Vision,"! Python 3.6 Video Relationship Reasoning using Gated Spatio Temporal Energy Graph Pytorch implementation for learning an observation Gated Spatio Temporal Energy Graph for Video Relationship Reasoning on Charades dataset . Contact: Yao Hung Hubert Tsai (yaohungt@cs.cmu.edu) Paper Video Relationship Reasoning using Gated Spatio Temporal Energy Graph Yao Hung Hubert Tsai , Santosh Divvala , Louis Philippe Morency , Ruslan Salakhutdinov and Ali Farhadi Computer Vision and Pattern Recognition (CVPR), 2019. Please cite our paper if you find the code, dataset, or the experimental setting useful for your research. @inproceedings{tsai2019GSTEG, title {Video Relationship Reasoning using Gated Spatio Temporal Energy Graph}, author {Tsai, Yao Hung Hubert and Divvala, Santosh and Morency, Louis Philippe and Salakhutdinov, Ruslan and Farhadi, Ali}, booktitle {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year {2019} } Overview Relationship Reasoning in Videos Visual relationship reasoning in images (top) vs. videos (bottom): Given a single image, it is ambiguous whether the monkey is creeping up or down the car . 
Using a video not only helps to unambiguously recognize a richer set of relations, but also model temporal correlations across them (e.g., creep down and jump left ). Gated Spatio Temporal Energy Graph An overview of our Proposed Gated Spatio Temporal Energy Graph. Given an input instance (a video clip), we predict the output relationships (e.g., { monkey, creep down, car }, etc.,) by reasoning over a fully connected spatio temporal graph with nodes S (Subject), P (Predicate) and O (Object). Instead of assuming a non gated (i.e., predefined or globally learned) pairwise energy function, we explore the use of gated energy functions (i.e., conditioned on the specific visual observation). Usage Prerequisites Python 3.6 Pytorch and torchvision Datasets Charades dataset Charades' Training and Validation Annotations Pretrained Model Download the pretrained (with Kinetics Dataset) I3D model here . Run the Code 1. Modify exp/GSTEG.py Create the cache directory Specify the location of the data, training/validation split, and pretrained model. 2. Command as follows python3 exp/GSTEG.py Acknowledgement A large portion of the code comes from the Temporal Fields , VidVRD , and ImageNet repo.",Action Recognition,Vision Other 2894,Computer Vision,Computer Vision,Computer Vision,"Mask Scoring R CNN (MS R CNN) By Zhaojin Huang , Lichao Huang , Yongchao Gong , Chang Huang , Xinggang Wang . CVPR 2019 Oral Paper This project is based on maskrcnn benchmark . Introduction Mask Scoring R CNN contains a network block to learn the quality of the predicted instance masks. The proposed network block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. By extensive evaluations on the COCO dataset, Mask Scoring R CNN brings consistent and noticeable gain with different models and different frameworks. The network of MS R CNN is as follows: ! alt text (demo/network.png) Install Check INSTALL.md (INSTALL.md) for installation instructions. Prepare Data mkdir p datasets/coco ln s /path_to_coco_dataset/annotations datasets/coco/annotations ln s /path_to_coco_dataset/train2014 datasets/coco/train2014 ln s /path_to_coco_dataset/test2014 datasets/coco/test2014 ln s /path_to_coco_dataset/val2014 datasets/coco/val2014 Pretrained Models mkdir pretrained_models The pretrained models will be downloaded when running the program. My training log and pre trained models can be found here link or link (pw:xm3f). Running Single GPU Training python tools/train_net.py config file configs/e2e_ms_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS (480000, 640000) TEST.IMS_PER_BATCH 1 Multi GPU Training export NGPUS 8 python m torch.distributed.launch nproc_per_node $NGPUS tools/train_net.py config file configs/e2e_ms_rcnn_R_50_FPN_1x.yaml Results NetWork Method mAP(mask) mAP(det) ResNet 50 FPN Mask R CNN 34.2 37.8 ResNet 50 FPN MS R CNN 35.6 37.9 ResNet 101 FPN Mask R CNN 36.1 40.1 ResNet 101 FPN MS R CNN 37.4 40.1 Visualization ! alt text (demo/demo.png) The left four images show good detection results with high classification scores but low mask quality. Our method aims at solving this problem. The rightmost image shows the case of a good mask with a high classification score. Our method will retrain the high score. 
As can be seen, scores predicted by our model can better interpret the actual mask quality. Acknowledgment The work was done during an internship at Horizon Robotics . Citations If you find MS R CNN useful in your research, please consider citing: @inproceedings{huang2019msrcnn, author {Zhaojin Huang and Lichao Huang and Yongchao Gong and Chang Huang and Xinggang Wang}, title {{Mask Scoring R CNN}}, booktitle {CVPR}, year {2019}, } License maskscoring_rcnn is released under the MIT license. See LICENSE (LICENSE) for additional details. Thanks to the Third Party Libs maskrcnn benchmark Pytorch",Instance Segmentation,Vision Other 1944,Speech,Speech,Other,"This project is a part of Mozilla Common Voice . TTS aims a deep learning based Text2Speech engine, low in cost and high in quality. To begin with, you can hear a sample generated voice from here . The model architecture is highly inspired by Tacotron: A Fully End to End Text To Speech Synthesis Model . However, it has many important updates that make training faster and computationally very efficient. Feel free to experiment with new ideas and propose changes. You can find here a brief note about TTS architectures and their comparisons. Requirements and Installation Highly recommended to use miniconda for easier installation. python> 3.6 pytorch> 0.4.1 librosa tensorboard tensorboardX matplotlib unidecode Install TTS using setup.py . It will install all of the requirements automatically and make TTS available to all the python environment as an ordinary python module. python setup.py develop Or you can use requirements.txt to install the requirements only. pip install r requirements.txt Docker A barebone Dockerfile exists at the root of the project, which should let you quickly setup the environment. By default, it will start the server and let you query it. Make sure to use nvidia docker to use your GPUs. Make sure you follow the instructions in the server README (server/README.md) before you build your image so that the server can find the model within the image. docker build t mozilla tts . nvidia docker run it rm p 5002:5002 mozilla tts Checkpoints and Audio Samples Check out here to compare the samples (except the first) below. Models Dataset Commit Audio Sample Details : : : : : : iter 62410 LJSpeech 99d56f7 link First model with plain Tacotron implementation. iter 170K LJSpeech e00bc66 link More stable and longer trained model. iter 270K LJSpeech 256ed63 link Stop Token prediction is added, to detect end of speech. iter 120K LJSpeech bf7590 link Better for longer sentences iter 108K TWEB 2810d57 link Best: iter 185K LJSpeech db7f3d3 link link Example Model Outputs Below you see model state after 16K iterations with batch size 32. > Recent research at Harvard has shown meditating for as little as 8 weeks can actually increase the grey matter in the parts of the brain responsible for emotional regulation and learning. Audio output: ! example_model_output (images/example_model_output.png?raw true) Runtime The most time consuming part is the vocoder algorithm (Griffin Lim) which runs on CPU. By setting its number of iterations, you might have faster execution with a small loss of quality. Some of the experimental values are below. Sentence: It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent. Audio length is approximately 6 secs. Time (secs) System GL iters : : 2.00 GTX1080Ti 30 3.01 GTX1080Ti 60 Datasets and Data Loading TTS provides a generic dataloder easy to use for new datasets. 
You need to write an adaptor to format your data, and that's all. Check datasets/preprocess.py to see example adaptors. After you have written an adaptor, you need to set the dataset field in config.json . Do not forget the other data related fields. You can also use pre computed features. In this case, compute the features with extract_features.py and set the dataset field to tts_cache . Example datasets to which we successfully applied TTS are linked below. LJ Speech Nancy TWEB M AI Labs Training and Fine tuning LJ Speech Click Here for a hands on Notebook example of training LJSpeech. Split metadata.csv into train and validation subsets, respectively metadata_train.csv and metadata_val.csv . Note that a validation split does not work as well as in other ML problems, since at validation time the model generates spectrogram slices without Teacher Forcing, and that leads to misalignment between the ground truth and the prediction. Therefore, the validation loss does not really reflect the model performance. Rather, you might use all the data for training and check the model performance by relying on human inspection. shuf metadata.csv > metadata_shuf.csv head -n 12000 metadata_shuf.csv > metadata_train.csv tail -n 1100 metadata_shuf.csv > metadata_val.csv To train a new model, you need to define your own config.json file (check the example) and call the command below. train.py --config_path config.json To fine tune a model, use --restore_path . train.py --config_path config.json --restore_path /path/to/your/model.pth.tar For multi GPU training use distribute.py . It enables process based multi GPU training where each process uses a single GPU. CUDA_VISIBLE_DEVICES=0,1,4 distribute.py --config_path config.json Each run creates a new output folder and config.json is copied under this folder. In case of an error or an interrupted execution, if there is no checkpoint yet under the output folder, the whole folder is removed. You can also enjoy Tensorboard if you point its --logdir argument to the experiment folder. Testing The best way to test your network is to use the Notebooks under the notebooks folder. Contact/Getting Help Wiki Discourse Forums If your question is not addressed in the Wiki, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using TTS, and TTS Development. Issues Finally, if all else fails, you can open an issue in our repo. What is new with TTS If you train TTS with the LJSpeech dataset, you start to hear reasonable results after 12.5K iterations with batch size 32. To our knowledge, this is the fastest training among character based methods. Our implementation is also quite robust against long sentences. Location sensitive attention ( ref ). Attention is a vital part of text2speech models. Therefore, it is important to use an attention mechanism that suits the diagonal nature of the problem, where the output strictly aligns with the text monotonically. Location sensitive attention performs better by looking into the previous alignment vectors and learns diagonal attention more easily. Yet, I believe there is still good room for research on this front to find a better solution. Attention smoothing with sigmoid ( ref ). Attention weights are computed with normalized sigmoid values instead of softmax, giving sharper values. That enables the model to pick multiple highly scored inputs for alignments while reducing the noise. Weight decay ( ref ). After a certain point of the training, you might observe the model over fitting. That is, the model is able to pronounce words probably better, but the quality of the speech gets lower and sometimes the attention alignment gets disoriented. Stop token prediction with an additional module. The original Tacotron model does not propose a stop token to stop the decoding process. Therefore, you need to use heuristic measures to stop the decoder. Here, we prefer to use additional layers at the end to decide when to stop. Applying sigmoid to the model outputs. Since the output values are expected to be in the range [0, 1], we apply sigmoid to make it easier to approximate the expected output distribution. Phoneme based training is enabled for easier learning and robust pronunciation. It also makes it easier to adapt TTS to most languages without worrying about language specific characters. Configurable attention windowing at inference time for robust alignment. It forces the network to only consider a certain window of encoder steps per iteration. Detailed Tensorboard stats for activation, weight, and gradient values per layer. They are useful to detect defects and compare networks. Constant history window. Instead of using only the last frame of predictions, we define a constant history queue. It enables training with a gradually decreasing prediction frame (r=5 -> r=1) by only changing the last layer. For instance, you can train the model with r=5 and then fine tune it with r=1 without any performance loss. It also solves the well known PreNet problem #50 . Initialization of hidden decoder states with Embedding layers instead of zero initialization. One common question is why we don't use the Tacotron2 architecture. According to our ablation experiments, nothing, except Location Sensitive Attention, improves the performance, given the increase in the model size. Please feel free to offer new changes and pull requests. We are happy to discuss and make things better. Problems waiting to be solved. Punctuation at the end of a sentence sometimes affects the pronunciation of the last word. Because the punctuation sign is attended by the attention module, it forces the network to create a voice signal, or at least to modify the voice signal being generated, for the neighboring frames. Simpler stop token prediction. Right now we use an RNN to keep the history of the previous frames. However, we never tested whether something simpler would work as well. Yet the RNN based model gives more stable predictions. Train for better mel specs. Mel spectrograms are not good enough to be fed to a Neural Vocoder. An easy solution to this problem is to train the model with r=1. However, in this case, the model struggles to align the attention. Irregular words: minute , focus , aren't , etc. This might be solved by using a better dataset such as Nancy or by training with phonemes enabled. Major TODOs x Implement the model. x Generate human like speech on the LJSpeech dataset. x Generate human like speech on a different dataset (Nancy) (TWEB). x Train TTS with r=1 successfully. x Enable process based distributed training. Similar to . Adapting a Neural Vocoder. The most active work is here Multi speaker embedding.
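For readers who want to see what the sigmoid based attention smoothing from the list above boils down to, here is a minimal PyTorch sketch (illustrative only, not the project's actual code; tensor shapes and function names are assumptions):

import torch

def smoothed_alignment(scores):
    # scores: (batch, encoder_steps) raw attention energies.
    # Instead of softmax, squash each energy with a sigmoid and renormalize
    # so the weights of each utterance still sum to one.
    sig = torch.sigmoid(scores)
    return sig / sig.sum(dim=1, keepdim=True)

weights = smoothed_alignment(torch.tensor([[2.0, 1.5, -3.0]]))
print(weights)  # the two high-scoring steps keep comparable weight, the low one is suppressed

Compared with softmax, this lets several encoder steps share the attention mass, which is what allows the model to pick multiple highly scored inputs while reducing the noise.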
References Efficient Neural Audio Synthesis Attention Based models for speech recognition Generating Sequences With Recurrent Neural Networks Char2Wav: End to End Speech Synthesis VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop WaveRNN Faster WaveNet Parallel WaveNet Precursor implementations (Dataset and Test processing) (Initial Tacotron architecture)",Speech Synthesis,Speech 1950,Speech,Speech,Other,"The PyTorch Kaldi Speech Recognition Toolkit PyTorch Kaldi is an open source repository for developing state of the art DNN/HMM speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit. This repository contains the last version of the PyTorch Kaldi toolkit (PyTorch Kaldi v1.0). To take a look into the previous version (PyTorch Kaldi v0.1), click here . If you use this code or part of it, please cite the following paper: M. Ravanelli, T. Parcollet, Y. Bengio, The PyTorch Kaldi Speech Recognition Toolkit , arXiv @inproceedings{pytorch kaldi, title {The PyTorch Kaldi Speech Recognition Toolkit}, author {M. Ravanelli and T. Parcollet and Y. Bengio}, booktitle {In Proc. of ICASSP}, year {2019} } The toolkit is released under a Creative Commons Attribution 4.0 International license . You can copy, distribute, modify the code for research, commercial and non commercial purposes. We only ask to cite our paper referenced above. To improve transparency and replicability of speech recognition results, we give users the possibility to release their PyTorch Kaldi model within this repository. Feel free to contact us (or doing a pull request) for that. Moreover, if your paper uses PyTorch Kaldi, it is also possible to advertise it in this repository. See a short introductory video on the PyTorch Kaldi Toolkit Table of Contents Introduction ( introduction) Prerequisites ( prerequisites) How to install ( how to install) Recent Updates ( recent updates) Tutorials: ( timit tutorial) TIMIT tutorial ( timit tutorial) Librispeech tutorial ( librispeech tutorial) Toolkit Overview: ( overview of the toolkit architecture) Toolkit architecture ( overview of the toolkit architecture) Configuration files ( description of the configuration files) FAQs: ( how can i plug in my model) How can I plug in my model? ( how can i plug in my model) How can I tune the hyperparameters? ( how can i tune the hyperparameters) How can I use my own dataset? ( how can i use my own dataset) How can I plug in my own features? ( how can i plug in my own features) How can I transcript my own audio files? ( how can i transcript my own audio files) Batch size, learning rate, and droput scheduler ( Batch size, learning rate, and dropout scheduler) How can I contribute to the project? ( how can i contribute to the project) EXTRA: ( speech recognition from the raw waveform with sincnet) Speech recognition from the raw waveform with SincNet ( speech recognition from the raw waveform with sincnet) Joint training between speech enhancement and ASR ( joint training between speech enhancement and asr) Distant Speech Recognition with DIRHA ( distant speech recognition with dirha) Training an autoencoder ( training an autoencoder) References ( references) Introduction The PyTorch Kaldi project aims to bridge the gap between the Kaldi and the PyTorch toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. 
PyTorch Kaldi is not only a simple interface between these toolkits, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug in user defined acoustic models. As an alternative, users can exploit several pre implemented neural networks that can be customized using intuitive configuration files. PyTorch Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to properly work locally or on HPC clusters. Some features of the new version of the PyTorch Kaldi toolkit: Easy interface with Kaldi. Easy plug in of user defined models. Several pre implemented models (MLP, CNN, RNN, LSTM, GRU, Li GRU, SincNet). Natural implementation of complex models based on multiple features, labels, and neural architectures. Easy and flexible configuration files. Automatic recovery from the last processed chunk. Automatic chunking and context expansion of the input features. Multi GPU training. Designed to work locally or on HPC clusters. Tutorials on the TIMIT and Librispeech datasets. Prerequisites 1. If not already done, install Kaldi . As suggested during the installation, do not forget to add the path of the Kaldi binaries into $HOME/.bashrc. For instance, make sure that .bashrc contains the following paths: export KALDI_ROOT=/home/mirco/kaldi-trunk PATH=$PATH:$KALDI_ROOT/tools/openfst PATH=$PATH:$KALDI_ROOT/src/featbin PATH=$PATH:$KALDI_ROOT/src/gmmbin PATH=$PATH:$KALDI_ROOT/src/bin PATH=$PATH:$KALDI_ROOT/src/nnetbin export PATH Remember to change the KALDI_ROOT variable to your own path. As a first test to check the installation, open a bash shell, type copy-feats or hmm-info and make sure no errors appear. 2. If not already done, install PyTorch . We tested our code on PyTorch 1.0 and PyTorch 0.4. An older version of PyTorch is likely to raise errors. To check your installation, type “python” and, once entered into the console, type “import torch”, and make sure no errors appear. 3. We recommend running the code on a GPU machine. Make sure that the CUDA libraries are installed and correctly working. We tested our system on CUDA 9.0, 9.1 and 8.0. Make sure that python is installed (the code is tested with python 2.7 and python 3.7). Even though not mandatory, we suggest using Anaconda . Recent updates 19 Feb. 2019: updates: It is now possible to dynamically change batch size, learning rate, and dropout factors during training. We thus implemented a scheduler that supports the following formalism within the config files: batch_size_train = 128*12 | 64*10 | 32*2 The line above means: do 12 epochs with batch size 128, 10 epochs with batch size 64, and 2 epochs with batch size 32. A similar formalism can be used for learning rate and dropout scheduling. See this section for more information ( batch size, learning rate, and dropout scheduler). 5 Feb. 2019: updates: 1. Our toolkit now supports parallel data loading (i.e., the next chunk is stored in memory while processing the current chunk). This allows a significant speed up. 2. When performing monophone regularization users can now set “dnn_lay=N_lab_out_mono”. This way the number of monophones is automatically inferred by our toolkit. 3. We integrated the kaldi io toolkit from the kaldi io for python project into data_io.py. 4. We provided a better hyperparameter setting for SincNet ( see this section ( speech recognition from the raw waveform with sincnet)) 5. We released some baselines with the DIRHA dataset ( see this section ( distant speech recognition with dirha)). We also provide some configuration examples for a simple autoencoder ( see this section ( training an autoencoder)) and for a system that jointly trains a speech enhancement and a speech recognition module ( see this section ( joint training between speech enhancement and asr)) 6. We fixed some minor bugs. Notes on the next version: In the next version, we plan to further extend the functionalities of our toolkit, supporting more models and feature formats. The goal is to make our toolkit suitable for other speech related tasks such as end to end speech recognition, speaker identification, keyword spotting, speech separation, speech activity detection, speech enhancement, etc. If you would like to propose some novel functionalities, please give us your feedback by filling this survey . How to install To install PyTorch Kaldi, do the following steps: 1. Make sure all the software recommended in the “Prerequisites” section is installed and correctly working 2. Clone the PyTorch Kaldi repository: git clone 3. Go into the project folder and install the needed packages with: pip install -r requirements.txt TIMIT tutorial In the following, we provide a short tutorial of the PyTorch Kaldi toolkit based on the popular TIMIT dataset. 1. Make sure you have the TIMIT dataset. If not, it can be downloaded from the LDC website . 2. Make sure the Kaldi and PyTorch installations are fine. Also make sure that your Kaldi paths are working (you should add the Kaldi paths into the .bashrc as reported in the section Prerequisites ). For instance, type copy-feats and hmm-info and make sure no errors appear. 3. Run the Kaldi s5 baseline of TIMIT. This step is necessary to compute the features and labels later used to train the PyTorch neural network. We recommend running the full timit s5 recipe (including the DNN training): cd kaldi/egs/timit/s5 ./run.sh ./local/nnet/run_dnn.sh This way all the necessary files are created and the user can directly compare the results obtained by Kaldi with those achieved with our toolkit. 4. Compute the alignments (i.e., the phone state labels) for test and dev data with the following commands (go into $KALDI_ROOT/egs/timit/s5). If you want to use tri3 alignments, type: steps/align_fmllr.sh --nj 4 data/dev data/lang exp/tri3 exp/tri3_ali_dev steps/align_fmllr.sh --nj 4 data/test data/lang exp/tri3 exp/tri3_ali_test If you want to use dnn alignments (as suggested), type: steps/nnet/align.sh --nj 4 data-fmllr-tri3/train data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali steps/nnet/align.sh --nj 4 data-fmllr-tri3/dev data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali_dev steps/nnet/align.sh --nj 4 data-fmllr-tri3/test data/lang exp/dnn4_pretrain-dbn_dnn exp/dnn4_pretrain-dbn_dnn_ali_test 5. We start this tutorial with a very simple MLP network trained on mfcc features. Before launching the experiment, take a look at the configuration file cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg . See the Description of the configuration files ( description of the configuration files) for a detailed description of all its fields. 6. Change the config file according to your paths.
In particular: Set “fea_lst” with the path of your mfcc training list (that should be in $KALDI_ROOT/egs/timit/s5/data/train/feats.scp) Add your path (e.g., $KALDI_ROOT/egs/timit/s5/data/train/utt2spk) into “ utt2spk ark:” Add your CMVN transformation e.g.,$KALDI_ROOT/egs/timit/s5/mfcc/cmvn_train.ark Add the folder where labels are stored (e.g.,$KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali for training and ,$KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali_dev for dev data). To avoid errors make sure that all the paths in the cfg file exist. Please, avoid using paths containing bash variables since paths are read literally and are not automatically expanded (e.g., use /home/mirco/kaldi trunk/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali instead of $KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali) 7. Run the ASR experiment: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg This script starts a full ASR experiment and performs training, validation, forward, and decoding steps. A progress bar shows the evolution of all the aforementioned phases. The script run_exp.py progressively creates the following files in the output directory: res.res : a file that summarizes training and validation performance across various validation epochs. log.log : a file that contains possible errors and warnings. conf.cfg : a copy of the configuration file. model.svg is a picture that shows the considered model and how the various neural networks are connected. This is really useful to debug models that are more complex than this one (e.g, models based on multiple neural networks). The folder exp_files contains several files that summarize the evolution of training and validation over the various epochs. For instance, files .info report chunk specific information such as the chunk_loss and error and the training time. The .cfg files are the chunk specific configuration files (see general architecture for more details), while files .lst report the list of features used to train each specific chunk. At the end of training, a directory called generated outputs containing plots of loss and errors during the various training epochs is created. Note that you can stop the experiment at any time. If you run again the script it will automatically start from the last chunk correctly processed. The training could take a couple of hours, depending on the available GPU. Note also that if you would like to change some parameters of the configuration file (e.g., n_chunks ,fea_lst ,batch_size_train ,..) you must specify a different output folder (output_folder ). Debug: If you run into some errors, we suggest to do the following checks: 1. Take a look into the standard output. 2. If it is not helpful, take a look into the log.log file. 3. Take a look into the function run_nn into the core.py library. Add some prints in the various part of the function to isolate the problem and figure out the issue. 8. At the end of training, the phone error rate (PER\%) is appended into the res.res file. To see more details on the decoding results, you can go into “decoding_test” in the output folder and take a look to the various files created. 
For this specific example, we obtained the following res.res file: ep 000 tr 'TIMIT_tr' loss 3.398 err 0.721 valid TIMIT_dev loss 2.268 err 0.591 lr_architecture1 0.080000 time(s) 86 ep 001 tr 'TIMIT_tr' loss 2.137 err 0.570 valid TIMIT_dev loss 1.990 err 0.541 lr_architecture1 0.080000 time(s) 87 ep 002 tr 'TIMIT_tr' loss 1.896 err 0.524 valid TIMIT_dev loss 1.874 err 0.516 lr_architecture1 0.080000 time(s) 87 ep 003 tr 'TIMIT_tr' loss 1.751 err 0.494 valid TIMIT_dev loss 1.819 err 0.504 lr_architecture1 0.080000 time(s) 88 ep 004 tr 'TIMIT_tr' loss 1.645 err 0.472 valid TIMIT_dev loss 1.775 err 0.494 lr_architecture1 0.080000 time(s) 89 ep 005 tr 'TIMIT_tr' loss 1.560 err 0.453 valid TIMIT_dev loss 1.773 err 0.493 lr_architecture1 0.080000 time(s) 88 ......... ep 020 tr 'TIMIT_tr' loss 0.968 err 0.304 valid TIMIT_dev loss 1.648 err 0.446 lr_architecture1 0.002500 time(s) 89 ep 021 tr 'TIMIT_tr' loss 0.965 err 0.304 valid TIMIT_dev loss 1.649 err 0.446 lr_architecture1 0.002500 time(s) 90 ep 022 tr 'TIMIT_tr' loss 0.960 err 0.302 valid TIMIT_dev loss 1.652 err 0.447 lr_architecture1 0.001250 time(s) 88 ep 023 tr 'TIMIT_tr' loss 0.959 err 0.301 valid TIMIT_dev loss 1.651 err 0.446 lr_architecture1 0.000625 time(s) 88 %WER 18.1 192 7215 84.0 11.9 4.2 2.1 18.1 99.5 0.583 /home/mirco/pytorch kaldi new/exp/TIMIT_MLP_basic5/decode_TIMIT_test_out_dnn1/score_6/ctm_39phn.filt.sys The achieved PER(%) is 18.1%. Note that there could be some variability in the results, due to different initializations on different machines. We believe that averaging the performance obtained with different initialization seeds (i.e., change the field seed in the config file) is crucial for TIMIT since the natural performance variability might completely hide the experimental evidence. We noticed a standard deviation of about 0.2% for the TIMIT experiments. If you want to change the features, you have to first compute them with the Kaldi toolkit. To compute fbank features, you have to open $KALDI_ROOT/egs/timit/s5/run.sh and compute them with the following lines: feadir fbank for x in train dev test; do steps/make_fbank.sh cmd $train_cmd nj $feats_nj data/$x exp/make_fbank/$x $feadir steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir done Then, change the aforementioned configuration file with the new feature list. If you already have run the full timit Kaldi recipe, you can directly find the fmllr features in $KALDI_ROOT/egs/timit/s5/data fmllr tri3 . If you feed the neural network with such features you should expect a substantial performance improvement, due to the adoption of the speaker adaptation. In the TIMIT_baseline folder, we propose several other examples of possible TIMIT baselines. Similarly to the previous example, you can run them by simply typing: python run_exp.py $cfg_file There are some examples with recurrent (TIMIT_RNN ,TIMIT_LSTM ,TIMIT_GRU ,TIMIT_LiGRU ) and CNN architectures (TIMIT_CNN ). We also propose a more advanced model (TIMIT_DNN_liGRU_DNN_mfcc+fbank+fmllr.cfg) where we used a combination of feed forward and recurrent neural networks fed by a concatenation of mfcc, fbank, and fmllr features. Note that the latter configuration files correspond to the best architecture described in the reference paper. As you might see from the above mentioned configuration files, we improve the ASR performance by including some tricks such as the monophone regularization (i.e., we jointly estimate both context dependent and context independent targets). 
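As an aside, every line of res.res follows the same per epoch layout shown above, so it is easy to post process. A small hypothetical helper (not part of the toolkit; the regular expression simply assumes the err fields appear as in the listing above) that extracts the epoch with the lowest validation error could look like this:

import re

def best_epoch(res_file="exp/TIMIT_MLP_basic5/res.res"):
    # Each epoch line reports the training loss/err first and the
    # validation loss/err second; keep the epoch whose second err is smallest.
    best = None
    for epoch, line in enumerate(open(res_file)):
        errs = re.findall(r"err[= ]+([0-9.]+)", line)
        if len(errs) >= 2:
            cand = (float(errs[1]), epoch)
            best = cand if best is None or cand < best else best
    return best  # (valid_err, epoch) or None

print(best_epoch())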
The following table reports the results obtained by running the latter systems (average PER\%): Model mfcc fbank fMLLR Kaldi DNN Baseline 18.5 MLP 18.2 18.7 16.7 RNN 17.7 17.2 15.9 SRU 16.6 LSTM 15.1 14.3 14.5 GRU 16.0 15.2 14.9 li GRU 15.5 14.9 14.2 Results show that, as expected, fMLLR features outperform MFCCs and FBANKs coefficients, thanks to the speaker adaptation process. Recurrent models significantly outperform the standard MLP one, especially when using LSTM, GRU, and Li GRU architecture, that effectively address gradient vanishing through multiplicative gates. The best result PER $14.2$\% is obtained with the Li GRU model 2,3 , that is based on a single gate and thus saves 33% of the computations over a standard GRU. The best results are actually obtained with a more complex architecture that combines MFCC, FBANK, and fMLLR features (see cfg/TIMI_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg ). To the best of our knowledge, the PER 13.8\% achieved by the latter system yields the best published performance on the TIMIT test set. The Simple Recurrent Units (SRU) is an efficient and highly parallelizable recurrent model. Its performance on ASR is worse than standard LSTM, GRU, and Li GRU models, but it is significantly faster. SRU is implemented here and described in the following paper: T. Lei, Y. Zhang, S. I. Wang, H. Dai, Y. Artzi, Simple Recurrent Units for Highly Parallelizable Recurrence, Proc. of EMNLP 2018. arXiv To do experiments with this model, use the config file cfg/TIMIT_baselines/TIMIT_SRU_fbank.cfg . Before you should install the model using pip install sru and you should uncomment import sru in neural_networks.py . You can directly compare your results with ours by going here . In this external repository, you can find all the folders containing the generated files. Librispeech tutorial The steps to run PyTorch Kaldi on the Librispeech dataset are similar to that reported above for TIMIT. The following tutorial is based on the 100h sub set , but it can be easily extended to the full dataset (960h). 1. Run the Kaldi recipe for librispeech at least until Stage 13 (included) 2. Compute the fmllr features by running the following script. But first copy exp/tri4b/trans. files into exp/tri4b/decode_tgsmall_train_clean_100/ before running the below script with chunk train_clean_100 . ./cmd.sh You'll want to change cmd.sh to something that will work on your system. . ./path.sh Source the tools/utils (import the queue.pl) chunk train_clean_100 chunk dev_clean Uncomment to process dev chunk test_clean Uncomment to process test gmmdir exp/tri4b dir fmllr/$chunk steps/nnet/make_fmllr_feats.sh nj 10 cmd $train_cmd \ transform dir $gmmdir/decode_tgsmall_$chunk \ $dir data/$chunk $gmmdir $dir/log $dir/data exit 1 compute cmvn stats spk2utt ark:data/$chunk/spk2utt scp:fmllr/$chunk/feats.scp ark:$dir/data/cmvn_speaker.ark 3. compute aligmenents using: aligments on dev_clean and test_clean steps/align_fmllr.sh nj 30 data/train_clean_100 data/lang exp/tri4b exp/tri4b_ali_clean_100 steps/align_fmllr.sh nj 10 data/dev_clean data/lang exp/tri4b exp/tri4b_ali_dev_clean_100 steps/align_fmllr.sh nj 10 data/test_clean data/lang exp/tri4b exp/tri4b_ali_test_clean_100 4. run the experiments with the following command: python run_exp.py cfg/Librispeech_baselines/libri_MLP_fmllr.cfg. If you would like to use a recurrent model you can use libri_RNN_fmllr.cfg , libri_LSTM_fmllr.cfg , libri_GRU_fmllr.cfg , or libri_liGRU_fmllr.cfg . 
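Before launching the training, it can also be useful to sanity check the fMLLR features produced in step 2. The sketch below relies on the kaldi io for python package mentioned earlier; the scp path is the one the script above should have created and the exact feature dimensionality depends on your setup, so treat it as an example rather than a fixed recipe:

import kaldi_io

def inspect_feats(scp_file="fmllr/train_clean_100/feats.scp", n=3):
    # Print the shape of the first few feature matrices to verify that the
    # Kaldi feature extraction produced what we expect.
    for i, (utt_id, mat) in enumerate(kaldi_io.read_mat_scp(scp_file)):
        print(utt_id, mat.shape)  # (n_frames, feat_dim)
        if i + 1 >= n:
            break

inspect_feats()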
The training of recurrent models might take some days (depending on the adopted GPU). The performance obtained with the tgsmall graph are reported in the following table: Model WER% MLP 9.6 LSTM 8.6 GRU 8.6 li GRU 8.6 These results are obtained without adding a lattice rescoring (i.e., using only the tgsmall graph). You can improve the performance by adding lattice rescoring in this way (run it from the kaldi_decoding_script folder of Pytorch Kaldi): data_dir /data/milatmp1/ravanelm/librispeech/s5/data/ dec_dir /u/ravanelm/pytorch Kaldi new/exp/libri_fmllr/decode_test_clean_out_dnn1/ out_dir /u/ravanelm/pytorch kaldi new/exp/libri_fmllr/ steps/lmrescore_const_arpa.sh $data_dir/lang_test_{tgsmall,fglarge} \ $data_dir/test_clean $dec_dir $out_dir/decode_test_clean_fglarge exit 1; The final results obtaineed using rescoring ( fglarge ) are reported in the following table: Model WER% MLP 6.5 LSTM 6.4 GRU 6.3 li GRU 6.2 You can take a look into the results obtained here . Overview of the toolkit architecture The main script to run an ASR experiment is run_exp.py . This python script performs training, validation, forward, and decoding steps. Training is performed over several epochs, that progressively process all the training material with the considered neural network. After each training epoch, a validation step is performed to monitor the system performance on held out data. At the end of training, the forward phase is performed by computing the posterior probabilities of the specified test dataset. The posterior probabilities are normalized by their priors (using a count file) and stored into an ark file. A decoding step is then performed to retrieve the final sequence of words uttered by the speaker in the test sentences. The run_exp.py script takes in input a global config file (e.g., cfg/TIMIT_MLP_mfcc.cfg ) that specifies all the needed options to run a full experiment. The code run_exp.py calls another function run_nn (see core.py library) that performs training, validation, and forward operations on each chunk of data. The function run_nn takes in input a chunk specific config file (e.g, exp/TIMIT_MLP_mfcc/exp_files/train_TIMIT_tr+TIMIT_dev_ep000_ck00.cfg ) that specifies all the needed parameters for running a single chunk experiment. The run_nn function outputs some info filles (e.g., exp/TIMIT_MLP_mfcc/exp_files/train_TIMIT_tr+TIMIT_dev_ep000_ck00.info ) that summarize losses and errors of the processed chunk. The results are summarized into the res.res files, while errors and warnings are redirected into the log.log file. Description of the configuration files: There are two types of config files (global and chunk specific cfg files). They are both in INI format and are read, processed, and modified with the configparser library of python. The global file contains several sections, that specify all the main steps of a speech recognition experiments (training, validation, forward, and decoding). The structure of the config file is described in a prototype file (see for instance proto/global.proto ) that not only lists all the required sections and fields but also specifies the type of each possible field. For instance, N_ep int(1,inf) means that the fields N_ep (i.e, number of training epochs) must be an integer ranging from 1 to inf. Similarly, lr float(0,inf) means that the lr field (i.e., the learning rate) must be a float ranging from 0 to inf. Any attempt to write a config file not compliant with these specifications will raise an error. 
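The type checking described above is easy to reproduce. Here is a minimal sketch (a hypothetical snippet, not the toolkit's actual parser) of how a specification such as float(0,inf) can be enforced on a config value:

import re

def check_field(value, spec):
    # spec follows the proto syntax, e.g. "int(1,inf)" or "float(0,inf)".
    m = re.match(r"(int|float)\(([^,]+),([^)]+)\)", spec)
    if m is None:
        raise ValueError("unsupported spec: %s" % spec)
    cast = int if m.group(1) == "int" else float
    low, high = float(m.group(2)), float(m.group(3))  # float("inf") handles inf bounds
    v = cast(value)  # raises ValueError if the type is wrong
    if not (low <= v <= high):
        raise ValueError("value %s outside the range %s" % (value, spec))
    return v

print(check_field("24", "int(1,inf)"))       # 24
print(check_field("0.002", "float(0,inf)"))  # 0.002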
Let's now try to open a config file (e.g., cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg ) and let's describe the main sections: cfg_proto cfg_proto proto/global.proto cfg_proto_chunk proto/global_chunk.proto The current version of the config file first specifies the paths of the global and chunk specific prototype files in the section cfg_proto . exp cmd run_nn_script run_nn out_folder exp/TIMIT_MLP_basic5 seed 1234 use_cuda True multi_gpu False save_gpumem False n_epochs_tr 24 The section exp contains some important fields, such as the output folder ( out_folder ) and the path of the chunk specific processing script run_nn (by default this function should be implemented in the core.py library). The field N_epochs_tr specifies the selected number of training epochs. Other options about using_cuda, multi_gpu, and save_gpumem can be enabled by the user. The field cmd can be used to append a command to run the script on a HPC cluster. dataset1 data_name TIMIT_tr fea fea_name mfcc fea_lst quick_test/data/train/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/train/utt2spk ark:quick_test/mfcc/train_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/train/ lab_graph quick_test/graph n_chunks 5 dataset2 data_name TIMIT_dev fea fea_name mfcc fea_lst quick_test/data/dev/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/dev/utt2spk ark:quick_test/mfcc/dev_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali_dev lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/dev/ lab_graph quick_test/graph n_chunks 1 dataset3 data_name TIMIT_test fea fea_name mfcc fea_lst quick_test/data/test/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/test/utt2spk ark:quick_test/mfcc/test_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali_test lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/test/ lab_graph quick_test/graph n_chunks 1 The config file contains a number of sections ( dataset1 , dataset2 , dataset3 ,...) that describe all the corpora used for the ASR experiment. The fields on the dataset\ section describe all the features and labels considered in the experiment. The features, for instance, are specified in the field fea: , where fea_name contains the name given to the feature, fea_lst is the list of features (in the scp Kaldi format), fea_opts allows users to specify how to process the features (e.g., doing CMVN or adding the derivatives), while cw_left and cw_right set the characteristics of the context window (i.e., number of left and right frames to append). Note that the current version of the PyTorch Kaldi toolkit supports the definition of multiple features streams. Indeed, as shown in cfg/TIMIT_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg multiple feature streams (e.g., mfcc, fbank, fmllr) are employed. Similarly, the lab section contains some sub fields. For instance, lab_name refers to the name given to the label, while lab_folder contains the folder where the alignments generated by the Kaldi recipe are stored. lab_opts allows the user to specify some options on the considered alignments. 
For example lab_opts ali to pdf extracts standard context dependent phone state labels, while lab_opts ali to phones per frame true can be used to extract monophone targets. lab_count_file is used to specify the file that contains the counts of the considered phone states. These counts are important in the forward phase, where the posterior probabilities computed by the neural network are divided by their priors. PyTorch Kaldi allows users to both specify an external count file or to automatically retrieve it (using lab_count_file auto ). Users can also specify lab_count_file none if the count file is not strictly needed, e.g., when the labels correspond to an output not used to generate the posterior probabilities used in the forward phase (see for instance the monophone targets in cfg/TIMIT_baselines/TIMIT_MLP_mfcc.cfg ). lab_data_folder , instead, corresponds to the data folder created during the Kaldi data preparation. It contains several files, including the text file eventually used for the computation of the final WER. The last sub field lab_graph is the path of the Kaldi graph used to generate the labels. The full dataset is usually large and cannot fit the GPU/RAM memory. It should thus be split into several chunks. PyTorch Kaldi automatically splits the dataset into the number of chunks specified in N_chunks . The number of chunks might depend on the specific dataset. In general, we suggest processing speech chunks of about 1 or 2 hours (depending on the available memory). data_use train_with TIMIT_tr valid_with TIMIT_dev forward_with TIMIT_test This section tells how the data listed into the sections datasets\ are used within the run_exp.py script. The first line means that we perform training with the data called TIMIT_tr . Note that this dataset name must appear in one of the dataset sections, otherwise the config parser will raise an error. Similarly, the second and third lines specify the data used for validation and forward phases, respectively. batches batch_size_train 128 max_seq_length_train 1000 increase_seq_length_train False start_seq_len_train 100 multply_factor_seq_len_train 2 batch_size_valid 128 max_seq_length_valid 1000 batch_size_train is used to define the number of training examples in the mini batch. The fields max_seq_length_train truncates the sentences longer than the specified value. When training recurrent models on very long sentences, out of memory issues might arise. With this option, we allow users to mitigate such memory problems by truncating long sentences. Moreover, it is possible to progressively grow the maximum sentence length during training by setting increase_seq_length_train True . If enabled, the training starts with a maximum sentence length specified in start_seq_len_train (e.g, start_seq_len_train 100 ). After each epoch the maximum sentence length is multiplied by the multply_factor_seq_len_train (e.g multply_factor_seq_len_train 2 ). We have observed that this simple strategy generally improves the system performance since it encourages the model to first focus on short term dependencies and learn longer term ones only at a later stage. Similarly, batch_size_valid and max_seq_length_valid specify the number of examples in the mini batches and the maximum length for the dev dataset. 
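As a quick illustration of the growing sequence length option, the effective limit used at each epoch can be derived from the three fields above; the helper below is just a sketch of the behaviour described in the text, not the toolkit's internal code:

def max_len_schedule(n_epochs, start_seq_len_train=100,
                     multply_factor_seq_len_train=2,
                     max_seq_length_train=1000):
    # Start with short sentences and multiply the limit after every epoch,
    # never exceeding max_seq_length_train (longer sentences are truncated).
    limits, current = [], start_seq_len_train
    for _ in range(n_epochs):
        limits.append(min(current, max_seq_length_train))
        current *= multply_factor_seq_len_train
    return limits

print(max_len_schedule(6))  # [100, 200, 400, 800, 1000, 1000]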
architecture1 arch_name MLP_layers1 arch_proto proto/MLP.proto arch_library neural_networks arch_class MLP arch_pretrain_file none arch_freeze False arch_seq_model False dnn_lay 1024,1024,1024,1024,N_out_lab_cd dnn_drop 0.15,0.15,0.15,0.15,0.0 dnn_use_laynorm_inp False dnn_use_batchnorm_inp False dnn_use_batchnorm True,True,True,True,False dnn_use_laynorm False,False,False,False,False dnn_act relu,relu,relu,relu,softmax arch_lr 0.08 arch_halving_factor 0.5 arch_improvement_threshold 0.001 arch_opt sgd opt_momentum 0.0 opt_weight_decay 0.0 opt_dampening 0.0 opt_nesterov False The sections architecture\ are used to specify the architectures of the neural networks involved in the ASR experiments. The field arch_name specifies the name of the architecture. Since different neural networks can depend on a different set of hyperparameters, the user has to add the path of a proto file that contains the list of hyperparameters into the field proto . For example, the prototype file for a standard MLP model contains the following fields: proto library path class MLP dnn_lay str_list dnn_drop float_list(0.0,1.0) dnn_use_laynorm_inp bool dnn_use_batchnorm_inp bool dnn_use_batchnorm bool_list dnn_use_laynorm bool_list dnn_act str_list Similarly to the other prototype files, each line defines a hyperparameter with the related value type. All the hyperparameters defined in the proto file must appear into the global configuration file under the corresponding architecture\ section. The field arch_library specifies where the model is coded (e.g. neural_nets.py ), while arch_class indicates the name of the class where the architecture is implemented (e.g. if we set class MLP we will do from neural_nets.py import MLP ). The field arch_pretrain_file can be used to pre train the neural network with a previously trained architecture, while arch_freeze can be set to False if you want to train the parameters of the architecture during training and should be set to True do keep the parameters fixed (i.e., frozen) during training. The section arch_seq_model indicates if the architecture is sequential (e.g. RNNs) or non sequential (e.g., a feed forward MLP or CNN). The way PyTorch Kaldi processes the input batches is different in the two cases. For recurrent neural networks ( arch_seq_model True ) the sequence of features is not randomized (to preserve the elements of the sequences), while for feedforward models ( arch_seq_model False ) we randomize the features (this usually helps to improve the performance). In the case of multiple architectures, sequential processing is used if at least one of the employed architectures is marked as sequential ( arch_seq_model True ). Note that the hyperparameters starting with arch_ and opt_ are mandatory and must be present in all the architecture specified in the config file. The other hyperparameters (e.g., dnn_ , ) are specific of the considered architecture (they depend on how the class MLP is actually implemented by the user) and can define number and typology of hidden layers, batch and layer normalizations, and other parameters. Other important parameters are related to the optimization of the considered architecture. For instance, arch_lr is the learning rate, while arch_halving_factor is used to implement learning rate annealing. 
In particular, when the relative performance improvement on the dev set between two consecutive epochs is smaller than that specified in the arch_improvement_threshold (e.g, arch_improvement_threshold) we multiply the learning rate by the arch_halving_factor (e.g., arch_halving_factor 0.5 ). The field arch_opt specifies the type of optimization algorithm. We currently support SGD, Adam, and Rmsprop. The other parameters are specific to the considered optimization algorithm (see the PyTorch documentation for exact meaning of all the optimization specific hyperparameters). Note that the different architectures defined in archictecture\ can have different optimization hyperparameters and they can even use a different optimization algorithm. model model_proto proto/model.proto model out_dnn1 compute(MLP_layers1,mfcc) loss_final cost_nll(out_dnn1,lab_cd) err_final cost_err(out_dnn1,lab_cd) The way all the various features and architectures are combined is specified in this section with a very simple and intuitive meta language. The field model: describes how features and architectures are connected to generate as output a set of posterior probabilities. The line out_dnn1 compute(MLP_layers,mfcc) means feed the architecture called MLP_layers1 with the features called mfcc and store the output into the variable out_dnn1 ”. From the neural network output out_dnn1 the error and the loss functions are computed using the labels called lab_cd , that have to be previously defined into the datasets\ sections. The err_final and loss_final fields are mandatory subfields that define the final output of the model. A much more complex example (discussed here just to highlight the potentiality of the toolkit) is reported in cfg/TIMIT_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg : model model_proto proto/model.proto model:conc1 concatenate(mfcc,fbank) conc2 concatenate(conc1,fmllr) out_dnn1 compute(MLP_layers_first,conc2) out_dnn2 compute(liGRU_layers,out_dnn1) out_dnn3 compute(MLP_layers_second,out_dnn2) out_dnn4 compute(MLP_layers_last,out_dnn3) out_dnn5 compute(MLP_layers_last2,out_dnn3) loss_mono cost_nll(out_dnn5,lab_mono) loss_mono_w mult_constant(loss_mono,1.0) loss_cd cost_nll(out_dnn4,lab_cd) loss_final sum(loss_cd,loss_mono_w) err_final cost_err(out_dnn4,lab_cd) In this case we first concatenate mfcc, fbank, and fmllr features and we then feed a MLP. The output of the MLP is fed into the a recurrent neural network (specifically a Li GRU model). We then have another MLP layer ( MLP_layers_second ) followed by two softmax classifiers (i.e., MLP_layers_last , MLP_layers_last2 ). The first one estimates standard context dependent states, while the second estimates monophone targets. The final cost function is a weighted sum between these two predictions. In this way we implement the monophone regularization, that turned out to be useful to improve the ASR performance. The full model can be considered as a single big computational graph, where all the basic architectures used in the model section are jointly trained. For each mini batch, the input features are propagated through the full model and the cost_final is computed using the specified labels. The gradient of the cost function with respect to all the learnable parameters of the architecture is then computed. All the parameters of the employed architectures are then updated together with the algorithm specified in the architecture\ sections. 
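Going back to the learning rate annealing rule described at the beginning of this section, the update performed after each epoch essentially amounts to the following check (a sketch with illustrative variable names, based on the relative improvement rule stated above):

def update_lr(lr, prev_valid_err, curr_valid_err,
              arch_improvement_threshold=0.001, arch_halving_factor=0.5):
    # Halve the learning rate when the relative improvement of the validation
    # error between two consecutive epochs falls below the threshold.
    rel_improvement = (prev_valid_err - curr_valid_err) / prev_valid_err
    if rel_improvement < arch_improvement_threshold:
        lr = lr * arch_halving_factor
    return lr

print(update_lr(0.08, prev_valid_err=0.446, curr_valid_err=0.446))  # 0.04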
forward forward_out out_dnn1 normalize_posteriors True normalize_with_counts_from lab_cd save_out_file True require_decoding True The section forward first defines which is the output to forward (it must be defined into the model section). if normalize_posteriors True , these posterior are normalized by their priors (using a count file). If save_out_file True , the posterior file (usually a very big ark file) is stored, while if save_out_file False this file is deleted when not needed anymore. The require_decoding is a boolean that specifies if we need to decode the specified output. The field normalize_with_counts_from set which counts using to normalize the posterior probabilities. decoding decoding_script_folder kaldi_decoding_scripts/ decoding_script decode_dnn.sh decoding_proto proto/decoding.proto min_active 200 max_active 7000 max_mem 50000000 beam 13.0 latbeam 8.0 acwt 0.2 max_arcs 1 skip_scoring false scoring_script local/score.sh scoring_opts min lmwt 1 max lmwt 10 norm_vars False The decoding section reports parameters about decoding, i.e. the steps that allows one to pass from a sequence of the context dependent probabilities provided by the DNN into a sequence of words. The field decoding_script_folder specifies the folder where the decoding script is stored. The decoding script field is the script used for decoding (e.g., decode_dnn.sh ) that should be in the decoding_script_folder specified before. The field decoding_proto reports all the parameters needed for the considered decoding script. To make the code more flexible, the config parameters can also be specified within the command line. For example, you can run: python run_exp.py quick_test/example_newcode.cfg optimization,lr 0.01 batches,batch_size 4 The script will replace the learning rate in the specified cfg file with the specified lr value. The modified config file is then stored into out_folder/config.cfg . The script run_exp.py automatically creates chunk specific config files, that are used by the run_nn function to perform a single chunk training. The structure of chunk specific cfg files is very similar to that of the global one. The main difference is a field to_do {train, valid, forward} that specifies the type of processing to on the features chunk specified in the field fea . Why proto files? Different neural networks, optimization algorithms, and HMM decoders might depend on a different set of hyperparameters. To address this issue, our current solution is based on the definition of some prototype files (for global, chunk, architecture config files). In general, this approach allows a more transparent check of the fields specified into the global config file. Moreover, it allows users to easily add new parameters without changing any line of the python code. For instance, to add a user defined model, a new proto file (e.g., user model.prot o) that specifies the hyperparameter must be written. Then, the user should only write a class (e.g., user model in neural_networks.py ) that implements the architecture). FAQs How can I plug in my model The toolkit is designed to allow users to easily plug in their own acoustic models. To add a customized neural model do the following steps: 1. Go into the proto folder and create a new proto file (e.g., proto/myDNN.proto ). The proto file is used to specify the list of the hyperparameters of your model that will be later set into the configuration file. 
To have an idea about the information to add to your proto file, you can take a look into the MLP.proto file: proto dnn_lay str_list dnn_drop float_list(0.0,1.0) dnn_use_laynorm_inp bool dnn_use_batchnorm_inp bool dnn_use_batchnorm bool_list dnn_use_laynorm bool_list dnn_act str_list 2. The parameter dnn_lay must be a list of string, dnn_drop (i.e., the dropout factors for each layer) is a list of float ranging from 0.0 and 1.0, dnn_use_laynorm_inp and dnn_use_batchnorm_inp are booleans that enable or disable batch or layer normalization of the input. dnn_use_batchnorm and dnn_use_laynorm are a list of boolean that decide layer by layer if batch/layer normalization has to be used. The parameter dnn_act is again a list of string that sets the activation function of each layer. Since every model is based on its own set of hyperparameters, different models have a different prototype file. For instance, you can take a look into GRU.proto and see that the hyperparameter list is different from that of a standard MLP. Similarly to the previous examples, you should add here your list of hyperparameters and save the file. 3. Write a PyTorch class implementing your model. Open the library neural_networks.py and look at some of the models already implemented. For simplicity, you can start taking a look into the class MLP. The classes have two mandatory methods: init and forward . The first one is used to initialize the architecture, the second specifies the list of computations to do. The method init takes in input two variables that are automatically computed within the run_nn function. inp_dim is simply the dimensionality of the neural network input, while options is a dictionary containing all the parameters specified into the section architecture of the configuration file. For instance, you can access to the DNN activations of the various layers in this way: options 'dnn_lay' .split(',') . As you might see from the MLP class, the initialization method defines and initializes all the parameters of the neural network. The forward method takes in input a tensor x (i.e., the input data) and outputs another vector containing x. If your model is a sequence model (i.e., if there is at least one architecture with arch_seq_model true in the cfg file), x is a tensor with (time_steps, batches, N_in), otherwise is a (batches, N_in) matrix. The class forward defines the list of computations to transform the input tensor into a corresponding output tensor. The output must have the sequential format (time_steps, batches, N_out) for recurrent models and the non sequential format (batches, N_out) for feed forward models. Similarly to the already implemented models the user should write a new class (e.g., myDNN) that implements the customized model: class myDNN(nn.Module): def __init__(self, options,inp_dim): super(myDNN, self).__init__() // initialize the parameters def forward(self, x): // do some computations out f(x) return out 4. Create a configuration file. Now that you have defined your model and the list of its hyperparameters, you can create a configuration file. To create your own configuration file, you can take a look into an already existing config file (e.g., for simplicity you can consider cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg ). After defining the adopted datasets with their related features and labels, the configuration file has some sections called architecture\ . Each architecture implements a different neural network. 
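To make step 3 above a bit more concrete, the myDNN skeleton could be fleshed out along these lines (a sketch only: the hyperparameter names and the ReLU/softmax choices are illustrative, and a real model should read everything it needs from the options dictionary of its architecture section):

import torch
import torch.nn as nn

class myDNN(nn.Module):
    def __init__(self, options, inp_dim):
        super(myDNN, self).__init__()
        # Read the hidden layer sizes from the cfg file, e.g. dnn_lay=1024,1024,512.
        hidden = [int(x) for x in options["dnn_lay"].split(",")]
        sizes = [inp_dim] + hidden
        self.layers = nn.ModuleList(
            [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(hidden))])

    def forward(self, x):
        # x: (batches, N_in) for a feed-forward model (arch_seq_model=False).
        for lin in self.layers[:-1]:
            x = torch.relu(lin(x))
        return torch.softmax(self.layers[-1](x), dim=-1)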
In cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg we only have architecture1 since the acoustic model is composed of a single neural network. To add your own neural network, you have to write an architecture section (e.g., architecture1 ) in the following way: architecture1 arch_name mynetwork (this is a name you would like to use to refer to this architecture within the following model section) arch_proto proto/myDNN.proto (here is the name of the proto file defined before) arch_library neural_networks (this is the name of the library where myDNN is implemented) arch_class myDNN (This must be the name of the class you have implemented) arch_pretrain_file none (With this you can specify if you want to pre train your model) arch_freeze False (set False if you want to update the parameters of your model) arch_seq_model False (set False for feed forward models, True for recurrent models) Then, you have to specify proper values for all the hyperparameters specified in proto/myDNN.proto . For the MLP.proto , we have: dnn_lay 1024,1024,1024,1024,1024,N_out_lab_cd dnn_drop 0.15,0.15,0.15,0.15,0.15,0.0 dnn_use_laynorm_inp False dnn_use_batchnorm_inp False dnn_use_batchnorm True,True,True,True,True,False dnn_use_laynorm False,False,False,False,False,False dnn_act relu,relu,relu,relu,relu,softmax Then, add the following parameters related to the optimization of your own architecture. You can use here standard sdg, adam, or rmsprop (see cfg/TIMIT_baselines/TIMIT_LSTM_mfcc.cfg for an example with rmsprop): arch_lr 0.08 arch_halving_factor 0.5 arch_improvement_threshold 0.001 arch_opt sgd opt_momentum 0.0 opt_weight_decay 0.0 opt_dampening 0.0 opt_nesterov False 5. Save the configuration file into the cfg folder (e.g, cfg/myDNN_exp.cfg ). 6. Run the experiment with: python run_exp.sh cfg/myDNN_exp.cfg 7. To debug the model you can first take a look at the standard output. The config file is automatically parsed by the run_exp.sh and it raises errors in case of possible problems. You can also take a look into the log.log file to see additional information on the possible errors. When implementing a new model, an important debug test consists of doing an overfitting experiment (to make sure that the model is able to overfit a tiny dataset). If the model is not able to overfit, it means that there is a major bug to solve. 8. Hyperparameter tuning. In deep learning, it is often important to play with the hyperparameters to find the proper setting for your model. This activity is usually very computational and time consuming but is often necessary when introducing new architectures. To help hyperparameter tuning, we developed a utility that implements a random search of the hyperparameters (see next section for more details). How can I tune the hyperparameters A hyperparameter tuning is often needed in deep learning to search for proper neural architectures. To help tuning the hyperparameters within PyTorch Kaldi, we have implemented a simple utility that implements a random search. In particular, the script tune_hyperparameters.py generates a set of random configuration files and can be run in this way: python tune_hyperparameters.py cfg/TIMIT_MLP_mfcc.cfg exp/TIMIT_MLP_mfcc_tuning 10 arch_lr randfloat(0.001,0.01) batch_size_train randint(32,256) dnn_act choose_str{relu,relu,relu,relu,softmax tanh,tanh,tanh,tanh,softmax} The first parameter is the reference cfg file that we would like to modify, while the second one is the folder where the random configuration files are saved. 
The third parameter is the number of the random config file that we would like to generate. There is then the list of all the hyperparameters that we want to change. For instance, arch_lr randfloat(0.001,0.01) will replace the field arch_lr with a random float ranging from 0.001 to 0.01. batch_size_train randint(32,256) will replace batch_size_train with a random integer between 32 and 256 and so on. Once the config files are created, they can be run sequentially or in parallel with: python run_exp.py $cfg_file How can I use my own dataset PyTorch Kaldi can be used with any speech dataset. To use your own dataset, the steps to take are similar to those discussed in the TIMIT/Librispeech tutorials. In general, what you have to do is the following: 1. Run the Kaldi recipe with your dataset. Please, see the Kaldi website to have more information on how to perform data preparation. 2. Compute the alignments on training, validation, and test data. 3. Write a PyTorch Kaldi config file $cfg_file . 4. Run the config file with python run_exp.sh $cfg_file . How can I plug in my own features The current version of PyTorch Kaldi supports input features stored with the Kaldi ark format. If the user wants to perform experiments with customized features, the latter must be converted into the ark format. Take a look into the Kaldi io for python git repository for a detailed description about converting numpy arrays into ark files. Moreover, you can take a look into our utility called save_raw_fea.py. This script generates Kaldi ark files containing raw features, that are later used to train neural networks fed by the raw waveform directly (see the section about processing audio with SincNet). How can I transcript my own audio files The current version of Pytorch Kaldi supports the standard production process of using a Pytorch Kaldi pre trained acoustic model to transcript one or multiples .wav files. It is important to understand that you must have a trained Pytorch Kaldi model. While you don't need labels or alignments anymore, Pytorch Kaldi still needs many files to transcript a new audio file: 1. The features and features list feats.scp (with .ark files, see how can i plug my own features) 2. The decoding graph (usually created with mkgraph.sh during previous model training such as triphones models). This graph is not needed if you're not decoding. Once you have all these files, you can start adding your dataset section to the global configuration file. The easiest way is to copy the cfg file used to train your acoustic model and just modify by adding a new dataset : dataset4 data_name myWavFile fea fea_name fbank fea_lst myWavFilePath/data/feats.scp fea_opts apply cmvn utt2spk ark:myWavFilePath/data//utt2spk ark:myWavFilePath/cmvn_test.ark ark: ark: add deltas delta order 0 ark: ark: cw_left 5 cw_right 5 lab lab_name none lab_data_folder myWavFilePath/data/ lab_graph myWavFilePath/exp/tri3/graph n_chunks 1 data_use train_with TIMIT_tr valid_with TIMIT_dev forward_with myWavFile The key string for your audio file transcription is lab_name none . The none tag asks Pytorch Kaldi to enter a production mode that only does the forward propagation and decoding without any labels. You don't need TIMIT_tr and TIMIT_dev to be on your production server since Pytorch Kaldi will skip this information to directly go to the forward phase of the dataset given in the forward_with field. 
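As an aside on requirement 1: if the features for your audio files live in numpy arrays rather than Kaldi archives, the "Kaldi io for python" package mentioned in the previous section can write them out directly. A minimal sketch, in which the utterance id, the feature matrix, and all file names are purely illustrative:

```python
import numpy as np
import kaldi_io  # from the "Kaldi io for python" repository referenced above

# one (num_frames x num_dims) float32 matrix per utterance to transcribe
utterances = {
    "myWavFile_utt01": np.random.randn(500, 40).astype(np.float32),  # placeholder features
}

with open("myWavFilePath/data/raw_feats.ark", "wb") as f:
    for utt_id, mat in utterances.items():
        kaldi_io.write_mat(f, mat, key=utt_id)

# the feats.scp referenced in fea_lst can then be produced with Kaldi, e.g.:
#   copy-feats ark:raw_feats.ark ark,scp:feats.ark,feats.scp
```

The fea_lst entry of the dataset section shown above then simply points to the resulting feats.scp.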
As you can see, the global fea field requires the exact same parameters than standard training or testing dataset, while the lab field only requires two parameters. Please, note that lab_data_folder is nothing more than the same path as fea_lst . Finally, you still need to specify the number of chunks you want to create to process this file (1 hour 1 chunk). WARNINGS In your standard .cfg, you might have used keywords such as N_out_lab_cd that can not be used anymore. Indeed, in a production scenario, you don't want to have the training data on your machine. Therefore, all the variables that were on your .cfg file must be replaced by their true values. To replace all the N_out_{mono,lab_cd} you can take a look at the output of: hmm info /path/to/the/final.mdl/used/to/generate/the/training/ali Then, if you normalize posteriors as (check in your .cfg Section forward): normalize_posteriors True normalize_with_counts_from lab_cd You must replace lab_cd by: normalize_posteriors True normalize_with_counts_from /path/to/ali_train_pdf.counts This normalization step is crucial for HMM DNN speech recognition. DNNs, in fact, provide posterior probabilities, while HMMs are generative models that work with likelihoods. To derive the required likelihoods, one can simply divide the posteriors by the prior probabilities. To create this ali_train_pdf.counts file you can follow: alidir /path/to/the/exp/tri_ali (change it with your path to the exp with the ali) num_pdf $(hmm info $alidir/final.mdl awk '/pdfs/{print $4}') labels_tr_pdf ark:ali to pdf $alidir/final.mdl \ ark:gunzip c $alidir/ali. .gz \ ark: analyze counts verbose 1 binary false counts dim $num_pdf $labels_tr_pdf ali_train_pdf.counts et voilà ! In a production scenario, you might need to transcript a huge number of audio files, and you don't want to create as much as needed .cfg file. In this extent, and after creating this initial production .cfg file (you can leave the path blank), you can call the run_exp.py script with specific arguments referring to your different.wav features: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_fbank_prod.cfg dataset4,fea,0,fea_lst myWavFilePath/data/feats.scp dataset4,lab,0,lab_data_folder myWavFilePath/data/ dataset4,lab,0,lab_graph myWavFilePath/exp/tri3/graph/ This command will internally alter the configuration file with your specified paths, and run and your defined features! Note that passing long arguments to the run_exp.py script requires a specific notation. dataset4 specifies the name of the created section, fea is the name of the higher level field, fea_lst or lab_graph are the name of the lowest level field you want to change. The 0 is here to indicate which lowest level field you want to alter, indeed some configuration files may contain multiple lab_graph per dataset! Therefore, 0 indicates the first occurrence, 1 the second ... Paths MUST be encapsulated by to be interpreted as full strings! Note that you need to alter the data_name and forward_with fields if you don't want different .wav files transcriptions to erase each other (decoding files are stored accordingly to the field data_name ). dataset4,data_name MyNewName data_use,forward_with MyNewName . Batch size, learning rate, and dropout scheduler In order to give users more flexibility, the latest version of PyTorch Kaldi supports scheduling of the batch size, max_seq_length_train, learning rate, and dropout factor. This means that it is now possible to change these values during training. 
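As an aside on the normalization warning above, the role of ali_train_pdf.counts is easy to sketch in plain numpy. This is only an illustration of the idea (PyTorch-Kaldi performs this step for you when normalize_posteriors is set, and the assumption here is that the counts file is a single bracketed vector, as produced by analyze-counts with binary false):

```python
import numpy as np

# ali_train_pdf.counts is one line of the form "[ c_1 c_2 ... c_N ]"
with open("ali_train_pdf.counts") as f:
    counts = np.array(f.read().strip().strip("[] ").split(), dtype=float)

priors = counts / counts.sum()        # prior probability of each pdf (tied state)
log_priors = np.log(priors + 1e-20)   # epsilon protects pdfs never seen in the alignments

# log_post: (num_frames, num_pdf) log-posteriors produced by the DNN output
log_post = np.load("example_log_posteriors.npy")  # illustrative input

# scaled log-likelihoods: log p(x|s) = log p(s|x) - log p(s) (up to a constant), fed to the HMM decoder
pseudo_loglik = log_post - log_priors
```

Returning to the scheduling of batch sizes, learning rates, and dropout factors introduced above: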
To support this feature, we implemented the following formalisms within the config files: batch_size_train 128 12 64 10 32 2 In this case, our batch size will be 128 for the first 12 epochs, 64 for the following 10 epochs, and 32 for the last two epochs. By default means for N times , while is used to indicate a change of the batch size. Note that if the user simply sets batch_size_train 128 , the batch size is kept fixed during all the training epochs by default. A similar formalism can be used to perform learning rate scheduling: arch_lr 0.08 10 0.04 5 0.02 3 0.01 2 0.005 2 0.0025 2 In this case, if the user simply sets arch_lr 0.08 the learning rate is annealed with the new bob procedure used in the previous version of the toolkit. In practice, we start from the specified learning rate and we multiply it by a halving factor every time that the improvement on the validation dataset is smaller than the threshold specified in the field arch_improvement_threshold . Also the dropout factor can now be changed during training with the following formalism: dnn_drop 0.15 12 0.20 12,0.15,0.15 10 0.20 14,0.15,0.0 With the line before we can set a different dropout rate for different layers and for different epochs. For instance, the first hidden layer will have a dropout rate of 0.15 for the first 12 epochs, and 0.20 for the other 12. The dropout factor of the second layer, instead, will remain constant to 0.15 over all the training. The same formalism is used for all the layers. Note that indicates a change in the dropout factor within the same layer, while , indicates a different layer. You can take a look here into a config file where batch sizes, learning rates, and dropout factors are changed here: cfg/TIMIT_baselines/TIMIT_mfcc_basic_flex.cfg or here: cfg/TIMIT_baselines/TIMIT_liGRU_fmllr_lr_schedule.cfg How can I contribute to the project The project is still in its initial phase and we invite all potential contributors to participate. We hope to build a community of developers larger enough to progressively maintain, improve, and expand the functionalities of our current toolkit. For instance, it could be helpful to report any bug or any suggestion to improve the current version of the code. People can also contribute by adding additional neural models, that can eventually make richer the set of currently implemented architectures. EXTRA Speech recognition from the raw waveform with SincNet Take a look into our video introduction to SincNet SincNet is a convolutional neural network recently proposed to process raw audio waveforms. In particular, SincNet encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized filter bank front end, that only depends on some parameters with a clear physical meaning. For a more detailed description of the SincNet model, please refer to the following papers: M. Ravanelli, Y. Bengio, Speaker Recognition from raw waveform with SincNet , in Proc. of SLT 2018 ArXiv M. Ravanelli, Y.Bengio, Interpretable Convolutional Filters with SincNet , in Proc. of NIPS@IRASL 2018 ArXiv To use this model for speech recognition on TIMIT, to the following steps: 1. Follows the steps described in the “TIMIT tutorial”. 2. Save the raw waveform into the Kaldi ark format. 
To do it, you can use the save_raw_fea.py utility in our repository. The script saves the input signals into a binary Kaldi archive, keeping the alignments with the pre computed labels. You have to run it for all the data chunks (e.g., train, dev, test). It can also specify the length of the speech chunk ( sig_wlen 200 ms ) composing each frame. 3. Open the cfg/TIMIT_baselines/TIMIT_SincNet_raw.cfg , change your paths, and run: python ./run_exp.sh cfg/TIMIT_baselines/TIMIT_SincNet_raw.cfg 4. With this architecture, we have obtained a PER(%) 17.1% . A standard CNN fed the same features gives us a PER(%) 18.% . Please, see here to take a look into our results. Our results on SincNet outperforms results obtained with MFCCs and FBANKs fed by standard feed forward networks. In the following table, we compare the result of SincNet with other feed forward neural network: Model WER(\%) MLP fbank 18.7 MLP mfcc 18.2 CNN raw 18.1 SincNet raw 17.2 Joint training between speech enhancement and ASR In this section, we show how to use PyTorch Kaldi to jointly train a cascade between a speech enhancement and a speech recognition neural networks. The speech enhancement has the goal of improving the quality of the speech signal by minimizing the MSE between clean and noisy features. The enhanced features then feed another neural network that predicts context dependent phone states. In the following, we report a toy task example based on a reverberated version of TIMIT, that is only intended to show how users should set the config file to train such a combination of neural networks. Even though some implementation details (and the adopted datasets) are different, this tutorial is inspired by this paper: M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Batch normalized joint training for DNN based distant speech recognition , in Proceedings of STL 2016 arXiv To run the system do the following steps: 1 Make sure you have the standard clean version of TIMIT available. 2 Run the Kaldi s5 baseline of TIMIT. This step is necessary to compute the clean features (that will be the labels of the speech enhancement system) and the alignments (that will be the labels of the speech recognition system). We recommend running the full timit s5 recipe (including the DNN training). 3 The standard TIMIT recipe uses MFCCs features. In this tutorial, instead, we use FBANK features. To compute FBANK features run the following script in $KALDI_ROOT/egs/TIMIT/s5 : feadir fbank for x in train dev test; do steps/make_fbank.sh cmd $train_cmd nj $feats_nj data/$x exp/make_fbank/$x $feadir steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir done Note that we use 40 FBANKS here, while Kaldi uses by default 23 FBANKs. To compute 40 dimensional features go into $KALDI_ROOT/egs/TIMIT/conf/fbank.conf and change the number of considered output filters. 4 Go to this external repository and follow the steps to generate a reverberated version of TIMIT starting from the clean one. Note that this is just a toy task that is only helpful to show how setting up a joint training system. 5 Compute the FBANK features for the TIMIT_rev dataset. To do it, you can copy the scripts in $KALDI_ROOT/egs/TIMIT/ into $KALDI_ROOT/egs/TIMIT_rev/ . Please, copy also the data folder. Note that the audio files in the TIMIT_rev folders are saved with the standard WAV format, while TIMIT is released with the SPHERE format. 
To bypass this issue, open the files data/train/wav.scp , data/dev/wav.scp , data/test/wav.scp and delete the part about SPHERE reading (e.g., /home/mirco/kaldi trunk/tools/sph2pipe_v2.5/sph2pipe f wav ). You also have to change the paths from the standard TIMIT to the reverberated one (e.g. replace /TIMIT/ with /TIMIT_rev/). Remind to remove the final pipeline symbol“ ”. Save the changes and run the computation of the fbank features in this way: feadir fbank for x in train dev test; do steps/make_fbank.sh cmd $train_cmd nj $feats_nj data/$x exp/make_fbank/$x $feadir steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir done Remember to change the $KALDI_ROOT/egs/TIMIT_rev/conf/fbank.conf file in order to compute 40 features rather than the 23 FBANKS of the default configuration. 6 Once features are computed, open the following config file: cfg/TIMIT_baselines/TIMIT_rev/TIMIT_joint_training_liGRU_fbank.cfg Remember to change the paths according to where data are stored in your machine. As you can see, we consider two types of features. The fbank_rev features are computed from the TIMIT_rev dataset, while the fbank_clean features are derived from the standard TIMIT dataset and are used as targets for the speech enhancement neural network. As you can see in the model section of the config file, we have the cascade between networks doing speech enhancement and speech recognition. The speech recognition architecture jointly estimates both context dependent and monophone targets (thus using the so called monophone regularization). To run an experiment type the following command: python run_exp.py cfg/TIMIT_baselines/TIMIT_rev/TIMIT_joint_training_liGRU_fbank.cfg 7 Results With this configuration file, you should obtain a Phone Error Rate (PER) 28.1% . Note that some oscillations around this performance are more than natural and are due to different initialization of the neural parameters. You can take a closer look into our results here Distant Speech Recognition with DIRHA In this tutorial, we use the DIRHA English dataset to perform a distant speech recognition experiment. The DIRHA English Dataset is a multi microphone speech corpus being developed under the EC project DIRHA. The corpus is composed of both real and simulated sequences recorded with 32 sample synchronized microphones in a domestic environment. The database contains signals of different characteristics in terms of noise and reverberation making it suitable for various multi microphone signal processing and distant speech recognition tasks. The part of the dataset currently released is composed of 6 native US speakers (3 Males, 3 Females) uttering 409 wall street journal sentences. The training data have been created using a realistic data contamination approach, that is based on contaminating the clean speech wsj 5k sentences with high quality multi microphone impulse responses measured in the targeted environment. For more details on this dataset, please refer to the following papers: M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi, M. Omologo, The DIRHA English corpus and related tasks for distant speech recognition in domestic environments , in Proceedings of ASRU 2015. ArXiv M. Ravanelli, P. Svaizer, M. Omologo, Realistic Multi Microphone Data Simulation for Distant Speech Recognition , in Proceedings of Interspeech 2016. ArXiv In this tutorial, we use the aforementioned simulated data for training (using LA6 microphone), while test is performed using the real recordings (LA6). 
This task is very realistic, but also very challenging. The speech signals are characterized by a reverberation time of about 0.7 seconds. Non stationary domestic noises (such as vacuum cleaner, steps, phone rings, etc.) are also present in the real recordings. Let’s start now with the practical tutorial. 1 If not available, download the DIRHA dataset from the LDC website . LDC releases the full dataset for a small fee. 2 Go this external reposotory . As reported in this repository, you have to generate the contaminated WSJ dataset with the provided MATLAB script. Then, you can run the proposed KALDI baseline to have features and labels ready for our pytorch kaldi toolkit. 3 Open the following configuration file: cfg/DIRHA_baselines/DIRHA_liGRU_fmllr.cfg The latter configuration file implements a simple RNN model based on a Light Gated Recurrent Unit (Li GRU). We used fMLLR as input features. Change the paths and run the following command: python run_exp.py cfg/DIRHA_baselines/DIRHA_liGRU_fmllr.cfg 4 Results: The aforementioned system should provide Word Error Rate (WER%) 23.2% . You can find the results obtained by us here . Using the other configuration files in the cfg/DIRHA_baselines folder you can perform experiments with different setups. With the provided configuration files you can obtain the following results: Model WER(\%) MLP 26.1 GRU 25.3 Li GRU 23.8 Training an autoencoder The current version of the repository is mainly designed for speech recognition experiments. We are actively working a new version, which is much more flexible and can manage input/output different from Kaldi features/labels. Even with the current version, however, it is possible to implement other systems, such as an autoencoder. An autoencoder is a neural network whose inputs and outputs are the same. The middle layer normally contains a bottleneck that forces our representations to compress the information of the input. In this tutorial, we provide a toy example based on the TIMIT dataset. For instance, see the following configuration file: cfg/TIMIT_baselines/TIMIT_MLP_fbank_autoencoder.cfg Our inputs are the standard 40 dimensional fbank coefficients that are gathered using a context windows of 11 frames (i.e., the total dimensionality of our input is 440). A feed forward neural network (called MLP_encoder) encodes our features into a 100 dimensional representation. The decoder (called MLP_decoder) is fed by the learned representations and tries to reconstruct the output. The system is trained with Mean Squared Error (MSE) metric. Note that in the Model section we added this line “err_final cost_err(dec_out,lab_cd)” at the end. The current version of the model, in fact, by default needs that at least one label is specified (we will remove this limit in the next version). 
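Conceptually, the encoder/decoder pair described above boils down to something like the following PyTorch sketch. Only the 440 to 100 to 440 sizes come from the description; the class, layer, and variable names are illustrative and do not correspond to the actual MLP_encoder/MLP_decoder sections of the configuration file:

```python
import torch
import torch.nn as nn

class FbankAutoencoder(nn.Module):
    # 40 fbank coefficients gathered over a context window of 11 frames = 440 inputs
    def __init__(self, inp_dim=440, bottleneck=100):
        super(FbankAutoencoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(inp_dim, bottleneck), nn.ReLU())
        self.decoder = nn.Linear(bottleneck, inp_dim)

    def forward(self, x):
        code = self.encoder(x)     # compressed 100-dimensional representation
        return self.decoder(code)  # reconstruction of the input features

model = FbankAutoencoder()
x = torch.randn(8, 440)            # dummy mini-batch of feature windows
loss = nn.MSELoss()(model(x), x)   # trained by minimizing the MSE, as in the cfg
```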
You can train the system running the following command: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_fbank_autoencoder.cfg The results should look like this: ep 000 tr 'TIMIT_tr' loss 0.139 err 0.999 valid TIMIT_dev loss 0.076 err 1.000 lr_architecture1 0.080000 lr_architecture2 0.080000 time(s) 41 ep 001 tr 'TIMIT_tr' loss 0.098 err 0.999 valid TIMIT_dev loss 0.062 err 1.000 lr_architecture1 0.080000 lr_architecture2 0.080000 time(s) 39 ep 002 tr 'TIMIT_tr' loss 0.091 err 0.999 valid TIMIT_dev loss 0.058 err 1.000 lr_architecture1 0.040000 lr_architecture2 0.040000 time(s) 39 ep 003 tr 'TIMIT_tr' loss 0.088 err 0.999 valid TIMIT_dev loss 0.056 err 1.000 lr_architecture1 0.020000 lr_architecture2 0.020000 time(s) 38 ep 004 tr 'TIMIT_tr' loss 0.087 err 0.999 valid TIMIT_dev loss 0.055 err 0.999 lr_architecture1 0.010000 lr_architecture2 0.010000 time(s) 39 ep 005 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.005000 lr_architecture2 0.005000 time(s) 39 ep 006 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.002500 lr_architecture2 0.002500 time(s) 39 ep 007 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.001250 lr_architecture2 0.001250 time(s) 39 ep 008 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 0.999 lr_architecture1 0.000625 lr_architecture2 0.000625 time(s) 41 ep 009 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 0.999 lr_architecture1 0.000313 lr_architecture2 0.000313 time(s) 38 You should only consider the field loss . The filed err only contains not useuful information in this case (for the aforementioned reason). You can take a look into the generated features typing the following command: copy feats ark:exp/TIMIT_MLP_fbank_autoencoder/exp_files/forward_TIMIT_test_ep009_ck00_enc_out.ark ark,t: more References 1 M. Ravanelli, T. Parcollet, Y. Bengio, The PyTorch Kaldi Speech Recognition Toolkit , ArxIv 2 M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Improving speech recognition by revising gated recurrent units , in Proceedings of Interspeech 2017. ArXiv 3 M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Light Gated Recurrent Units for Speech Recognition , in IEEE Transactions on Emerging Topics in Computational Intelligence. ArXiv 4 M. Ravanelli, Deep Learning for Distant Speech Recognition , PhD Thesis, Unitn 2017. ArXiv 5 T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, Y. Bengio, Quaternion Recurrent Neural Networks , in Proceedings of ICLR 2019 ArXiv 6 T. Parcollet, M. Morchid, G. Linarès, R. De Mori, Bidirectional Quaternion Long Short Term Memory Recurrent Neural Networks for Speech Recognition , in Proceedings of ICASSP 2019 ArXiv",Speech Recognition,Speech 1968,Speech,Speech,Other,"Myrtle Deep Speech A PyTorch implementation of DeepSpeech and DeepSpeech2 . This repository is intended as an evolving baseline for other implementations to compare their training performance against. Current roadmap: 1. Pre trained weights for both networks and full performance statistics. See v0.1 release: 1. Mixed precision training. Running Build the Docker image: make build Run the Docker container (here using nvidia docker ), ensuring to publish the port of the JupyterLab session to the host: sudo docker run runtime nvidia shm size 512M p 9999:9999 deepspeech The JupyterLab session can be accessed via localhost:9999 . 
This Python package will accessible in the running Docker container and is accessible through either the command line interface: deepspeech help or as a Python package: python import deepspeech Examples deepspeech help will print the configurable parameters (batch size, learning rate, log location, number of epochs...) it aims to have reasonably sensible defaults. Training A Deep Speech training run can be started by the following command, adding flags as necessary: deepspeech ds1 By default the experimental data and logs are output to /tmp/experiments/year_month_date hour_minute_second_microsecond . Inference A Deep Speech evaluation run can be started by the following command, adding flags as necessary: deepspeech ds1 \ state_dict_path $MODEL_PATH \ log_file \ decoder greedy \ train_subsets \ dev_log wer \ dev_subsets dev clean \ dev_batch_size 1 Note the lack of an argument to log_file causes the WER results to be written to stderr. Dataset The package contains code to download and use the LibriSpeech ASR corpus . WER The word error rate (WER) is computed using the formula that is widely used in many open source speech to text systems (Kaldi, PaddlePaddle, Mozilla DeepSpeech). In pseudocode, where N is the number of validation or test samples: sum_edits sum( edit_distance(target, predict) for target, predict in zip(targets, predictions) ) sum_lens sum( len(target) for target in targets ) WER (1.0/N) (sum_edits / sum_lens) This reduces the impact on the WER of errors in short sentences. Toy example: Target Prediction Edit Distance Label Length lectures lectured 1 1 i'm afraid he said i am afraid he said 2 4 nice to see you mister meeking nice to see your mister makin 2 6 The mean WER of each sample considered individually is: >>> (1.0/3) ((1.0/1) + (2.0/4) + (2.0/6)) 0.611111111111111 Compared to the pseudocode version given above: >>> (1.0/3) ((1.0 + 2 + 2) / (1.0 + 4 + 6)) 0.1515151515151515 Maintainer Please contact sam at myrtle dot ai .",Speech Recognition,Speech 2012,Speech,Speech,Other,"Baidu's Deep Speech 2 (Tensorflow) (This is a work in progress) This is a python implementation of Baidu's Deep Speech 2 paper using tensorflow TODO: Fix GPU memory Add batch normalization to RNN Implement row convolution layer Add other dataset support Create pretrained models Preprocessing To preprocess your data you must first download the one of the datasets above and extract them to a folder. Then run the following script to preprocess the data (This might take a while depending on the amount of data you have) python preprocess.py data dir dataset Training Now that you have preprocessed your data, you can train a model. To do this, you can edit the settings in the config.py file if you want. Then run the following command to train the model: python train.py Testing your model Now that you have trained a model, you can go ahead and start using it. We have created two scripts that can help you do this infer.py and streaming_infer.py . The infer.py script, transcribes a audio file that you give it python infer.py f The streaming_infer.py script uses PyAudio to record audio from your computer's microphone and transcribes it in real time. To run it simply: python streaming_infer.py",Speech Recognition,Speech 2099,Speech,Speech,Other,"WaveNet Keras implementation This repository contains a basic implementation of the WaveNet as described in the paper published by DeepMind: Oord, Aaron van den, et al. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016). 
Installation instructions The code has only been tested and verified with Python 3.6. Assuming you have an installation of pipenv for Python 3, you may clone the project, navigate to the root folder and run: bash make install This will most likely take care of the dependencies, unless you're using Windows. Reproducibility: Running the examples In the examples folder you will find a small sample of data, downloaded from the LJ Speech Dataset . The dataset originally contains about 24 hours of speech, but I selected just a few files to create a small proof of concept, since I ran the training on my laptop and training such a complex architecture on a huge dataset was not viable for me. I used 50 files for training and 6 for validation. Training To train the network with the small amount of data provided in the package, navigate to the examples directory and run: bash pipenv run python train_small.py Feel free to also tweak the parameters and add more data, if your computational resources allow it (e.g. use AWS spot instances with GPUs). For example, I see posts around the internet that use 1000 2000 epochs. I used 20, because an order of magnitude higher would take days to train. The filter size should also probably be larger (e.g. 64), and the residual blocks should be more (but keep in mind the paper recommends dilation rate mod9 ). In the figure below, you may see a plot of the training loss, using the default parameters currently in wavenet.examples.train_small . It's obvious that the model is far from saturation. ! Training Loss (wavenet/examples/training_loss.png) Generating sound Using the little network that I trained, the generated wavefile sounds like plain noise. However, if you'd like to generate your own wavefile, tweak the parameters accordingly (e.g. point to your own model) and run: bash pipenv run python generate_small.py",Speech Synthesis,Speech 2108,Speech,Speech,Other,"Keras TCN Downloads Downloads bash pip install keras tcn Keras Temporal Convolutional Network Keras TCN ( keras tcn) Why Temporal Convolutional Network? ( why temporal convolutional network) API ( api) Arguments ( arguments) Input shape ( input shape) Output shape ( output shape) Supported task types ( supported task types) Receptive field ( receptive field) Non causal TCN ( non causal tcn) Installation (Python 3) ( installation python 3) Run ( run) Tasks ( tasks) Adding Task ( adding task) Explanation ( explanation) Implementation results ( implementation results) Copy Memory Task ( copy memory task) Explanation ( explanation 1) Implementation results (first epochs) ( implementation results first epochs) Sequential MNIST ( sequential mnist) Explanation ( explanation 2) Implementation results ( implementation results 1) References ( references) Why Temporal Convolutional Network? TCNs exhibit longer memory than recurrent architectures with the same capacity. Constantly performs better than LSTM/GRU architectures on a vast range of tasks (Seq. MNIST, Adding Problem, Copy Memory, Word level PTB...). Parallelism, flexible receptive field size, stable gradients, low memory requirements for training, variable length inputs... Visualization of a stack of dilated causal convolutional layers (Wavenet, 2016) API The usual way is to import the TCN layer and use it inside a Keras model. I provide a snippet below to illustrate it on a regression task (cf. 
tasks/ for other examples): python from keras.layers import Dense from keras.models import Input, Model from tcn import TCN batch_size, timesteps, input_dim None, 20, 1 def get_x_y(size 1000): import numpy as np pos_indices np.random.choice(size, size int(size // 2), replace False) x_train np.zeros(shape (size, timesteps, 1)) y_train np.zeros(shape (size, 1)) x_train pos_indices, 0 1.0 y_train pos_indices, 0 1.0 return x_train, y_train i Input(batch_shape (batch_size, timesteps, input_dim)) o TCN(return_sequences False)(i) The TCN layers are here. o Dense(1)(o) m Model(inputs i , outputs o ) m.compile(optimizer 'adam', loss 'mse') x, y get_x_y() m.fit(x, y, epochs 10, validation_split 0.2) In the example above, TCNs can also be stacked together, like this: python o TCN(return_sequences True)(i) o TCN(return_sequences False)(o) I also provide a ready to use TCN model that can be imported and used this way (cf. tasks/ for the full code): python from tcn import compiled_tcn model compiled_tcn(...) model.fit(x, y) Keras model. Arguments TCN(nb_filters 64, kernel_size 2, nb_stacks 1, dilations 1, 2, 4, 8, 16, 32 , padding 'causal', use_skip_connections True, dropout_rate 0.0, return_sequences True, name 'tcn') nb_filters : Integer. The number of filters to use in the convolutional layers. Would be similar to units for LSTM. kernel_size : Integer. The size of the kernel to use in each convolutional layer. dilations : List. A dilation list. Example is: 1, 2, 4, 8, 16, 32, 64 . nb_stacks : Integer. The number of stacks of residual blocks to use. padding : String. The padding to use in the convolutions. 'causal' for a causal network (as in the original implementation) and 'same' for a non causal network. use_skip_connections : Boolean. If we want to add skip connections from input to each residual block. return_sequences : Boolean. Whether to return the last output in the output sequence, or the full sequence. dropout_rate : Float between 0 and 1. Fraction of the input units to drop. name : Name of the model. Useful when having multiple TCN. Input shape 3D tensor with shape (batch_size, timesteps, input_dim) . timesteps can be None. This can be useful if each sequence is of a different length: Multiple Length Sequence Example (tasks/multi_length_sequences.py). Output shape if return_sequences True : 3D tensor with shape (batch_size, timesteps, nb_filters) . if return_sequences False : 2D tensor with shape (batch_size, nb_filters) . Supported task types Regression (Many to one) e.g. adding problem Classification (Many to many) e.g. copy memory task Classification (Many to one) e.g. sequential mnist task For a Many to Many regression, a cheap fix for now is to change the number of units of the final Dense layer . Receptive field Receptive field nb_stacks_of_residuals_blocks kernel_size last_dilation . If a TCN has only one stack of residual blocks with a kernel size of 2 and dilations 1, 2, 4, 8 , its receptive field is 2 1 8 16. The image below illustrates it: ks 2, dilations 1, 2, 4, 8 , 1 block If the TCN has now 2 stacks of residual blocks, wou would get the situation below, that is, an increase in the receptive field to 32: ks 2, dilations 1, 2, 4, 8 , 2 blocks If we increased the number of stacks to 3, the size of the receptive field would increase again, such as below: ks 2, dilations 1, 2, 4, 8 , 3 blocks Thanks a lot to @alextheseal for providing such visuals. 
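A few lines of Python make the receptive field rule above easy to check against the figures. This helper is just the arithmetic of the formula and is not part of the keras-tcn API:

```python
def receptive_field(nb_stacks, kernel_size, dilations):
    # receptive field = nb_stacks_of_residual_blocks * kernel_size * last_dilation
    return nb_stacks * kernel_size * dilations[-1]

dilations = [1, 2, 4, 8]
print(receptive_field(1, 2, dilations))  # 16, as in the first figure
print(receptive_field(2, 2, dilations))  # 32, as in the second figure
print(receptive_field(3, 2, dilations))  # 48, the three-stack case
```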
Non causal TCN Making the TCN architecture non causal allows it to take the future into consideration to do its prediction as shown in the figure below. However, it is not anymore suitable for real time applications. Non Causal TCN ks 3, dilations 1, 2, 4, 8 , 1 block To use a non causal TCN, specify padding 'valid' or padding 'same' when initializing the TCN layers. Special thanks to: @qlemaire22 Installation (Python 3) bash git clone git@github.com:philipperemy/keras tcn.git cd keras tcn virtualenv p python3.6 venv source venv/bin/activate pip install r requirements.txt change to tensorflow if you dont have a gpu. pip install . upgrade install it as a package. Note: Only compatible with Python 3 at the moment. Should be almost compatible with python 2. Run Once keras tcn is installed as a package, you can take a glimpse of what's possible to do with TCNs. Some tasks examples are available in the repository for this purpose: bash cd adding_problem/ python main.py run adding problem task cd copy_memory/ python main.py run copy memory task cd mnist_pixel/ python main.py run sequential mnist pixel task Tasks Adding Task The task consists of feeding a large array of decimal numbers to the network, along with a boolean array of the same length. The objective is to sum the two decimals where the boolean array contain the two 1s. Explanation Adding Problem Task Implementation results The model takes time to learn this task. It's symbolized by a very long plateau (could take 8 epochs on some runs). 200000/200000 293s 1ms/step loss: 0.1731 val_loss: 0.1662 200000/200000 289s 1ms/step loss: 0.1675 val_loss: 0.1665 200000/200000 287s 1ms/step loss: 0.1670 val_loss: 0.1665 200000/200000 288s 1ms/step loss: 0.1668 val_loss: 0.1669 200000/200000 285s 1ms/step loss: 0.1085 val_loss: 0.0019 200000/200000 285s 1ms/step loss: 0.0011 val_loss: 4.1667e 04 200000/200000 282s 1ms/step loss: 6.0470e 04 val_loss: 6.7708e 04 200000/200000 282s 1ms/step loss: 4.3099e 04 val_loss: 7.3898e 04 200000/200000 282s 1ms/step loss: 3.9102e 04 val_loss: 1.8727e 04 200000/200000 280s 1ms/step loss: 3.1040e 04 val_loss: 0.0010 200000/200000 281s 1ms/step loss: 3.1166e 04 val_loss: 2.2333e 04 200000/200000 281s 1ms/step loss: 2.8046e 04 val_loss: 1.5194e 04 Copy Memory Task The copy memory consists of a very large array: At the beginning, there's the vector x of length N. This is the vector to copy. At the end, N+1 9s are present. The first 9 is seen as a delimiter. In the middle, only 0s are there. The idea is to copy the content of the vector x to the end of the large array. The task is made sufficiently complex by increasing the number of 0s in the middle. Explanation Copy Memory Task Implementation results (first epochs) 30000/30000 30s 1ms/step loss: 0.1174 acc: 0.9586 val_loss: 0.0370 val_acc: 0.9859 30000/30000 26s 874us/step loss: 0.0367 acc: 0.9859 val_loss: 0.0363 val_acc: 0.9859 30000/30000 26s 852us/step loss: 0.0361 acc: 0.9859 val_loss: 0.0358 val_acc: 0.9859 30000/30000 26s 872us/step loss: 0.0355 acc: 0.9859 val_loss: 0.0349 val_acc: 0.9859 30000/30000 25s 850us/step loss: 0.0339 acc: 0.9864 val_loss: 0.0291 val_acc: 0.9881 30000/30000 26s 856us/step loss: 0.0235 acc: 0.9896 val_loss: 0.0159 val_acc: 0.9944 30000/30000 26s 872us/step loss: 0.0169 acc: 0.9929 val_loss: 0.0125 val_acc: 0.9966 Sequential MNIST Explanation The idea here is to consider MNIST images as 1 D sequences and feed them to the network. This task is particularly hard because sequences are 28 28 784 elements. 
In order to classify correctly, the network has to remember all the sequence. Usual LSTM are unable to perform well on this task. Sequential MNIST Implementation results 60000/60000 118s 2ms/step loss: 0.2348 acc: 0.9265 val_loss: 0.1308 val_acc: 0.9579 60000/60000 116s 2ms/step loss: 0.0973 acc: 0.9698 val_loss: 0.0645 val_acc: 0.9798 ... 60000/60000 112s 2ms/step loss: 0.0075 acc: 0.9978 val_loss: 0.0547 val_acc: 0.9894 60000/60000 111s 2ms/step loss: 0.0093 acc: 0.9968 val_loss: 0.0585 val_acc: 0.9895 References (TCN for Pytorch) (An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling) (Original Wavenet paper) Useful links (Tensorflow Eager implementation of TCNs) Repo views (since 2018/10/30) HitCount",Speech Synthesis,Speech 2189,Speech,Speech,Other,"tacotron_pytorch Build Status PyTorch implementation of Tacotron speech synthesis model. Inspired from keithito/tacotron . Currently not as much good speech quality as keithito/tacotron can generate, but it seems to be basically working. You can find some generated speech examples trained on LJ Speech Dataset at here . If you are comfortable working with TensorFlow, I'd recommend you to try instead. The reason to rewrite it in PyTorch is that it's easier to debug and extend (multi speaker architecture, etc) at least to me. Requirements PyTorch TensorFlow (if you want to run the training script. This definitely can be optional, but for now required.) Installation git clone recursive pip install e . or python setup.py develop If you want to run the training script, then you need to install additional dependencies. pip install e . train Training The package relis on keithito/tacotron for text processing, audio preprocessing and audio reconstruction (added as a submodule). Please follows the quick start section at and prepare your dataset accordingly. If you have your data prepared, assuming your data is in /tacotron/training (which is the default), then you can train your model by: python train.py Alignment, predicted spectrogram, target spectrogram, predicted waveform and checkpoint (model and optimizer states) are saved per 1000 global step in checkpoints directory. Training progress can be monitored by: tensorboard logdir log Testing model Open the notebook in notebooks directory and change checkpoint_path to your model.",Speech Synthesis,Speech 2190,Speech,Speech,Other,"WaveNet: A Generative Model for Raw Audio This is the Chainer implementation of WaveNet この記事 で実装したコードです。 まだ完成していませんが音声の生成はできます。 Todo: x Generating audio Local conditioning Global conditioning Training on CSTR VCTK Corpus Training the network Requirements Chainer 2 scipy.io.wavfile Preprocessing Donwsample your .wav to 16KHz / 8KHz to speed up convergence. Aaudacity Create data directory Add all .wav files to /train_audio/wav Hyperparameters You can edit the hyperparameters of the network in model.py before running train.py , or edit /params/params.json after training starts. Training run train.py Generating audio run generate.py Passing use_faster_wavenet will generate audio faster than original WaveNet. Listen to a sample generated by WaveNet 🎶 music Implementation ! figure ! figure ! figure",Speech Synthesis,Speech 2195,Speech,Speech,Other,"Plan is to generate some new lecture based on previous lectures and corpora learned from them. 
list of used lectures: speech to text google api speech to text: files are big, so I must upload them to the google cloud storage (GCS) I created new regional bucket in EU here I upload here all lectures then I set project in command line, being sure I use project ID, not the project name gcloud config set project divine glazing 140110 and then I can finally use the speech to text, be sure audio files are in wav or flac (note: console in browser works a bit better than desktop version) gcloud ml speech recognize long running gs://european germany bucket/mono NatsuCon2013 Grek1 Mahó šódžo včera dnes a zítra.wav async language code cs CZ it returns the job it, with it we can control its progress If I want to hint some words for it, I can do it by passing it as hints: gcloud ml speech recognize long running gs://european germany bucket/mono NatsuCon2013 Grek1 Mahó šódžo včera dnes a zítra.wav async language code cs CZ hints mahó, šódžó the following command shows progress of the job, and after it's done, it outputs the result gcloud ml speech operations describe 3785909817154180023 when job is done, we save the resulting json into file gcloud ml speech operations describe 3785909817154180023 gsutil cp gs://european germany bucket/ NatsuCon2013 Grek1 Mahó šódžo včera dnes a zítra.json gcloud ml speech operations wait 1719105980631949569 gsutil cp gs://european germany bucket/Grek1 IKEA Záporáci.json waits until the job is over and then saved its result into json file for creating the subtitle format, I can use this: new text generations there are PDF's with some approaches, other approaches are: some other useful information: text to speech Generated lecture will then be transferred into audio using some known libraries for TTS either by if it will work Marytts is built from source to add new language. Marytts is installed through the installer to be used as txt to wav server. very research, lots of features, still evolving, no detailed up to date documentation. Not practical to use. or could be used. Need to find out how to add new voice. Has czech support or use the Festival it has guide for creating voices training neural network for TTS I'll probably use that, can use that to overfit on one voice there have been some paper lately for TTS using attention based models. e.g.: etc. some reddit discussions for it tooling and data preparation for extraction audio from video, I used VLC player, which can export flac audio from video for downloading from youtube, both video and audio, I used for trimming and cutting longer video, I used Movico (desktop app for windows 10, with ads but for free and does the job) for converting other audio formats to wav, I used Sound Converter (desktop app for windows 10) data structure: in text_out/google stt are automatically generated subtitles in text_out/corrected final are final versions of srt files in text_out/corrected final for nlp are final versions of srt file modified to be more suitable for NLP in pure_text_dataset are pure text sentences from previous directory",Speech Synthesis,Speech 2267,Speech,Speech,Other,"RETURNN development tree RETURNN paper 2016 , RETURNN paper 2018 . RETURNN RWTH extensible training framework for universal recurrent neural networks, is a Theano/TensorFlow based implementation of modern recurrent neural network architectures. It is optimized for fast and reliable training of recurrent neural networks in a multi GPU environment. 
Features include: Mini batch training of feed forward neural networks Sequence chunking based batch training for recurrent neural networks Long short term memory recurrent neural networks including our own fast CUDA kernel Multidimensional LSTM (GPU only, there is no CPU version) Memory management for large data sets Work distribution across multiple devices Flexible and fast architecture which allows all kinds of encoder attention decoder models Please read the documentation for more information . Here is the video recording of a RETURNN overview talk ( slides ; hosted by eBay). There are some example demos in /demos which work on artifically generated data, i.e. they should work as is. There are some real world examples here . Some benchmark setups against other frameworks can be found here . The results are in the RETURNN paper 2016 . Performance benchmarks of our LSTM kernel vs CuDNN and other TensorFlow kernels are here . There is also a wiki . Questions can also be asked on StackOverflow using the returnn tag . Test Status",Speech Recognition,Speech 2268,Speech,Speech,Other,"This repo contains the configs and related files to be used with RETURNN (called CRNN earlier/internally) and RASR (called Sprint internally) for data preprocessing and decoding. To use the RETURNN configs with other data, replace the train / dev config settings, which specify the train and dev corpus data. At the moment, they will use the ExternSprintDataset interface to get the preprocessed data out of RASR. You can also use other dataset implementations provided by RETURNN (see RETURNN doc / source code), e.g. the HDF format directly.",Speech Recognition,Speech 2316,Speech,Speech,Other,A Universal Music Translation Network Implementation ! structure (./figures/structure.png) References A Universal Music Translation Network WaveNet: A Generative Model for Raw Audio Domain Adversarial Training of Neural Networks,Speech Synthesis,Speech 2371,Speech,Speech,Other,MACK'S AI COCKTAIL BAR 🍸,Speech Synthesis,Speech 2410,Speech,Speech,Other,"Natural Language Understanding benchmark This repository contains the results of three benchmarks that compare natural language understanding services offering: 1. built in intents (Apple’s SiriKit, Amazon’s Alexa, Microsoft’s Luis, Google’s API.ai, and Snips.ai ) on a selection of various intents. This benchmark was performed in December 2016. Its results are described in length in the following post . 2. custom intent engines (Google's API.ai, Facebook's Wit, Microsoft's Luis, Amazon's Alexa, and Snips' NLU) for seven chosen intents. This benchmark was performed in June 2017. Its results are described in a paper and a blog post . 3. extension of Braun et al., 2017 (Google's API.AI, Microsoft's Luis, IBM's Watson, Rasa) This experiment replicates the analysis made by Braun et al., 2017, published in Evaluating Natural Language Understanding Services for Conversational Question Answering Systems as part of SIGDIAL 2017 proceedings. Snips and Rasa are added. Details are available in a paper and a blog post . The data is provided for each benchmark and more details about the methods are available in the README file in each folder. Any publication based on these datasets must include a full citation to the following paper in which the results were published by Snips: Snips Voice Platform: an embedded Spoken Language Understanding system for private by design voice interfaces",Speech Recognition,Speech 2411,Speech,Speech,Other,"Snips NLU .. image:: :target: .. image:: :target: .. 
image:: :target: .. image:: :target: .. image:: :target: .. image:: :target: Snips NLU _ (Natural Language Understanding) is a Python library that allows to parse sentences written in natural language and extracts structured information. Summary What is Snips NLU about ? _ Getting Started _ System requirements _ Installation _ Language Resources _ API Usage _ Sample code _ Command Line Interface _ Sample datasets _ Benchmarks _ Documentation _ Citing Snips NLU _ FAQ & Community _ Related content _ How do I contribute ? _ Licence _ What is Snips NLU about ? Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine readable description of what they meant. The NLU engine first detects what the intention of the user is (a.k.a. intent _), then extracts the parameters (called slots _) of the query. The developer can then use this to determine the appropriate action or response. Let’s take an example to illustrate this, and consider the following sentence: .. code block:: text What will be the weather in paris at 9pm? Properly trained, the Snips NLU engine will be able to extract structured data such as: .. code block:: json { intent : { intentName : searchWeatherForecast , probability : 0.95 }, slots : { value : paris , entity : locality , slotName : forecast_locality }, { value : { kind : InstantTime , value : 2018 02 08 20:00:00 +00:00 }, entity : snips/datetime , slotName : forecast_start_datetime } } In this case, the identified intent is searchWeatherForecast and two slots were extracted, a locality and a datetime. As you can see, Snips NLU does an extra step on top of extracting entities: it resolves them. The extracted datetime value has indeed been converted into a handy ISO format. Check out our blog post _ to get more details about why we built Snips NLU and how it works under the hood. We also published a paper on arxiv _, presenting the machine learning architecture of the Snips Voice Platform. Getting Started System requirements Python 2.7 or Python > 3.5 RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset. Installation .. code block:: python pip install snips nlu We currently have pre built binaries (wheels) for snips nlu and its dependencies for MacOS (10.11 and later), Linux x86_64 and Windows. For any other architecture/os snips nlu can be installed from the source distribution. To do so, Rust _ and setuptools_rust _ must be installed before running the pip install snips nlu command. Language resources Snips NLU relies on external language resources _ that must be downloaded before the library can be used. You can fetch resources for a specific language by running the following command: .. code block:: sh python m snips_nlu download en Or simply: .. code block:: sh snips nlu download en The list of supported languages is available at this address _. API Usage Command Line Interface The easiest way to test the abilities of this library is through the command line interface. First, start by training the NLU with one of the sample datasets _: .. code block:: sh snips nlu train path/to/dataset.json path/to/output_trained_engine Where path/to/dataset.json is the path to the dataset which will be used during training, and path/to/output_trained_engine is the location where the trained engine should be persisted once the training is done. 
After that, you can start parsing sentences interactively by running: .. code block:: sh snips nlu parse path/to/trained_engine Where path/to/trained_engine corresponds to the location where you have stored the trained engine during the previous step. Sample code Here is a sample code that you can run on your machine after having installed snips nlu , fetched the english resources and downloaded one of the sample datasets _: .. code block:: python >>> from __future__ import unicode_literals, print_function >>> import io >>> import json >>> from snips_nlu import SnipsNLUEngine >>> from snips_nlu.default_configs import CONFIG_EN >>> with io.open( sample_datasets/lights_dataset.json ) as f: ... sample_dataset json.load(f) >>> nlu_engine SnipsNLUEngine(config CONFIG_EN) >>> nlu_engine nlu_engine.fit(sample_dataset) >>> text Please turn the light on in the kitchen >>> parsing nlu_engine.parse(text) >>> parsing intent intentName 'turnLightOn' What it does is training an NLU engine on a sample weather dataset and parsing a weather query. Sample datasets Here is a list of some datasets that can be used to train a Snips NLU engine: Lights dataset _: Turn on the lights in the kitchen , Set the light to red in the bedroom Beverage dataset _: Prepare two cups of cappucino , Make me a cup of tea Flights dataset _: Book me a flight to go to boston this weekend , book me some tickets from istanbul to moscow in three days Benchmarks In January 2018, we reproduced an academic benchmark _ which was published during the summer 2017. In this article, authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and Rasa NLU _. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both in dark blue). .. image:: .img/benchmarks.png In the figure above, F1 scores _ of both intent classification and slot filling were computed for several NLU providers, and averaged accross the three datasets used in the academic benchmark mentionned before. All the underlying results can be found here _. Documentation To find out how to use Snips NLU please refer to the package documentation _, it will provide you with a step by step guide on how to setup and use this library. Citing Snips NLU Please cite the following paper when using Snips NLU: .. code block:: bibtex @article{coucke2018snips, title {Snips Voice Platform: an embedded Spoken Language Understanding system for private by design voice interfaces}, author {Coucke, Alice and Saade, Alaa and Ball, Adrien and Bluche, Th{\'e}odore and Caulier, Alexandre and Leroy, David and Doumouro, Cl{\'e}ment and Gisselbrecht, Thibault and Caltagirone, Francesco and Lavril, Thibaut and others}, journal {arXiv preprint arXiv:1805.10190}, pages {12 16}, year {2018} } FAQ & Community Please join the forum _ to ask your questions and get feedback from the community. Related content What is Snips about ? _ Snips NLU Open sourcing blog post _ Snips Voice Platform paper (arxiv) _ Snips NLU Language Resources _ Bug tracker _ Snips NLU Rust _: Rust inference pipeline implementation and bindings (C, Swift, Kotlin, Python) Rustling _: Snips NLU builtin entities parser How do I contribute ? Please see the Contribution Guidelines _. Licence This library is provided by Snips _ as Open Source software. See LICENSE _ for more information. .. _external language resources: .. _forum: .. _blog post: .. _paper on arxiv: .. _academic benchmark: .. _Rasa NLU: .. _F1 scores: .. _intent: .. 
_slots:",Speech Recognition,Speech 2537,Speech,Speech,Other,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Speech Recognition,Speech 2687,Speech,Speech,Other,"Text independent voice vectors Subtitle: which of the Hollywood stars is most similar to my voice? > Authors: Dabi Ahn(andabi412@gmail.com), Noah Jung , Jujin Lee and Kyubyong Park > Demo: Check out who your voice is like! Prologue Everyone has their own voice. The same voice will not exist from different people, but some people have similar voices while others do not. This project aims to find individual voice vectors using VoxCeleb dataset, which contains 1,251 Hollywood stars' 145,379 utterances. The voice vectors are text independent, meaning that any pair of utterances from same speaker has similar voice vectors. Also the closer the vector distance is, the more voices are similar. Architectures The architecture is based on a classification model. The utterance inputted is classified as one of the Hollywood stars. The objective function is simply a cross entropy between speaker labels from ground truth and predictions. Eventually, the last layer's activation becomes the speaker's embedding. The model architecture is structured as follows. 1. memory cell CBHG module from Tacotron captures hidden features from sequential data. 2. embedding memory cell's last output is projected by the size of embedding vector. 3. softmax embedding is logits for each classes. Training VoxCeleb dataset used. 1,251 Hollywood stars' 145,379 utterances gender dist.: 690 males and 561 females age dist.: 136, 351, 318, 210, and 236 for 20s, 30s, 40s, 50s, and over 60s respectively. text independent at each step, the speaker is arbitrarily selected. for each speaker, the utterance inputted is randomly selected and cropped so that it does not matter to text. loss and train accuracy Embedding Common Voice dataset used for inference. hundreds of thousands of English utterances from numerous voice contributors in the world. evaluation accuracy embedding visualization using t SNE voices are well clustered by gender without any supervision in training. but we could not find any tendency toward age. How to run? Requirements python 2.7 tensorflow > 1.1 numpy > 1.11.1 librosa 0.5.1 tensorpack 0.8.0 Settings configurations are set in two YAML files. hparams/default.yaml includes default settings for signal processing, model, training, evaluation and embedding. 
hparams/hparams.yaml is for customizing the default settings in each experiment case. Runnable python files train.py for training. run python train.py some_case_name remote mode: utilizing more cores of remote server to load data and enqueue more quickly. run python train.py some_case_name remote port 1234 in local server. run python remote_dataflow.py some_case_name dest_url tcp://local server host:1234 num_thread 12 in remote server. eval.py for evaluation. run python eval.py some_case_name embedding.py for inference and getting embedding vectors. run python embedding.py some_case_name Visualizations Tensorboard Scalars tab: loss, train accuracy, and eval accuracy. Audio tab: sample audios of input speakers(wav) and predicted speakers(wav_pred) Text tab: prediction texts with the following form: 'input speaker name (meta) > predicted speaker name (meta)' ex. sample 022653 (('female', 'fifties', 'england')) > Max_Schneider (('M', '26', 'USA')) t SNE output file outputs/embedding some_case_name .png Future works One shot learning with triplet loss. References Nagrani, A., Chung, J. S., & Zisserman, A. (2017, June 27). VoxCeleb: a large scale speaker identification dataset . arXiv.org. Zhang, C., & Koishida, K. (2017). End to End Text Independent Speaker Verification with Triplet Loss on Short Utterances (pp. 1487–1491). Presented at the Interspeech 2017, ISCA: ISCA. Li, C., Ma, X., Jiang, B., Li, X., Zhang, X., Liu, X., et al. (2017, May 6). Deep Speaker: an End to End Neural Speaker Embedding System . arXiv.org.",Speech Synthesis,Speech 2767,Speech,Speech,Other,"The PyTorch Kaldi Speech Recognition Toolkit PyTorch Kaldi is an open source repository for developing state of the art DNN/HMM speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit. This repository contains the last version of the PyTorch Kaldi toolkit (PyTorch Kaldi v1.0). To take a look into the previous version (PyTorch Kaldi v0.1), click here . If you use this code or part of it, please cite the following paper: M. Ravanelli, T. Parcollet, Y. Bengio, The PyTorch Kaldi Speech Recognition Toolkit , arXiv @inproceedings{pytorch kaldi, title {The PyTorch Kaldi Speech Recognition Toolkit}, author {M. Ravanelli and T. Parcollet and Y. Bengio}, booktitle {In Proc. of ICASSP}, year {2019} } The toolkit is released under a Creative Commons Attribution 4.0 International license . You can copy, distribute, modify the code for research, commercial and non commercial purposes. We only ask to cite our paper referenced above. To improve transparency and replicability of speech recognition results, we give users the possibility to release their PyTorch Kaldi model within this repository. Feel free to contact us (or doing a pull request) for that. Moreover, if your paper uses PyTorch Kaldi, it is also possible to advertise it in this repository. See a short introductory video on the PyTorch Kaldi Toolkit Table of Contents Introduction ( introduction) Prerequisites ( prerequisites) How to install ( how to install) Recent Updates ( recent updates) Tutorials: ( timit tutorial) TIMIT tutorial ( timit tutorial) Librispeech tutorial ( librispeech tutorial) Toolkit Overview: ( overview of the toolkit architecture) Toolkit architecture ( overview of the toolkit architecture) Configuration files ( description of the configuration files) FAQs: ( how can i plug in my model) How can I plug in my model? 
( how can i plug in my model) How can I tune the hyperparameters? ( how can i tune the hyperparameters) How can I use my own dataset? ( how can i use my own dataset) How can I plug in my own features? ( how can i plug in my own features) How can I transcript my own audio files? ( how can i transcript my own audio files) Batch size, learning rate, and droput scheduler ( Batch size, learning rate, and dropout scheduler) How can I contribute to the project? ( how can i contribute to the project) EXTRA: ( speech recognition from the raw waveform with sincnet) Speech recognition from the raw waveform with SincNet ( speech recognition from the raw waveform with sincnet) Joint training between speech enhancement and ASR ( joint training between speech enhancement and asr) Distant Speech Recognition with DIRHA ( distant speech recognition with dirha) Training an autoencoder ( training an autoencoder) References ( references) Introduction The PyTorch Kaldi project aims to bridge the gap between the Kaldi and the PyTorch toolkits, trying to inherit the efficiency of Kaldi and the flexibility of PyTorch. PyTorch Kaldi is not only a simple interface between these toolkits, but it embeds several useful features for developing modern speech recognizers. For instance, the code is specifically designed to naturally plug in user defined acoustic models. As an alternative, users can exploit several pre implemented neural networks that can be customized using intuitive configuration files. PyTorch Kaldi supports multiple feature and label streams as well as combinations of neural networks, enabling the use of complex neural architectures. The toolkit is publicly released along with rich documentation and is designed to properly work locally or on HPC clusters. Some features of the new version of the PyTorch Kaldi toolkit: Easy interface with Kaldi. Easy plug in of user defined models. Several pre implemented models (MLP, CNN, RNN, LSTM, GRU, Li GRU, SincNet). Natural implementation of complex models based on multiple features, labels, and neural architectures. Easy and flexible configuration files. Automatic recovery from the last processed chunk. Automatic chunking and context expansions of the input features. Multi GPU training. Designed to work locally or on HPC clusters. Tutorials on TIMIT and Librispeech Datasets. Prerequisites 1. If not already done, install Kaldi . As suggested during the installation, do not forget to add the path of the Kaldi binaries into $HOME/.bashrc. For instance, make sure that .bashrc contains the following paths: export KALDI_ROOT /home/mirco/kaldi trunk PATH $PATH:$KALDI_ROOT/tools/openfst PATH $PATH:$KALDI_ROOT/src/featbin PATH $PATH:$KALDI_ROOT/src/gmmbin PATH $PATH:$KALDI_ROOT/src/bin PATH $PATH:$KALDI_ROOT//src/nnetbin export PATH As a first test to check the installation, open a bash shell, type copy feats or hmm info and make sure no errors appear. 2. If not already done, install PyTorch . We tested our codes on PyTorch 1.0 and PyTorch 0.4. An older version of PyTorch is likely to raise errors. To check your installation, type “python” and, once entered into the console, type “import torch”, and make sure no errors appear. 3. We recommend running the code on a GPU machine. Make sure that the CUDA libraries are installed and correctly working. We tested our system on Cuda 9.0, 9.1 and 8.0. Make sure that python is installed (the code is tested with python 2.7 and python 3.7). Even though not mandatory, we suggest using Anaconda . Recent updates 19 Feb. 
2019: updates: It is now possible to dynamically change batch size, learning rate, and dropout factors during training. We thus implemented a scheduler that supports the following formalism within the config files: batch_size_train 128 12 64 10 32 2 The line above means: do 12 epochs with 128 batches, 10 epochs with 64 batches, and 2 epochs with 32 batches. A similar formalism can be used for learning rate and dropout scheduling. See this section for more information ( batch size, learning rate, and dropout scheduler). 5 Feb. 2019: updates: 1. Our toolkit now supports parallel data loading (i.e., the next chunk is stored in memory while processing the current chunk). This allows a significant speed up. 2. When performing monophone regularization users can now set “dnn_lay N_lab_out_mono”. This way the number of monophones is automatically inferred by our toolkit. 3. We integrated the kaldi io toolkit from the kaldi io for python project into data_io py. 4. We provided a better hyperparameter setting for SincNet ( see this section ( speech recognition from the raw waveform with sincnet)) 5. We released some baselines with the DIRHA dataset ( see this section ( distant speech recognition with dirha)). We also provide some configuration examples for a simple autoencoder ( see this section ( training an autoencoder)) and for a system that jointly trains a speech enhancement and a speech recognition module ( see this section ( joint training between speech enhancement and asr)) 6. We fixed some minor bugs. Notes on the next version: In the next version, we plan to further extend the functionalities of our toolkit, supporting more models and features formats. The goal is to make our toolkit suitable for other speech related tasks such as end to end speech recognition, speaker identification, keyword spotting, speech separation, speech activity detection, speech enhancement, etc. If you would like to propose some novel functionalities, please give us your feedback by filling this survey . How to install To install PyTorch Kaldi, do the following steps: 1. Make sure all the software recommended in the “Prerequisites” sections are installed and are correctly working 2. Clone the PyTorch Kaldi repository: git clone 3. Go into the project folder and Install the needed packages with: pip install r requirements.txt TIMIT tutorial In the following, we provide a short tutorial of the PyTorch Kaldi toolkit based on the popular TIMIT dataset. 1. Make sure you have the TIMIT dataset. If not, it can be downloaded from the LDC website . 2. Make sure Kaldi and PyTorch installations are fine. Make also sure that your KALDI paths are currently working (you should add the Kaldi paths into the .bashrc as reported in the section Prerequisites ). For instance, type copy feats and hmm info and make sure no errors appear. 3. Run the Kaldi s5 baseline of TIMIT. This step is necessary to compute features and labels later used to train the PyTorch neural network. We recommend running the full timit s5 recipe (including the DNN training). This way all the necessary files are created and the user can directly compare the results obtained by Kaldi with that achieved with our toolkit. 4. Compute the alignments (i.e, the phone state labels) for test and dev data with the following commands (go into $KALDI_ROOT/egs/timit/s5). 
If you want to use tri3 alignments, type: steps/align_fmllr.sh nj 4 data/dev data/lang exp/tri3 exp/tri3_ali_dev steps/align_fmllr.sh nj 4 data/test data/lang exp/tri3 exp/tri3_ali_test If you want to use dnn alignments (as suggested), type: steps/nnet/align.sh nj 4 data fmllr tri3/train data/lang exp/dnn4_pretrain dbn_dnn exp/dnn4_pretrain dbn_dnn_ali steps/nnet/align.sh nj 4 data fmllr tri3/dev data/lang exp/dnn4_pretrain dbn_dnn exp/dnn4_pretrain dbn_dnn_ali_dev steps/nnet/align.sh nj 4 data fmllr tri3/test data/lang exp/dnn4_pretrain dbn_dnn exp/dnn4_pretrain dbn_dnn_ali_test 5. We start this tutorial with a very simple MLP network trained on mfcc features. Before launching the experiment, take a look at the configuration file cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg . See the Description of the configuration files ( description of the configuration files) for a detailed description of all its fields. 6. Change the config file according to your paths. In particular: Set “fea_lst” with the path of your mfcc training list (that should be in $KALDI_ROOT/egs/timit/s5/data/train/feats.scp) Add your path (e.g., $KALDI_ROOT/egs/timit/s5/data/train/utt2spk) into “ utt2spk ark:” Add your CMVN transformation e.g.,$KALDI_ROOT/egs/timit/s5/mfcc/cmvn_train.ark Add the folder where labels are stored (e.g.,$KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali for training and ,$KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali_dev for dev data). To avoid errors make sure that all the paths in the cfg file exist. Please, avoid using paths containing bash variables since paths are read literally and are not automatically expanded (e.g., use /home/mirco/kaldi trunk/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali instead of $KALDI_ROOT/egs/timit/s5/exp/dnn4_pretrain dbn_dnn_ali) 7. Run the ASR experiment: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg This script starts a full ASR experiment and performs training, validation, forward, and decoding steps. A progress bar shows the evolution of all the aforementioned phases. The script run_exp.py progressively creates the following files in the output directory: res.res : a file that summarizes training and validation performance across various validation epochs. log.log : a file that contains possible errors and warnings. conf.cfg : a copy of the configuration file. model.svg is a picture that shows the considered model and how the various neural networks are connected. This is really useful to debug models that are more complex than this one (e.g, models based on multiple neural networks). The folder exp_files contains several files that summarize the evolution of training and validation over the various epochs. For instance, files .info report chunk specific information such as the chunk_loss and error and the training time. The .cfg files are the chunk specific configuration files (see general architecture for more details), while files .lst report the list of features used to train each specific chunk. At the end of training, a directory called generated outputs containing plots of loss and errors during the various training epochs is created. Note that you can stop the experiment at any time. If you run again the script it will automatically start from the last chunk correctly processed. The training could take a couple of hours, depending on the available GPU. Note also that if you would like to change some parameters of the configuration file (e.g., n_chunks ,fea_lst ,batch_size_train ,..) 
you must specify a different output folder (output_folder ). Debug: If you run into some errors, we suggest to do the following checks: 1. Take a look into the standard output. 2. If it is not helpful, take a look into the log.log file. 3. Take a look into the function run_nn into the core.py library. Add some prints in the various part of the function to isolate the problem and figure out the issue. 8. At the end of training, the phone error rate (PER\%) is appended into the res.res file. To see more details on the decoding results, you can go into “decoding_test” in the output folder and take a look to the various files created. For this specific example, we obtained the following res.res file: ep 000 tr 'TIMIT_tr' loss 3.398 err 0.721 valid TIMIT_dev loss 2.268 err 0.591 lr_architecture1 0.080000 time(s) 86 ep 001 tr 'TIMIT_tr' loss 2.137 err 0.570 valid TIMIT_dev loss 1.990 err 0.541 lr_architecture1 0.080000 time(s) 87 ep 002 tr 'TIMIT_tr' loss 1.896 err 0.524 valid TIMIT_dev loss 1.874 err 0.516 lr_architecture1 0.080000 time(s) 87 ep 003 tr 'TIMIT_tr' loss 1.751 err 0.494 valid TIMIT_dev loss 1.819 err 0.504 lr_architecture1 0.080000 time(s) 88 ep 004 tr 'TIMIT_tr' loss 1.645 err 0.472 valid TIMIT_dev loss 1.775 err 0.494 lr_architecture1 0.080000 time(s) 89 ep 005 tr 'TIMIT_tr' loss 1.560 err 0.453 valid TIMIT_dev loss 1.773 err 0.493 lr_architecture1 0.080000 time(s) 88 ......... ep 020 tr 'TIMIT_tr' loss 0.968 err 0.304 valid TIMIT_dev loss 1.648 err 0.446 lr_architecture1 0.002500 time(s) 89 ep 021 tr 'TIMIT_tr' loss 0.965 err 0.304 valid TIMIT_dev loss 1.649 err 0.446 lr_architecture1 0.002500 time(s) 90 ep 022 tr 'TIMIT_tr' loss 0.960 err 0.302 valid TIMIT_dev loss 1.652 err 0.447 lr_architecture1 0.001250 time(s) 88 ep 023 tr 'TIMIT_tr' loss 0.959 err 0.301 valid TIMIT_dev loss 1.651 err 0.446 lr_architecture1 0.000625 time(s) 88 %WER 18.1 192 7215 84.0 11.9 4.2 2.1 18.1 99.5 0.583 /home/mirco/pytorch kaldi new/exp/TIMIT_MLP_basic5/decode_TIMIT_test_out_dnn1/score_6/ctm_39phn.filt.sys The achieved PER(%) is 18.1%. Note that there could be some variability in the results, due to different initializations on different machines. We believe that averaging the performance obtained with different initialization seeds (i.e., change the field seed in the config file) is crucial for TIMIT since the natural performance variability might completely hide the experimental evidence. We noticed a standard deviation of about 0.2% for the TIMIT experiments. If you want to change the features, you have to first compute them with the Kaldi toolkit. To compute fbank features, you have to open $KALDI_ROOT/egs/timit/s5/run.sh and compute them with the following lines: feadir fbank for x in train dev test; do steps/make_fbank.sh cmd $train_cmd nj $feats_nj data/$x exp/make_fbank/$x $feadir steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir done Then, change the aforementioned configuration file with the new feature list. If you already have run the full timit Kaldi recipe, you can directly find the fmllr features in $KALDI_ROOT/egs/timit/s5/data fmllr tri3 . If you feed the neural network with such features you should expect a substantial performance improvement, due to the adoption of the speaker adaptation. In the TIMIT_baseline folder, we propose several other examples of possible TIMIT baselines. 
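Since res.res is a plain text summary with one line per epoch, it is easy to post-process. The short sketch below is not part of the toolkit; it simply assumes the line format shown above and extracts the validation error reported for each epoch, which can be handy when comparing runs:

# Illustrative helper (not shipped with PyTorch-Kaldi): read a res.res file and
# return the validation error reported for each training epoch.
import re

def read_valid_err(res_file='exp/TIMIT_MLP_basic5/res.res'):
    errs = []
    with open(res_file) as f:
        for line in f:
            if not line.startswith('ep'):
                continue  # skip the final %WER line appended after decoding
            # each epoch line reports two err values: training and validation
            found = re.findall(r'err[=\s]+([0-9.]+)', line)
            if len(found) >= 2:
                errs.append(float(found[1]))
    return errs

for epoch, err in enumerate(read_valid_err()):
    print('epoch %03d  valid err %.3f' % (epoch, err))

The other baseline configurations in the TIMIT_baselines folder produce res.res files with exactly the same format.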
Similarly to the previous example, you can run them by simply typing: python run_exp.py $cfg_file There are some examples with recurrent (TIMIT_RNN ,TIMIT_LSTM ,TIMIT_GRU ,TIMIT_LiGRU ) and CNN architectures (TIMIT_CNN ). We also propose a more advanced model (TIMIT_DNN_liGRU_DNN_mfcc+fbank+fmllr.cfg) where we used a combination of feed forward and recurrent neural networks fed by a concatenation of mfcc, fbank, and fmllr features. Note that the latter configuration files correspond to the best architecture described in the reference paper. As you might see from the above mentioned configuration files, we improve the ASR performance by including some tricks such as the monophone regularization (i.e., we jointly estimate both context dependent and context independent targets). The following table reports the results obtained by running the latter systems (average PER\%): Model mfcc fbank fMLLR Kaldi DNN Baseline 18.5 MLP 18.2 18.7 16.7 RNN 17.7 17.2 15.9 LSTM 15.1 14.3 14.5 GRU 16.0 15.2 14.9 li GRU 15.5 14.9 14.2 Results show that, as expected, fMLLR features outperform MFCCs and FBANKs coefficients, thanks to the speaker adaptation process. Recurrent models significantly outperform the standard MLP one, especially when using LSTM, GRU, and Li GRU architecture, that effectively address gradient vanishing through multiplicative gates. The best result PER $14.2$\% is obtained with the Li GRU model 2,3 , that is based on a single gate and thus saves 33% of the computations over a standard GRU. The best results are actually obtained with a more complex architecture that combines MFCC, FBANK, and fMLLR features (see cfg/TIMI_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg ). To the best of our knowledge, the PER 13.8\% achieved by the latter system yields the best published performance on the TIMIT test set. You can directly compare your results with ours by going here . In this external repository, you can find all the folders containing the generated files. Librispeech tutorial The steps to run PyTorch Kaldi on the Librispeech dataset are similar to that reported above for TIMIT. The following tutorial is based on the 100h sub set , but it can be easily extended to the full dataset (960h). 1. Run the Kaldi recipe for librispeech (at least until decode using the tri4b model) 2. Compute the fmllr features by running: . ./cmd.sh You'll want to change cmd.sh to something that will work on your system. . ./path.sh Source the tools/utils (import the queue.pl) chunk train_clean_100 chunk dev_clean Uncomment to process dev chunk test_clean Uncomment to process test gmmdir exp/tri4b dir fmllr/$chunk steps/nnet/make_fmllr_feats.sh nj 10 cmd $train_cmd \ transform dir $gmmdir/decode_tgsmall_$chunk \ $dir data/$chunk $gmmdir $dir/log $dir/data exit 1 compute cmvn stats spk2utt ark:data/$chunk/spk2utt scp:fmllr/$chunk/feats.scp ark:$dir/data/cmvn_speaker.ark 3. compute aligmenents using: aligments on dev_clean and test_clean steps/align_fmllr.sh nj 10 data/dev_clean data/lang exp/tri4b exp/tri4b_ali_dev_clean_100 steps/align_fmllr.sh nj 10 data/test_clean data/lang exp/tri4b exp/tri4b_ali_test_clean_100 4. run the experiments with the following command: python run_exp.py cfg/Librispeech_baselines/libri_MLP_fmllr.cfg. If you would like to use a recurrent model you can use libri_RNN_fmllr.cfg , libri_LSTM_fmllr.cfg , libri_GRU_fmllr.cfg , or libri_liGRU_fmllr.cfg . The training of recurrent models might take some days (depending on the adopted GPU). 
The performance obtained with the tgsmall graph are reported in the following table: Model WER% MLP 9.6 LSTM 8.6 GRU 8.6 li GRU 8.6 These results are obtained without adding a lattice rescoring (i.e., using only the tgsmall graph). You can improve the performance by adding lattice rescoring in this way (run it from the kaldi_decoding_script folder of Pytorch Kaldi): data_dir /data/milatmp1/ravanelm/librispeech/s5/data/ dec_dir /u/ravanelm/pytorch Kaldi new/exp/libri_fmllr/decode_test_clean_out_dnn1/ out_dir /u/ravanelm/pytorch kaldi new/exp/libri_fmllr/ steps/lmrescore_const_arpa.sh $data_dir/lang_test_{tgsmall,fglarge} \ $data_dir/test_clean $dec_dir $out_dir/decode_test_clean_fglarge exit 1; The final results obtaineed using rescoring ( fglarge ) are reported in the following table: Model WER% MLP 6.5 LSTM 6.4 GRU 6.3 li GRU 6.2 You can take a look into the results obtained here . Overview of the toolkit architecture The main script to run an ASR experiment is run_exp.py . This python script performs training, validation, forward, and decoding steps. Training is performed over several epochs, that progressively process all the training material with the considered neural network. After each training epoch, a validation step is performed to monitor the system performance on held out data. At the end of training, the forward phase is performed by computing the posterior probabilities of the specified test dataset. The posterior probabilities are normalized by their priors (using a count file) and stored into an ark file. A decoding step is then performed to retrieve the final sequence of words uttered by the speaker in the test sentences. The run_exp.py script takes in input a global config file (e.g., cfg/TIMIT_MLP_mfcc.cfg ) that specifies all the needed options to run a full experiment. The code run_exp.py calls another function run_nn (see core.py library) that performs training, validation, and forward operations on each chunk of data. The function run_nn takes in input a chunk specific config file (e.g, exp/TIMIT_MLP_mfcc/exp_files/train_TIMIT_tr+TIMIT_dev_ep000_ck00.cfg ) that specifies all the needed parameters for running a single chunk experiment. The run_nn function outputs some info filles (e.g., exp/TIMIT_MLP_mfcc/exp_files/train_TIMIT_tr+TIMIT_dev_ep000_ck00.info ) that summarize losses and errors of the processed chunk. The results are summarized into the res.res files, while errors and warnings are redirected into the log.log file. Description of the configuration files: There are two types of config files (global and chunk specific cfg files). They are both in INI format and are read, processed, and modified with the configparser library of python. The global file contains several sections, that specify all the main steps of a speech recognition experiments (training, validation, forward, and decoding). The structure of the config file is described in a prototype file (see for instance proto/global.proto ) that not only lists all the required sections and fields but also specifies the type of each possible field. For instance, N_ep int(1,inf) means that the fields N_ep (i.e, number of training epochs) must be an integer ranging from 1 to inf. Similarly, lr float(0,inf) means that the lr field (i.e., the learning rate) must be a float ranging from 0 to inf. Any attempt to write a config file not compliant with these specifications will raise an error. 
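To make the role of these type specifications more concrete, here is a rough sketch of the kind of check the parser performs. It is illustrative only and is not the actual validation code used by run_exp.py; the function name and the handling of unconstrained fields are assumptions made for the example:

# Illustrative sketch of how a type spec such as int(1,inf) or float(0,inf)
# can be checked against a value read from the config file.
import re

def check_field(value_str, type_spec):
    m = re.match(r'(int|float)\((\S+),(\S+)\)', type_spec)
    if m is None:
        return True  # fields without a numeric constraint are not range-checked here
    cast = int if m.group(1) == 'int' else float
    low, high = float(m.group(2)), float(m.group(3))  # float('inf') parses as infinity
    try:
        value = cast(value_str)
    except ValueError:
        return False  # e.g. a float given where an integer is required
    return low <= value <= high

# N_ep = int(1,inf): number of training epochs, integer >= 1
assert check_field('24', 'int(1,inf)')
assert not check_field('0.5', 'int(1,inf)')
# lr = float(0,inf): learning rate must be a non-negative float
assert check_field('0.08', 'float(0,inf)')

A value such as a zero epoch count or a negative learning rate would therefore be rejected before any training starts.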
Let's now try to open a config file (e.g., cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg ) and let's describe the main sections: cfg_proto cfg_proto proto/global.proto cfg_proto_chunk proto/global_chunk.proto The current version of the config file first specifies the paths of the global and chunk specific prototype files in the section cfg_proto . exp cmd run_nn_script run_nn out_folder exp/TIMIT_MLP_basic5 seed 1234 use_cuda True multi_gpu False save_gpumem False n_epochs_tr 24 The section exp contains some important fields, such as the output folder ( out_folder ) and the path of the chunk specific processing script run_nn (by default this function should be implemented in the core.py library). The field N_epochs_tr specifies the selected number of training epochs. Other options about using_cuda, multi_gpu, and save_gpumem can be enabled by the user. The field cmd can be used to append a command to run the script on a HPC cluster. dataset1 data_name TIMIT_tr fea fea_name mfcc fea_lst quick_test/data/train/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/train/utt2spk ark:quick_test/mfcc/train_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/train/ lab_graph quick_test/graph n_chunks 5 dataset2 data_name TIMIT_dev fea fea_name mfcc fea_lst quick_test/data/dev/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/dev/utt2spk ark:quick_test/mfcc/dev_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali_dev lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/dev/ lab_graph quick_test/graph n_chunks 1 dataset3 data_name TIMIT_test fea fea_name mfcc fea_lst quick_test/data/test/feats_mfcc.scp fea_opts apply cmvn utt2spk ark:quick_test/data/test/utt2spk ark:quick_test/mfcc/test_cmvn_speaker.ark ark: ark: add deltas delta order 2 ark: ark: cw_left 5 cw_right 5 lab lab_name lab_cd lab_folder quick_test/dnn4_pretrain dbn_dnn_ali_test lab_opts ali to pdf lab_count_file auto lab_data_folder quick_test/data/test/ lab_graph quick_test/graph n_chunks 1 The config file contains a number of sections ( dataset1 , dataset2 , dataset3 ,...) that describe all the corpora used for the ASR experiment. The fields on the dataset\ section describe all the features and labels considered in the experiment. The features, for instance, are specified in the field fea: , where fea_name contains the name given to the feature, fea_lst is the list of features (in the scp Kaldi format), fea_opts allows users to specify how to process the features (e.g., doing CMVN or adding the derivatives), while cw_left and cw_right set the characteristics of the context window (i.e., number of left and right frames to append). Note that the current version of the PyTorch Kaldi toolkit supports the definition of multiple features streams. Indeed, as shown in cfg/TIMIT_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg multiple feature streams (e.g., mfcc, fbank, fmllr) are employed. Similarly, the lab section contains some sub fields. For instance, lab_name refers to the name given to the label, while lab_folder contains the folder where the alignments generated by the Kaldi recipe are stored. lab_opts allows the user to specify some options on the considered alignments. 
For example lab_opts ali to pdf extracts standard context dependent phone state labels, while lab_opts ali to phones per frame true can be used to extract monophone targets. lab_count_file is used to specify the file that contains the counts of the considered phone states. These counts are important in the forward phase, where the posterior probabilities computed by the neural network are divided by their priors. PyTorch Kaldi allows users to both specify an external count file or to automatically retrieve it (using lab_count_file auto ). Users can also specify lab_count_file none if the count file is not strictly needed, e.g., when the labels correspond to an output not used to generate the posterior probabilities used in the forward phase (see for instance the monophone targets in cfg/TIMIT_baselines/TIMIT_MLP_mfcc.cfg ). lab_data_folder , instead, corresponds to the data folder created during the Kaldi data preparation. It contains several files, including the text file eventually used for the computation of the final WER. The last sub field lab_graph is the path of the Kaldi graph used to generate the labels. The full dataset is usually large and cannot fit the GPU/RAM memory. It should thus be split into several chunks. PyTorch Kaldi automatically splits the dataset into the number of chunks specified in N_chunks . The number of chunks might depend on the specific dataset. In general, we suggest processing speech chunks of about 1 or 2 hours (depending on the available memory). data_use train_with TIMIT_tr valid_with TIMIT_dev forward_with TIMIT_test This section tells how the data listed into the sections datasets\ are used within the run_exp.py script. The first line means that we perform training with the data called TIMIT_tr . Note that this dataset name must appear in one of the dataset sections, otherwise the config parser will raise an error. Similarly, the second and third lines specify the data used for validation and forward phases, respectively. batches batch_size_train 128 max_seq_length_train 1000 increase_seq_length_train False start_seq_len_train 100 multply_factor_seq_len_train 2 batch_size_valid 128 max_seq_length_valid 1000 batch_size_train is used to define the number of training examples in the mini batch. The fields max_seq_length_train truncates the sentences longer than the specified value. When training recurrent models on very long sentences, out of memory issues might arise. With this option, we allow users to mitigate such memory problems by truncating long sentences. Moreover, it is possible to progressively grow the maximum sentence length during training by setting increase_seq_length_train True . If enabled, the training starts with a maximum sentence length specified in start_seq_len_train (e.g, start_seq_len_train 100 ). After each epoch the maximum sentence length is multiplied by the multply_factor_seq_len_train (e.g multply_factor_seq_len_train 2 ). We have observed that this simple strategy generally improves the system performance since it encourages the model to first focus on short term dependencies and learn longer term ones only at a later stage. Similarly, batch_size_valid and max_seq_length_valid specify the number of examples in the mini batches and the maximum length for the dev dataset. 
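As an illustration of the sentence length curriculum described above, the following snippet (not part of the toolkit; it simply reuses the field values shown in the batches section as function arguments) reproduces how the maximum training sentence length evolves epoch by epoch:

# Sketch of the sentence-length curriculum implied by increase_seq_length_train=True.
# With start_seq_len_train=100, multply_factor_seq_len_train=2 and
# max_seq_length_train=1000, the maximum length grows as 100, 200, 400, 800, 1000, ...
def max_len_schedule(n_epochs, start_seq_len=100, mult_factor=2, max_seq_len=1000):
    cur_len = start_seq_len
    schedule = []
    for _ in range(n_epochs):
        schedule.append(min(cur_len, max_seq_len))
        cur_len = cur_len * mult_factor
    return schedule

print(max_len_schedule(8))  # [100, 200, 400, 800, 1000, 1000, 1000, 1000]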
architecture1 arch_name MLP_layers1 arch_proto proto/MLP.proto arch_library neural_networks arch_class MLP arch_pretrain_file none arch_freeze False arch_seq_model False dnn_lay 1024,1024,1024,1024,N_out_lab_cd dnn_drop 0.15,0.15,0.15,0.15,0.0 dnn_use_laynorm_inp False dnn_use_batchnorm_inp False dnn_use_batchnorm True,True,True,True,False dnn_use_laynorm False,False,False,False,False dnn_act relu,relu,relu,relu,softmax arch_lr 0.08 arch_halving_factor 0.5 arch_improvement_threshold 0.001 arch_opt sgd opt_momentum 0.0 opt_weight_decay 0.0 opt_dampening 0.0 opt_nesterov False The sections architecture\ are used to specify the architectures of the neural networks involved in the ASR experiments. The field arch_name specifies the name of the architecture. Since different neural networks can depend on a different set of hyperparameters, the user has to add the path of a proto file that contains the list of hyperparameters into the field proto . For example, the prototype file for a standard MLP model contains the following fields: proto library path class MLP dnn_lay str_list dnn_drop float_list(0.0,1.0) dnn_use_laynorm_inp bool dnn_use_batchnorm_inp bool dnn_use_batchnorm bool_list dnn_use_laynorm bool_list dnn_act str_list Similarly to the other prototype files, each line defines a hyperparameter with the related value type. All the hyperparameters defined in the proto file must appear into the global configuration file under the corresponding architecture\ section. The field arch_library specifies where the model is coded (e.g. neural_nets.py ), while arch_class indicates the name of the class where the architecture is implemented (e.g. if we set class MLP we will do from neural_nets.py import MLP ). The field arch_pretrain_file can be used to pre train the neural network with a previously trained architecture, while arch_freeze can be set to False if you want to train the parameters of the architecture during training and should be set to True do keep the parameters fixed (i.e., frozen) during training. The section arch_seq_model indicates if the architecture is sequential (e.g. RNNs) or non sequential (e.g., a feed forward MLP or CNN). The way PyTorch Kaldi processes the input batches is different in the two cases. For recurrent neural networks ( arch_seq_model True ) the sequence of features is not randomized (to preserve the elements of the sequences), while for feedforward models ( arch_seq_model False ) we randomize the features (this usually helps to improve the performance). In the case of multiple architectures, sequential processing is used if at least one of the employed architectures is marked as sequential ( arch_seq_model True ). Note that the hyperparameters starting with arch_ and opt_ are mandatory and must be present in all the architecture specified in the config file. The other hyperparameters (e.g., dnn_ , ) are specific of the considered architecture (they depend on how the class MLP is actually implemented by the user) and can define number and typology of hidden layers, batch and layer normalizations, and other parameters. Other important parameters are related to the optimization of the considered architecture. For instance, arch_lr is the learning rate, while arch_halving_factor is used to implement learning rate annealing. 
In particular, when the relative performance improvement on the dev set between two consecutive epochs is smaller than that specified in the arch_improvement_threshold (e.g, arch_improvement_threshold) we multiply the learning rate by the arch_halving_factor (e.g., arch_halving_factor 0.5 ). The field arch_opt specifies the type of optimization algorithm. We currently support SGD, Adam, and Rmsprop. The other parameters are specific to the considered optimization algorithm (see the PyTorch documentation for exact meaning of all the optimization specific hyperparameters). Note that the different architectures defined in archictecture\ can have different optimization hyperparameters and they can even use a different optimization algorithm. model model_proto proto/model.proto model out_dnn1 compute(MLP_layers1,mfcc) loss_final cost_nll(out_dnn1,lab_cd) err_final cost_err(out_dnn1,lab_cd) The way all the various features and architectures are combined is specified in this section with a very simple and intuitive meta language. The field model: describes how features and architectures are connected to generate as output a set of posterior probabilities. The line out_dnn1 compute(MLP_layers,mfcc) means feed the architecture called MLP_layers1 with the features called mfcc and store the output into the variable out_dnn1 ”. From the neural network output out_dnn1 the error and the loss functions are computed using the labels called lab_cd , that have to be previously defined into the datasets\ sections. The err_final and loss_final fields are mandatory subfields that define the final output of the model. A much more complex example (discussed here just to highlight the potentiality of the toolkit) is reported in cfg/TIMIT_baselines/TIMIT_mfcc_fbank_fmllr_liGRU_best.cfg : model model_proto proto/model.proto model:conc1 concatenate(mfcc,fbank) conc2 concatenate(conc1,fmllr) out_dnn1 compute(MLP_layers_first,conc2) out_dnn2 compute(liGRU_layers,out_dnn1) out_dnn3 compute(MLP_layers_second,out_dnn2) out_dnn4 compute(MLP_layers_last,out_dnn3) out_dnn5 compute(MLP_layers_last2,out_dnn3) loss_mono cost_nll(out_dnn5,lab_mono) loss_mono_w mult_constant(loss_mono,1.0) loss_cd cost_nll(out_dnn4,lab_cd) loss_final sum(loss_cd,loss_mono_w) err_final cost_err(out_dnn4,lab_cd) In this case we first concatenate mfcc, fbank, and fmllr features and we then feed a MLP. The output of the MLP is fed into the a recurrent neural network (specifically a Li GRU model). We then have another MLP layer ( MLP_layers_second ) followed by two softmax classifiers (i.e., MLP_layers_last , MLP_layers_last2 ). The first one estimates standard context dependent states, while the second estimates monophone targets. The final cost function is a weighted sum between these two predictions. In this way we implement the monophone regularization, that turned out to be useful to improve the ASR performance. The full model can be considered as a single big computational graph, where all the basic architectures used in the model section are jointly trained. For each mini batch, the input features are propagated through the full model and the cost_final is computed using the specified labels. The gradient of the cost function with respect to all the learnable parameters of the architecture is then computed. All the parameters of the employed architectures are then updated together with the algorithm specified in the architecture\ sections. 
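For readers more familiar with plain PyTorch than with this meta-language, the following sketch shows roughly what the computational graph of the advanced example corresponds to. It is illustrative only: the nets dictionary of callables, the tensor shapes, and the use of log-softmax outputs paired with nll_loss are assumptions for the example, not code generated by the toolkit.

# Rough PyTorch equivalent of the [model] section shown above (illustrative only).
# Each compute(...) directive corresponds to a forward pass, and the final loss
# is the weighted sum of the context-dependent and monophone cross-entropies.
import torch
import torch.nn.functional as F

def joint_loss(mfcc, fbank, fmllr, lab_cd, lab_mono, nets, mono_weight=1.0):
    # conc1 = concatenate(mfcc, fbank); conc2 = concatenate(conc1, fmllr)
    conc2 = torch.cat([mfcc, fbank, fmllr], dim=-1)
    # out_dnn1..out_dnn3: MLP -> recurrent block -> MLP
    out_dnn1 = nets['MLP_layers_first'](conc2)
    out_dnn2 = nets['liGRU_layers'](out_dnn1)
    out_dnn3 = nets['MLP_layers_second'](out_dnn2)
    # two classification heads ending with log-softmax:
    # out_dnn4: (N, n_cd_states) log-probs, out_dnn5: (N, n_monophones) log-probs
    out_dnn4 = nets['MLP_layers_last'](out_dnn3)
    out_dnn5 = nets['MLP_layers_last2'](out_dnn3)
    loss_cd = F.nll_loss(out_dnn4, lab_cd)
    loss_mono = F.nll_loss(out_dnn5, lab_mono)
    # loss_final = sum(loss_cd, mult_constant(loss_mono, 1.0))
    loss_final = loss_cd + mono_weight * loss_mono
    err_final = (out_dnn4.argmax(dim=-1) != lab_cd).float().mean()
    return loss_final, err_final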
forward forward_out out_dnn1 normalize_posteriors True normalize_with_counts_from lab_cd save_out_file True require_decoding True The section forward first defines which is the output to forward (it must be defined into the model section). if normalize_posteriors True , these posterior are normalized by their priors (using a count file). If save_out_file True , the posterior file (usually a very big ark file) is stored, while if save_out_file False this file is deleted when not needed anymore. The require_decoding is a boolean that specifies if we need to decode the specified output. The field normalize_with_counts_from set which counts using to normalize the posterior probabilities. decoding decoding_script_folder kaldi_decoding_scripts/ decoding_script decode_dnn.sh decoding_proto proto/decoding.proto min_active 200 max_active 7000 max_mem 50000000 beam 13.0 latbeam 8.0 acwt 0.2 max_arcs 1 skip_scoring false scoring_script local/score.sh scoring_opts min lmwt 1 max lmwt 10 norm_vars False The decoding section reports parameters about decoding, i.e. the steps that allows one to pass from a sequence of the context dependent probabilities provided by the DNN into a sequence of words. The field decoding_script_folder specifies the folder where the decoding script is stored. The decoding script field is the script used for decoding (e.g., decode_dnn.sh ) that should be in the decoding_script_folder specified before. The field decoding_proto reports all the parameters needed for the considered decoding script. To make the code more flexible, the config parameters can also be specified within the command line. For example, you can run: python run_exp.py quick_test/example_newcode.cfg optimization,lr 0.01 batches,batch_size 4 The script will replace the learning rate in the specified cfg file with the specified lr value. The modified config file is then stored into out_folder/config.cfg . The script run_exp.py automatically creates chunk specific config files, that are used by the run_nn function to perform a single chunk training. The structure of chunk specific cfg files is very similar to that of the global one. The main difference is a field to_do {train, valid, forward} that specifies the type of processing to on the features chunk specified in the field fea . Why proto files? Different neural networks, optimization algorithms, and HMM decoders might depend on a different set of hyperparameters. To address this issue, our current solution is based on the definition of some prototype files (for global, chunk, architecture config files). In general, this approach allows a more transparent check of the fields specified into the global config file. Moreover, it allows users to easily add new parameters without changing any line of the python code. For instance, to add a user defined model, a new proto file (e.g., user model.prot o) that specifies the hyperparameter must be written. Then, the user should only write a class (e.g., user model in neural_networks.py ) that implements the architecture). FAQs How can I plug in my model The toolkit is designed to allow users to easily plug in their own acoustic models. To add a customized neural model do the following steps: 1. Go into the proto folder and create a new proto file (e.g., proto/myDNN.proto ). The proto file is used to specify the list of the hyperparameters of your model that will be later set into the configuration file. 
To have an idea about the information to add to your proto file, you can take a look into the MLP.proto file: proto dnn_lay str_list dnn_drop float_list(0.0,1.0) dnn_use_laynorm_inp bool dnn_use_batchnorm_inp bool dnn_use_batchnorm bool_list dnn_use_laynorm bool_list dnn_act str_list 2. The parameter dnn_lay must be a list of string, dnn_drop (i.e., the dropout factors for each layer) is a list of float ranging from 0.0 and 1.0, dnn_use_laynorm_inp and dnn_use_batchnorm_inp are booleans that enable or disable batch or layer normalization of the input. dnn_use_batchnorm and dnn_use_laynorm are a list of boolean that decide layer by layer if batch/layer normalization has to be used. The parameter dnn_act is again a list of string that sets the activation function of each layer. Since every model is based on its own set of hyperparameters, different models have a different prototype file. For instance, you can take a look into GRU.proto and see that the hyperparameter list is different from that of a standard MLP. Similarly to the previous examples, you should add here your list of hyperparameters and save the file. 3. Write a PyTorch class implementing your model. Open the library neural_networks.py and look at some of the models already implemented. For simplicity, you can start taking a look into the class MLP. The classes have two mandatory methods: init and forward . The first one is used to initialize the architecture, the second specifies the list of computations to do. The method init takes in input two variables that are automatically computed within the run_nn function. inp_dim is simply the dimensionality of the neural network input, while options is a dictionary containing all the parameters specified into the section architecture of the configuration file. For instance, you can access to the DNN activations of the various layers in this way: options 'dnn_lay' .split(',') . As you might see from the MLP class, the initialization method defines and initializes all the parameters of the neural network. The forward method takes in input a tensor x (i.e., the input data) and outputs another vector containing x. If your model is a sequence model (i.e., if there is at least one architecture with arch_seq_model true in the cfg file), x is a tensor with (time_steps, batches, N_in), otherwise is a (batches, N_in) matrix. The class forward defines the list of computations to transform the input tensor into a corresponding output tensor. The output must have the sequential format (time_steps, batches, N_out) for recurrent models and the non sequential format (batches, N_out) for feed forward models. Similarly to the already implemented models the user should write a new class (e.g., myDNN) that implements the customized model: class myDNN(nn.Module): def __init__(self, options,inp_dim): super(myDNN, self).__init__() // initialize the parameters def forward(self, x): // do some computations out f(x) return out 4. Create a configuration file. Now that you have defined your model and the list of its hyperparameters, you can create a configuration file. To create your own configuration file, you can take a look into an already existing config file (e.g., for simplicity you can consider cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg ). After defining the adopted datasets with their related features and labels, the configuration file has some sections called architecture\ . Each architecture implements a different neural network. 
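Before filling in the configuration file, it may help to see a more complete version of the class sketched in step 3. The following is only an illustrative expansion: the name myDNN, the simple Linear/ReLU/Dropout blocks, and the out_dim attribute are assumptions made for the example, not code taken from neural_networks.py.

# Illustrative, self-contained custom model following the two mandatory methods
# described above: __init__(options, inp_dim) and forward(x).
import torch.nn as nn

class myDNN(nn.Module):
    def __init__(self, options, inp_dim):
        super(myDNN, self).__init__()
        # e.g. options['dnn_lay'] = '1024,1024,1024,1024,1944' (output size last);
        # in the real cfg the last entry can be N_out_lab_cd, which the toolkit
        # resolves to the number of context-dependent states.
        sizes = list(map(int, options['dnn_lay'].split(',')))
        drops = list(map(float, options['dnn_drop'].split(',')))
        layers = []
        prev_dim = inp_dim
        for size, drop in zip(sizes[:-1], drops[:-1]):
            layers += [nn.Linear(prev_dim, size), nn.ReLU(), nn.Dropout(drop)]
            prev_dim = size
        layers += [nn.Linear(prev_dim, sizes[-1]), nn.LogSoftmax(dim=-1)]
        self.net = nn.Sequential(*layers)
        self.out_dim = sizes[-1]  # output dimensionality of the model

    def forward(self, x):
        # feed-forward case: x is (batches, N_in) and the output is (batches, N_out)
        return self.net(x)

Recurrent models would instead receive and return tensors in the (time_steps, batches, features) format, as explained above.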
In cfg/TIMIT_baselines/TIMIT_MLP_mfcc_basic.cfg we only have architecture1 since the acoustic model is composed of a single neural network. To add your own neural network, you have to write an architecture section (e.g., architecture1 ) in the following way: architecture1 arch_name mynetwork (this is a name you would like to use to refer to this architecture within the following model section) arch_proto proto/myDNN.proto (here is the name of the proto file defined before) arch_library neural_networks (this is the name of the library where myDNN is implemented) arch_class myDNN (This must be the name of the class you have implemented) arch_pretrain_file none (With this you can specify if you want to pre train your model) arch_freeze False (set False if you want to update the parameters of your model) arch_seq_model False (set False for feed forward models, True for recurrent models) Then, you have to specify proper values for all the hyperparameters specified in proto/myDNN.proto . For the MLP.proto , we have: dnn_lay 1024,1024,1024,1024,1024,N_out_lab_cd dnn_drop 0.15,0.15,0.15,0.15,0.15,0.0 dnn_use_laynorm_inp False dnn_use_batchnorm_inp False dnn_use_batchnorm True,True,True,True,True,False dnn_use_laynorm False,False,False,False,False,False dnn_act relu,relu,relu,relu,relu,softmax Then, add the following parameters related to the optimization of your own architecture. You can use here standard sdg, adam, or rmsprop (see cfg/TIMIT_baselines/TIMIT_LSTM_mfcc.cfg for an example with rmsprop): arch_lr 0.08 arch_halving_factor 0.5 arch_improvement_threshold 0.001 arch_opt sgd opt_momentum 0.0 opt_weight_decay 0.0 opt_dampening 0.0 opt_nesterov False 5. Save the configuration file into the cfg folder (e.g, cfg/myDNN_exp.cfg ). 6. Run the experiment with: python run_exp.sh cfg/myDNN_exp.cfg 7. To debug the model you can first take a look at the standard output. The config file is automatically parsed by the run_exp.sh and it raises errors in case of possible problems. You can also take a look into the log.log file to see additional information on the possible errors. When implementing a new model, an important debug test consists of doing an overfitting experiment (to make sure that the model is able to overfit a tiny dataset). If the model is not able to overfit, it means that there is a major bug to solve. 8. Hyperparameter tuning. In deep learning, it is often important to play with the hyperparameters to find the proper setting for your model. This activity is usually very computational and time consuming but is often necessary when introducing new architectures. To help hyperparameter tuning, we developed a utility that implements a random search of the hyperparameters (see next section for more details). How can I tune the hyperparameters A hyperparameter tuning is often needed in deep learning to search for proper neural architectures. To help tuning the hyperparameters within PyTorch Kaldi, we have implemented a simple utility that implements a random search. In particular, the script tune_hyperparameters.py generates a set of random configuration files and can be run in this way: python tune_hyperparameters.py cfg/TIMIT_MLP_mfcc.cfg exp/TIMIT_MLP_mfcc_tuning 10 arch_lr randfloat(0.001,0.01) batch_size_train randint(32,256) dnn_act choose_str{relu,relu,relu,relu,softmax tanh,tanh,tanh,tanh,softmax} The first parameter is the reference cfg file that we would like to modify, while the second one is the folder where the random configuration files are saved. 
The third parameter is the number of the random config file that we would like to generate. There is then the list of all the hyperparameters that we want to change. For instance, arch_lr randfloat(0.001,0.01) will replace the field arch_lr with a random float ranging from 0.001 to 0.01. batch_size_train randint(32,256) will replace batch_size_train with a random integer between 32 and 256 and so on. Once the config files are created, they can be run sequentially or in parallel with: python run_exp.py $cfg_file How can I use my own dataset PyTorch Kaldi can be used with any speech dataset. To use your own dataset, the steps to take are similar to those discussed in the TIMIT/Librispeech tutorials. In general, what you have to do is the following: 1. Run the Kaldi recipe with your dataset. Please, see the Kaldi website to have more information on how to perform data preparation. 2. Compute the alignments on training, validation, and test data. 3. Write a PyTorch Kaldi config file $cfg_file . 4. Run the config file with python run_exp.sh $cfg_file . How can I plug in my own features The current version of PyTorch Kaldi supports input features stored with the Kaldi ark format. If the user wants to perform experiments with customized features, the latter must be converted into the ark format. Take a look into the Kaldi io for python git repository for a detailed description about converting numpy arrays into ark files. Moreover, you can take a look into our utility called save_raw_fea.py. This script generates Kaldi ark files containing raw features, that are later used to train neural networks fed by the raw waveform directly (see the section about processing audio with SincNet). How can I transcript my own audio files The current version of Pytorch Kaldi supports the standard production process of using a Pytorch Kaldi pre trained acoustic model to transcript one or multiples .wav files. It is important to understand that you must have a trained Pytorch Kaldi model. While you don't need labels or alignments anymore, Pytorch Kaldi still needs many files to transcript a new audio file: 1. The features and features list feats.scp (with .ark files, see how can i plug my own features) 2. The decoding graph (usually created with mkgraph.sh during previous model training such as triphones models) 3. The final.mdl file that has been used to create the acoustic features (only for decoding, not mandatory if you have your custom decoding script) Once you have all these files, you can start adding your dataset section to the global configuration file. The easiest way is to copy the cfg file used to train your acoustic model and just modify by adding a new dataset : dataset4 data_name myWavFile fea fea_name fbank fea_lst myWavFilePath/data/feats.scp fea_opts apply cmvn utt2spk ark:myWavFilePath/data//utt2spk ark:myWavFilePath/cmvn_test.ark ark: ark: add deltas delta order 0 ark: ark: cw_left 5 cw_right 5 lab lab_name none lab_data_folder myWavFilePath/data/ lab_graph myWavFilePath/exp/tri3/graph n_chunks 1 data_use train_with TIMIT_tr valid_with TIMIT_dev forward_with myWavFile The key string for your audio file transcription is lab_name none . The none tag asks Pytorch Kaldi to enter a production mode that only does the forward propagation and decoding without any labels. You don't need TIMIT_tr and TIMIT_dev to be on your production server since Pytorch Kaldi will skip this information to directly go to the forward phase of the dataset given in the forward_with field. 
As you can see, the global fea field requires exactly the same parameters as a standard training or testing dataset, while the lab field only requires two parameters. Please note that lab_data_folder is nothing more than the same path as fea_lst . Finally, you still need to specify the number of chunks you want to create to process this file (1 hour = 1 chunk). In a production scenario, you might need to transcribe a huge number of audio files, and you don't want to create one .cfg file per recording. To this end, after creating this initial production .cfg file (you can leave the paths blank), you can call the run_exp.py script with specific arguments referring to your different .wav features: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_fbank_prod.cfg --dataset4,fea,0,fea_lst="myWavFilePath/data/feats.scp" --dataset4,lab,0,lab_data_folder="myWavFilePath/data/" --dataset4,lab,0,lab_graph="myWavFilePath/exp/tri3/graph/" This command will internally alter the configuration file with the specified paths and run on your defined features! Note that passing long arguments to the run_exp.py script requires a specific notation: dataset4 specifies the name of the created section, fea is the name of the higher level field, and fea_lst or lab_graph are the names of the lowest level fields you want to change. The 0 indicates which occurrence of the lowest level field you want to alter; indeed, some configuration files may contain multiple lab_graph entries per dataset! Therefore, 0 indicates the first occurrence, 1 the second, and so on. Paths MUST be encapsulated in quotes to be interpreted as full strings! Note that you need to alter the data_name and forward_with fields if you don't want the transcriptions of different .wav files to overwrite each other (decoding files are stored according to the field data_name ): --dataset4,data_name=MyNewName --data_use,forward_with=MyNewName . Batch size, learning rate, and dropout scheduler In order to give users more flexibility, the latest version of PyTorch Kaldi supports scheduling of the batch size, max_seq_length_train, learning rate, and dropout factor. This means that it is now possible to change these values during training. To support this feature, we implemented the following formalism within the config files: batch_size_train = 128*12|64*10|32*2 In this case, our batch size will be 128 for the first 12 epochs, 64 for the following 10 epochs, and 32 for the last two epochs. By default, * means 'for N times', while | is used to indicate a change of the batch size. Note that if the user simply sets batch_size_train = 128 , the batch size is kept fixed during all the training epochs by default. A similar formalism can be used to perform learning rate scheduling: arch_lr = 0.08*10|0.04*5|0.02*3|0.01*2|0.005*2|0.0025*2 In this case, if the user simply sets arch_lr = 0.08 , the learning rate is annealed with the newbob procedure used in the previous version of the toolkit. In practice, we start from the specified learning rate and we multiply it by a halving factor every time that the improvement on the validation dataset is smaller than the threshold specified in the field arch_improvement_threshold . Also the dropout factor can now be changed during training with the following formalism: dnn_drop = 0.15*12|0.20*12,0.15,0.15*10|0.20*14,0.15,0.0 With the line above we can set a different dropout rate for different layers and for different epochs. For instance, the first hidden layer will have a dropout rate of 0.15 for the first 12 epochs, and 0.20 for the other 12.
The dropout factor of the second layer, instead, will remain constant to 0.15 over all the training. The same formalism is used for all the layers. Note that indicates a change in the dropout factor within the same layer, while , indicates a different layer. You can take a look here into a config file where batch sizes, learning rates, and dropout factors are changed here: cfg/TIMIT_baselines/TIMIT_mfcc_basic_flex.cfg or here: cfg/TIMIT_baselines/TIMIT_liGRU_fmllr_lr_schedule.cfg How can I contribute to the project The project is still in its initial phase and we invite all potential contributors to participate. We hope to build a community of developers larger enough to progressively maintain, improve, and expand the functionalities of our current toolkit. For instance, it could be helpful to report any bug or any suggestion to improve the current version of the code. People can also contribute by adding additional neural models, that can eventually make richer the set of currently implemented architectures. EXTRA Speech recognition from the raw waveform with SincNet Take a look into our video introduction to SincNet SincNet is a convolutional neural network recently proposed to process raw audio waveforms. In particular, SincNet encourages the first layer to discover more meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, only low and high cutoff frequencies of band pass filters are directly learned from data. This inductive bias offers a very compact way to derive a customized filter bank front end, that only depends on some parameters with a clear physical meaning. For a more detailed description of the SincNet model, please refer to the following papers: M. Ravanelli, Y. Bengio, Speaker Recognition from raw waveform with SincNet , in Proc. of SLT 2018 ArXiv M. Ravanelli, Y.Bengio, Interpretable Convolutional Filters with SincNet , in Proc. of NIPS@IRASL 2018 ArXiv To use this model for speech recognition on TIMIT, to the following steps: 1. Follows the steps described in the “TIMIT tutorial”. 2. Save the raw waveform into the Kaldi ark format. To do it, you can use the save_raw_fea.py utility in our repository. The script saves the input signals into a binary Kaldi archive, keeping the alignments with the pre computed labels. You have to run it for all the data chunks (e.g., train, dev, test). It can also specify the length of the speech chunk ( sig_wlen 200 ms ) composing each frame. 3. Open the cfg/TIMIT_baselines/TIMIT_SincNet_raw.cfg , change your paths, and run: python ./run_exp.sh cfg/TIMIT_baselines/TIMIT_SincNet_raw.cfg 4. With this architecture, we have obtained a PER(%) 17.1% . A standard CNN fed the same features gives us a PER(%) 18.% . Please, see here to take a look into our results. Our results on SincNet outperforms results obtained with MFCCs and FBANKs fed by standard feed forward networks. In the following table, we compare the result of SincNet with other feed forward neural network: Model WER(\%) MLP fbank 18.7 MLP mfcc 18.2 CNN raw 18.1 SincNet raw 17.2 Joint training between speech enhancement and ASR In this section, we show how to use PyTorch Kaldi to jointly train a cascade between a speech enhancement and a speech recognition neural networks. The speech enhancement has the goal of improving the quality of the speech signal by minimizing the MSE between clean and noisy features. 
The enhanced features then feed another neural network that predicts context dependent phone states. In the following, we report a toy task example based on a reverberated version of TIMIT that is only intended to show how users should set the config file to train such a combination of neural networks. Even though some implementation details (and the adopted datasets) are different, this tutorial is inspired by this paper: M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Batch normalized joint training for DNN based distant speech recognition , in Proceedings of SLT 2016 arXiv To run the system do the following steps: 1 Make sure you have the standard clean version of TIMIT available. 2 Run the Kaldi s5 baseline of TIMIT. This step is necessary to compute the clean features (that will be the labels of the speech enhancement system) and the alignments (that will be the labels of the speech recognition system). We recommend running the full timit s5 recipe (including the DNN training). 3 The standard TIMIT recipe uses MFCC features. In this tutorial, instead, we use FBANK features. To compute FBANK features run the following script in $KALDI_ROOT/egs/TIMIT/s5 :
feadir=fbank
for x in train dev test; do
  steps/make_fbank.sh --cmd "$train_cmd" --nj $feats_nj data/$x exp/make_fbank/$x $feadir
  steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir
done
Note that we use 40 FBANKs here, while by default Kaldi uses 23 FBANKs. To compute 40 dimensional features go into $KALDI_ROOT/egs/TIMIT/conf/fbank.conf and change the number of considered output filters. 4 Go to this external repository and follow the steps to generate a reverberated version of TIMIT starting from the clean one. Note that this is just a toy task that is only helpful to show how to set up a joint training system. 5 Compute the FBANK features for the TIMIT_rev dataset. To do it, you can copy the scripts in $KALDI_ROOT/egs/TIMIT/ into $KALDI_ROOT/egs/TIMIT_rev/ . Please copy also the data folder. Note that the audio files in the TIMIT_rev folders are saved with the standard WAV format, while TIMIT is released with the SPHERE format. To bypass this issue, open the files data/train/wav.scp , data/dev/wav.scp , data/test/wav.scp and delete the part about SPHERE reading (e.g., /home/mirco/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav ). You also have to change the paths from the standard TIMIT to the reverberated one (e.g. replace /TIMIT/ with /TIMIT_rev/). Remember to remove the final pipe symbol | . Save the changes and run the computation of the fbank features in this way:
feadir=fbank
for x in train dev test; do
  steps/make_fbank.sh --cmd "$train_cmd" --nj $feats_nj data/$x exp/make_fbank/$x $feadir
  steps/compute_cmvn_stats.sh data/$x exp/make_fbank/$x $feadir
done
Remember to change the $KALDI_ROOT/egs/TIMIT_rev/conf/fbank.conf file in order to compute 40 features rather than the 23 FBANKs of the default configuration. 6 Once features are computed, open the following config file: cfg/TIMIT_baselines/TIMIT_rev/TIMIT_joint_training_liGRU_fbank.cfg Remember to change the paths according to where data are stored on your machine. As you can see, we consider two types of features. The fbank_rev features are computed from the TIMIT_rev dataset, while the fbank_clean features are derived from the standard TIMIT dataset and are used as targets for the speech enhancement neural network. As you can see in the model section of the config file, we have the cascade between the networks doing speech enhancement and speech recognition.
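To make the cascade concrete, here is a minimal PyTorch sketch of the joint objective (an MSE term on the enhanced features plus a cross entropy term on the phone state predictions). The layer sizes, module names and toy tensors below are assumptions for illustration only; they are not the config driven architectures actually built by the toolkit.
python
import torch
import torch.nn as nn

# Hypothetical sizes: 40 fbank features in, 1000 context-dependent states out.
enhance = nn.Sequential(nn.Linear(40, 512), nn.ReLU(), nn.Linear(512, 40))
asr = nn.Sequential(nn.Linear(40, 512), nn.ReLU(), nn.Linear(512, 1000))
opt = torch.optim.SGD(list(enhance.parameters()) + list(asr.parameters()), lr=0.08)

noisy = torch.randn(32, 40)             # reverberated fbank frames (toy batch)
clean = torch.randn(32, 40)             # corresponding clean fbank targets
states = torch.randint(0, 1000, (32,))  # context-dependent phone-state labels

enhanced = enhance(noisy)
loss = nn.functional.mse_loss(enhanced, clean) \
     + nn.functional.cross_entropy(asr(enhanced), states)
loss.backward()  # gradients flow through both networks (joint training)
opt.step()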
The speech recognition architecture jointly estimates both context dependent and monophone targets (thus using the so called monophone regularization). To run an experiment type the following command: python run_exp.py cfg/TIMIT_baselines/TIMIT_rev/TIMIT_joint_training_liGRU_fbank.cfg 7 Results With this configuration file, you should obtain a Phone Error Rate (PER) 28.1% . Note that some oscillations around this performance are more than natural and are due to different initialization of the neural parameters. You can take a closer look into our results here Distant Speech Recognition with DIRHA In this tutorial, we use the DIRHA English dataset to perform a distant speech recognition experiment. The DIRHA English Dataset is a multi microphone speech corpus being developed under the EC project DIRHA. The corpus is composed of both real and simulated sequences recorded with 32 sample synchronized microphones in a domestic environment. The database contains signals of different characteristics in terms of noise and reverberation making it suitable for various multi microphone signal processing and distant speech recognition tasks. The part of the dataset currently released is composed of 6 native US speakers (3 Males, 3 Females) uttering 409 wall street journal sentences. The training data have been created using a realistic data contamination approach, that is based on contaminating the clean speech wsj 5k sentences with high quality multi microphone impulse responses measured in the targeted environment. For more details on this dataset, please refer to the following papers: M. Ravanelli, L. Cristoforetti, R. Gretter, M. Pellin, A. Sosi, M. Omologo, The DIRHA English corpus and related tasks for distant speech recognition in domestic environments , in Proceedings of ASRU 2015. ArXiv M. Ravanelli, P. Svaizer, M. Omologo, Realistic Multi Microphone Data Simulation for Distant Speech Recognition , in Proceedings of Interspeech 2016. ArXiv In this tutorial, we use the aforementioned simulated data for training (using LA6 microphone), while test is performed using the real recordings (LA6). This task is very realistic, but also very challenging. The speech signals are characterized by a reverberation time of about 0.7 seconds. Non stationary domestic noises (such as vacuum cleaner, steps, phone rings, etc.) are also present in the real recordings. Let’s start now with the practical tutorial. 1 If not available, download the DIRHA dataset from the LDC website . LDC releases the full dataset for a small fee. 2 Go this external reposotory . As reported in this repository, you have to generate the contaminated WSJ dataset with the provided MATLAB script. Then, you can run the proposed KALDI baseline to have features and labels ready for our pytorch kaldi toolkit. 3 Open the following configuration file: cfg/DIRHA_baselines/DIRHA_liGRU_fmllr.cfg The latter configuration file implements a simple RNN model based on a Light Gated Recurrent Unit (Li GRU). We used fMLLR as input features. Change the paths and run the following command: python run_exp.py cfg/DIRHA_baselines/DIRHA_liGRU_fmllr.cfg 4 Results: The aforementioned system should provide Word Error Rate (WER%) 23.2% . You can find the results obtained by us here . Using the other configuration files in the cfg/DIRHA_baselines folder you can perform experiments with different setups. 
With the provided configuration files you can obtain the following results: Model WER(\%) MLP 26.1 GRU 25.3 Li GRU 23.8 Training an autoencoder The current version of the repository is mainly designed for speech recognition experiments. We are actively working on a new version, which is much more flexible and can manage inputs/outputs different from Kaldi features/labels. Even with the current version, however, it is possible to implement other systems, such as an autoencoder. An autoencoder is a neural network whose inputs and outputs are the same. The middle layer normally contains a bottleneck that forces our representations to compress the information of the input. In this tutorial, we provide a toy example based on the TIMIT dataset. For instance, see the following configuration file: cfg/TIMIT_baselines/TIMIT_MLP_fbank_autoencoder.cfg Our inputs are the standard 40 dimensional fbank coefficients that are gathered using a context window of 11 frames (i.e., the total dimensionality of our input is 440). A feed forward neural network (called MLP_encoder) encodes our features into a 100 dimensional representation. The decoder (called MLP_decoder) is fed with the learned representations and tries to reconstruct the input features. The system is trained with the Mean Squared Error (MSE) metric. Note that in the Model section we added the line “err_final = cost_err(dec_out,lab_cd)” at the end. The current version of the model, in fact, by default requires that at least one label is specified (we will remove this limit in the next version). You can train the system by running the following command: python run_exp.py cfg/TIMIT_baselines/TIMIT_MLP_fbank_autoencoder.cfg The results should look like this:
ep 000 tr 'TIMIT_tr' loss 0.139 err 0.999 valid TIMIT_dev loss 0.076 err 1.000 lr_architecture1 0.080000 lr_architecture2 0.080000 time(s) 41
ep 001 tr 'TIMIT_tr' loss 0.098 err 0.999 valid TIMIT_dev loss 0.062 err 1.000 lr_architecture1 0.080000 lr_architecture2 0.080000 time(s) 39
ep 002 tr 'TIMIT_tr' loss 0.091 err 0.999 valid TIMIT_dev loss 0.058 err 1.000 lr_architecture1 0.040000 lr_architecture2 0.040000 time(s) 39
ep 003 tr 'TIMIT_tr' loss 0.088 err 0.999 valid TIMIT_dev loss 0.056 err 1.000 lr_architecture1 0.020000 lr_architecture2 0.020000 time(s) 38
ep 004 tr 'TIMIT_tr' loss 0.087 err 0.999 valid TIMIT_dev loss 0.055 err 0.999 lr_architecture1 0.010000 lr_architecture2 0.010000 time(s) 39
ep 005 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.005000 lr_architecture2 0.005000 time(s) 39
ep 006 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.002500 lr_architecture2 0.002500 time(s) 39
ep 007 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 1.000 lr_architecture1 0.001250 lr_architecture2 0.001250 time(s) 39
ep 008 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 0.999 lr_architecture1 0.000625 lr_architecture2 0.000625 time(s) 41
ep 009 tr 'TIMIT_tr' loss 0.086 err 0.999 valid TIMIT_dev loss 0.054 err 0.999 lr_architecture1 0.000313 lr_architecture2 0.000313 time(s) 38
You should only consider the field loss . The field err does not contain useful information in this case (for the aforementioned reason). You can take a look at the generated features by typing the following command: copy-feats ark:exp/TIMIT_MLP_fbank_autoencoder/exp_files/forward_TIMIT_test_ep009_ck00_enc_out.ark ark,t:- | more References 1 M. Ravanelli, T. Parcollet, Y.
Bengio, The PyTorch Kaldi Speech Recognition Toolkit , ArXiv 2 M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Improving speech recognition by revising gated recurrent units , in Proceedings of Interspeech 2017. ArXiv 3 M. Ravanelli, P. Brakel, M. Omologo, Y. Bengio, Light Gated Recurrent Units for Speech Recognition , in IEEE Transactions on Emerging Topics in Computational Intelligence. ArXiv 4 M. Ravanelli, Deep Learning for Distant Speech Recognition , PhD Thesis, Unitn 2017. ArXiv 5 T. Parcollet, M. Ravanelli, M. Morchid, G. Linarès, C. Trabelsi, R. De Mori, Y. Bengio, Quaternion Recurrent Neural Networks , in Proceedings of ICLR 2019 ArXiv 6 T. Parcollet, M. Morchid, G. Linarès, R. De Mori, Bidirectional Quaternion Long Short Term Memory Recurrent Neural Networks for Speech Recognition , in Proceedings of ICASSP 2019 ArXiv",Speech Recognition,Speech 2395,Graphs,Graphs,Other,"Metric Learning using Graph Convolutional Neural Networks (GCNs) The code in this repository implements a metric learning approach for irregular graphs. The method has been applied on brain connectivity networks and is presented in our papers: Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert, Metric learning with spectral graph convolutions on brain connectivity networks , NeuroImage, 2018. Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, Daniel Rueckert, Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks , Medical Image Computing and Computer Assisted Interventions (MICCAI), 2017. The code is released under the terms of the MIT license (LICENSE.txt). Please cite the above papers if you use it. There are also implementations of the filters and graph coarsening used in: Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , Neural Information Processing Systems (NIPS), 2016. The implementation of the global loss function is based on: Vijay Kumar, Gustavo Carneiro, Ian Reid, Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimising Global Loss Functions , IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. Installation 1. Clone this repository. sh git clone cd gcn_metric_learning 2. Install the dependencies. Please edit requirements.txt to choose the TensorFlow version (CPU / GPU, Linux / Mac) you want to install, or install it beforehand. sh pip install -r requirements.txt or make install Using the model To use our siamese graph ConvNet on your data, you need: 1. pairs of graphs as matrices where each row is a node and each column is a node feature, 2. a class label for each graph, 3. an adjacency matrix which provides the structure as a graph; the same structure will be used for all samples. Please get in touch if you are unsure about applying the model to a different setting.",Node Classification,Graphs 2454,Graphs,Graphs,Other,"Graph_Nets_GCN Prerequisites 1. Linux or OSX 2. Python 3.6+ 3. Sonnet(deepmind) 4. Tensorflow 5. Graphnet(deepmind) Usage python Node_Apply_GCN.py epochs 500 Paper Semi Supervised Classification with Graph Convolutional Networks _Thomas N. Kipf, Max Welling_ Abstract We present a scalable approach for semi supervised learning on graph structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs.
We motivate the choice of our convolutional architecture via a localized first order approximation of spectral graph convolutions. Our model scales linearly in the number of graph edges and learns hidden layer representations that encode both local graph structure and features of nodes. In a number of experiments on citation networks and on a knowledge graph dataset we demonstrate that our approach outperforms related methods by a significant margin. Paper Original Implementation Download Dataset cora Cite Please cite our paper if you use this code in your own work: @article{kipf2016semi, title {Semi Supervised Classification with Graph Convolutional Networks}, author {Kipf, Thomas N and Welling, Max}, journal {arXiv preprint arXiv:1609.02907}, year {2016} }",Node Classification,Graphs 2502,Graphs,Graphs,Other,"LINE: Large scale information network embedding Introduction This is the LINE toolkit developed for embedding very large scale information networks. It is suitable to a variety of networks including directed, undirected, binary or weighted edges. The LINE model is quite efficient, which is able to embed a network with millions of vertices and billions of edges on a single machine within a few hours. Contact: Jian Tang, tangjianpku@gmail.com Project page: This work was done when the author was working at Microsoft Research Usage We provide both the Windows and LINUX versions. To compile the souce codes, some external packages are required, which are used to generate random numbers for the edge sampling algorithm in the LINE model. For Windows version, the BOOST package is used and can be downloaded at for LINUX, the GSL package is used and can be downloaded at Network Input The input of a network consists of the edges in the network. Each line of the input file represents a DIRECTED edge in the network, which is specified as the format source_node target_node weight (can be either separated by blank or tab). For each undirected edge, users must use TWO DIRECTED edges to represent it. Here is an input example of a word co occurrence network: good the 3 the good 3 good bad 1 bad good 1 bad of 4 of bad 4 Run ./line train network_file output embedding_file binary 1 size 200 order 2 negative 5 samples 100 rho 0.025 threads 20 train, the input file of a network; output, the output file of the embedding; binary, whether saving the output file in binary mode; the default is 0 (off); size, the dimension of the embedding; the default is 100; order, the order of the proximity used; 1 for first order, 2 for second order; the default is 2; negative, the number of negative samples used in negative sampling; the deault is 5; samples, the total number of training samples ( Million); rho, the starting value of the learning rate; the default is 0.025; threads, the total number of threads used; the default is 1. Files in the folder line.cpp, the souce code of the LINE; reconstruct.cpp, the code used for reconstructing the sparse networks into dense ones, which is described in Section 4.3; normalize.cpp, the code for normalizing the embeddings (l2 normalization); concatenate.cpp, the code for concatenating the embeddings with 1st order and 2nd order; Examples We provide an example running script for the Youtube data set (available at The script will first run LINE to learn network embeddings, then it will evaluate the learned embeddings on the node classification task. To run the script, users first need to compile the evaluation codes by running make.sh in the folder evaluate . 
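To illustrate the edge list format described above, where every undirected edge must be written as two directed lines of the form source_node target_node weight, here is a small Python sketch; the file name net.txt and the toy edge list are arbitrary examples, not files shipped with LINE.
python
# Write an undirected, weighted word co-occurrence graph in LINE's input format:
# one 'source target weight' line per direction of every edge.
edges = [('good', 'the', 3), ('good', 'bad', 1), ('bad', 'of', 4)]

with open('net.txt', 'w') as f:
    for u, v, w in edges:
        f.write('{} {} {}\n'.format(u, v, w))
        f.write('{} {} {}\n'.format(v, u, w))  # reverse direction of the undirected edge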
Afterwards, we can run train_youtube.bat or train_youtube.sh to run the whole pipeline. Citation @inproceedings{tang2015line, title {LINE: Large scale Information Network Embedding.}, author {Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu}, booktitle {WWW}, year {2015}, organization {ACM} }",Node Classification,Graphs 2595,Graphs,Graphs,Other,"Graph Auto Encoders This is a TensorFlow implementation of the (Variational) Graph Auto Encoder model as described in our paper: T. N. Kipf, M. Welling, Variational Graph Auto Encoders , NIPS Workshop on Bayesian Deep Learning (2016) Graph Auto Encoders (GAEs) are end to end trainable neural network models for unsupervised learning, clustering and link prediction on graphs. ! (Variational) Graph Auto Encoder (figure.png) GAEs have successfully been used for: Link prediction in large scale relational data: M. Schlichtkrull & T. N. Kipf et al., Modeling Relational Data with Graph Convolutional Networks (2017), Matrix completion / recommendation with side information: R. Berg et al., Graph Convolutional Matrix Completion (2017). GAEs are based on Graph Convolutional Networks (GCNs), a recent class of models for end to end (semi )supervised learning on graphs: T. N. Kipf, M. Welling, Semi Supervised Classification with Graph Convolutional Networks , ICLR (2017). A high level introduction is given in our blog post: Thomas Kipf, Graph Convolutional Networks (2016) Installation bash python setup.py install Requirements TensorFlow (1.0 or later) python 2.7 networkx scikit learn scipy Run the demo bash python train.py Data In order to use your own data, you have to provide an N by N adjacency matrix (N is the number of nodes), and an N by D feature matrix (D is the number of features per node) optional Have a look at the load_data() function in input_data.py for an example. In this example, we load citation network data (Cora, Citeseer or Pubmed). The original datasets can be found here: and here (in a different format): You can specify a dataset as follows: bash python train.py dataset citeseer (or by editing train.py ) Models You can choose between the following models: gcn_ae : Graph Auto Encoder (with GCN encoder) gcn_vae : Variational Graph Auto Encoder (with GCN encoder) Cite Please cite our paper if you use this code in your own work: @article{kipf2016variational, title {Variational Graph Auto Encoders}, author {Kipf, Thomas N and Welling, Max}, journal {NIPS Workshop on Bayesian Deep Learning}, year {2016} }",Link Prediction,Graphs 2596,Graphs,Graphs,Other,"Graph Convolutional Networks This is a TensorFlow implementation of Graph Convolutional Networks for the task of (semi supervised) classification of nodes in a graph, as described in our paper: Thomas N. Kipf, Max Welling, Semi Supervised Classification with Graph Convolutional Networks (ICLR 2017) For a high level explanation, have a look at our blog post: Thomas Kipf, Graph Convolutional Networks (2016) Installation bash python setup.py install Requirements tensorflow (>0.12) networkx Run the demo bash cd gcn python train.py Data In order to use your own data, you have to provide an N by N adjacency matrix (N is the number of nodes), an N by D feature matrix (D is the number of features per node), and an N by E binary label matrix (E is the number of classes). Have a look at the load_data() function in utils.py for an example. In this example, we load citation network data (Cora, Citeseer or Pubmed). 
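For a custom dataset, the three inputs described above (adjacency, features, labels) can be assembled roughly as in the following sketch. The toy sizes and the build_toy_inputs helper are illustrative assumptions only; the repository's own loading logic lives in the load_data() function mentioned above.
python
import numpy as np
import scipy.sparse as sp

def build_toy_inputs():
    # N x N adjacency matrix (here a 4-node path graph) stored as a sparse matrix.
    adj = sp.csr_matrix(np.array([[0, 1, 0, 0],
                                  [1, 0, 1, 0],
                                  [0, 1, 0, 1],
                                  [0, 0, 1, 0]], dtype=np.float32))
    # N x D node feature matrix.
    features = np.random.rand(4, 3).astype(np.float32)
    # N x E one-hot label matrix (two classes).
    labels = np.eye(2, dtype=np.float32)[np.array([0, 0, 1, 1])]
    return adj, features, labels

adj, features, labels = build_toy_inputs()
print(adj.shape, features.shape, labels.shape)  # (4, 4) (4, 3) (4, 2)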
The original datasets can be found here: In our version (see data folder) we use dataset splits provided by (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi Supervised Learning with Graph Embeddings , ICML 2016). You can specify a dataset as follows: bash python train.py dataset citeseer (or by editing train.py ) Models You can choose between the following models: gcn : Graph convolutional network (Thomas N. Kipf, Max Welling, Semi Supervised Classification with Graph Convolutional Networks , 2016) gcn_cheby : Chebyshev polynomial version of graph convolutional network as described in (Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , NIPS 2016) dense : Basic multi layer perceptron that supports sparse inputs Graph classification Our framework also supports batch wise classification of multiple graph instances (of potentially different size) with an adjacency matrix each. It is best to concatenate respective feature matrices and build a (sparse) block diagonal matrix where each block corresponds to the adjacency matrix of one graph instance. For pooling (in case of graph level outputs as opposed to node level outputs) it is best to specify a simple pooling matrix that collects features from their respective graph instances, as illustrated below: ! graph_classification Cite Please cite our paper if you use this code in your own work: @inproceedings{kipf2017semi, title {Semi Supervised Classification with Graph Convolutional Networks}, author {Kipf, Thomas N. and Welling, Max}, booktitle {International Conference on Learning Representations (ICLR)}, year {2017} }",Node Classification,Graphs 2597,Graphs,Graphs,Other,"Iterative Classification Algorithm This is a python/sklearn implementation of the Iterative Classification Algorithm from: Qing Lu, Lise Getoor, Link based classification (ICML 2003) which served as a semi supervised classification baseline in our recent paper: Thomas N. Kipf, Max Welling, Semi Supervised Classification with Graph Convolutional Networks (2016) This implementation is largely based on and adapted from: Installation bash python setup.py install Requirements sklearn networkx Run the demo bash python train.py Data In order to use your own data, you have to provide an N by N adjacency matrix (N is the number of nodes), an N by D feature matrix (D is the number of features per node), and a N by E binary label matrix (E is the number of classes). Have a look at the load_data() function in utils.py for an example. In this example, we load citation network data (Cora, Citeseer or Pubmed). The original datasets can be found here: In our version (see data folder) we use dataset splits provided by (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi Supervised Learning with Graph Embeddings , ICML 2016). You can specify a dataset as follows: bash python train.py dataset citeseer (or by editing train.py )",Node Classification,Graphs 2598,Graphs,Graphs,Other,"Graph Convolutional Networks in PyTorch PyTorch implementation of Graph Convolutional Networks (GCNs) for semi supervised classification 1 . For a high level introduction to GCNs, see: Thomas Kipf, Graph Convolutional Networks (2016) ! Graph Convolutional Networks (figure.png) Note: There are subtle differences between the TensorFlow implementation in and this PyTorch re implementation. 
This re implementation serves as a proof of concept and is not intended for reproduction of the results reported in 1 . This implementation makes use of the Cora dataset from 2 . Installation python setup.py install Requirements PyTorch 0.4 or 0.5 Python 2.7 or 3.6 Usage python train.py References 1 Kipf & Welling, Semi Supervised Classification with Graph Convolutional Networks, 2016 2 Sen et al., Collective Classification in Network Data, AI Magazine 2008 Cite Please cite our paper if you use this code in your own work: @article{kipf2016semi, title {Semi Supervised Classification with Graph Convolutional Networks}, author {Kipf, Thomas N and Welling, Max}, journal {arXiv preprint arXiv:1609.02907}, year {2016} }",Node Classification,Graphs 2618,Graphs,Graphs,Other,"PSCN is a python3 implementation of the paper Learning Convolutional Neural Networks for Graphs by Mathias Niepert, Mohamed Ahmed and Konstantin Kutzkov Requires : networkx > 2, keras, pynauty To install pynauty : Go to folder pynauty 0.6.0 Build pynauty : make pynauty Set user : make user ins pynauty Install : pip install . Questions about the paper : There is no indication about the labeling procedure used for the classification : is it chosen among a bunch of procedures (which one ?) before the classification using Theorem 2, or is it fixed to one procedure (eigenvector centrality, degree, ...). In this implementation it is fixed to be the betweeness centrality. The convolution seems to be made over the dicrete labels/attributes of the graph nodes in the classification (molecules represented by 0,1,2 etc for MUTAG, PTC..) : what is the sense of such a convolution ? For dummy nodes (when the size of the receptive field is higher than the size of the graph) which node attribute should be used ??",Graph Classification,Graphs 2663,Graphs,Graphs,Other,Graph Capsule CNN Networks (coming soon...) Graph Capsule Convolutional Neural Networks,Graph Classification,Graphs 2692,Graphs,Graphs,Other,"Local Neighborhood Graph Autoencoders This is a Keras implementation of the symmetrical autoencoder architecture with parameter sharing for the tasks of unsupervised link prediction and semi supervised node classification, as described in the following: Tran, Phi Vu. Learning to Make Predictions on Graphs with Autoencoders. Proceedings of the 5th IEEE International Conference on Data Science and Advanced Analytics (2018). Full oral paper. Tran, Phi Vu. Multi Task Graph Autoencoders. NIPS 2018 Workshop on Relational Representation Learning. Short poster paper. ! FCN_schematic (figure1.png?raw true) Requirements The code is tested on Ubuntu 16.04 with the following components: Software Python 2.7 Keras 2.0.6 using TensorFlow GPU 1.1.0 backend CUDA 8.0 with CuDNN 5.1 NetworkX 1.11 NumPy 1.11 SciPy 0.17.0 Scikit Learn 0.18.1 Hardware Intel Xeon CPU with 32 cores 64GB of system RAM NVIDIA GeForce GTX TITAN X GPU with 12GB of VRAM Datasets Citation networks from Thomas Kipf and Max Welling. 2016. Semi Supervised Classification with Graph Convolutional Networks : Cora , Citeseer , Pubmed Collaboration and social networks from Wang et al. 2016. Structural Deep Network Embedding : Arxiv GRQC , BlogCatalog Miscellaneous networks from Aditya Krishna Menon and Charles Elkan. 2011. 
Link Prediction via Matrix Factorization : Protein , Metabolic , Conflict , PowerGrid For custom graph datasets, the following are required: N x N adjacency matrix (N is the number of nodes) required for link prediction , N x F matrix of node features (F is the number of features per node) optional for link prediction , N x C matrix of one hot label classes (C is the number of classes) required for node classification . For an example of how to prepare the input dataset, take a look at the load_citation_data() function in utils_gcn.py . Usage For training and evaluation, execute the following bash commands in the same directory where the code resides: bash Set the PYTHONPATH environment variable $ export PYTHONPATH /path/to/this/repo:$PYTHONPATH Train the autoencoder model for network reconstruction using only latent features learned from local graph topology. $ python train_reconstruction.py Train the autoencoder model for link prediction using only latent features learned from local graph topology. $ python train_lp.py Train the autoencoder model for link prediction using both latent graph features and available explicit node features. $ python train_lp_with_feats.py Train the autoencoder model for the multi task learning of both link prediction and semi supervised node classification, simultaneously. $ python train_multitask_lpnc.py The flag refers to one of the following nine supported dataset strings: protein , metabolic , conflict , powergrid , cora , citeseer , pubmed , arxiv grqc , blogcatalog . The flag denotes the GPU device ID, 0 by default if only one GPU is available. Citation If you find this work useful, please cite the following: @inproceedings{Tran LoNGAE:2018, author {Tran, Phi Vu}, title {Learning to Make Predictions on Graphs with Autoencoders}, booktitle {5th IEEE International Conference on Data Science and Advanced Analytics}, year {2018} }",Link Prediction,Graphs 2717,Graphs,Graphs,Other,"GraphSage: Representation Learning on Large Graphs Authors: William L. Hamilton (wleif@stanford.edu), Rex Ying (rexying@stanford.edu) Project Website Alternative reference PyTorch implementation Overview This directory contains code necessary to run the GraphSage algorithm. GraphSage can be viewed as a stochastic generalization of graph convolutions, and it is especially useful for massive, dynamic graphs that contain rich feature information. See our paper for details on the algorithm. Note: GraphSage now also has better support for training on smaller, static graphs and graphs that don't have node features. The original algorithm and paper are focused on the task of inductive generalization (i.e., generating embeddings for nodes that were not present during training), but many benchmarks/tasks use simple static graphs that do not necessarily have features. To support this use case, GraphSage now includes optional identity features that can be used with or without other node attributes. Including identity features will increase the runtime, but also potentially increase performance (at the usual risk of overfitting). See the section on Running the code below. Note: GraphSage is intended for use on large graphs (>100,000) nodes. The overhead of subsampling will start to outweigh its benefits on smaller graphs. The example_data subdirectory contains a small example of the protein protein interaction data, which includes 3 training graphs + one validation graph and one test graph. The full Reddit and PPI datasets (described in the paper) are available on the project website . 
If you make use of this code or the GraphSage algorithm in your work, please cite the following paper: @inproceedings{hamilton2017inductive, author {Hamilton, William L. and Ying, Rex and Leskovec, Jure}, title {Inductive Representation Learning on Large Graphs}, booktitle {NIPS}, year {2017} } Requirements Recent versions of TensorFlow, numpy, scipy, sklearn, and networkx are required (but networkx must be G.json A networkx specified json file describing the input graph. Nodes have 'val' and 'test' attributes specifying if they are a part of the validation and test sets, respectively. id_map.json A json stored dictionary mapping the graph node ids to consecutive integers. class_map.json A json stored dictionary mapping the graph node ids to classes. feats.npy optional A numpy stored array of node features; ordering given by id_map.json. Can be omitted and only identity features will be used. walks.txt optional A text file specifying random walk co occurrences (one pair per line) ( only for unsupervised version of graphsage) To run the model on a new dataset, you need to make data files in the format described above. To run random walks for the unsupervised model and to generate the walks.txt file) you can use the run_walks function in graphsage.utils . Model variants The user must also specify a model, the variants of which are described in detail in the paper: graphsage_mean GraphSage with mean based aggregator graphsage_seq GraphSage with LSTM based aggregator graphsage_maxpool GraphSage with max pooling aggregator (as described in the NIPS 2017 paper) graphsage_meanpool GraphSage with mean pooling aggregator (a variant of the pooling aggregator, where the element wie mean replaces the element wise max). gcn GraphSage with GCN based aggregator n2v an implementation of DeepWalk (called n2v for short in the code.) Logging directory Finally, a base_log_dir should be specified (it defaults to the current directory). The output of the model and log files will be stored in a subdirectory of the base_log_dir. The path to the logged data will be of the form /graphsage / . The supervised model will output F1 scores, while the unsupervised model will train embeddings and store them. The unsupervised embeddings will be stored in a numpy formated file named val.npy with val.txt specifying the order of embeddings as a per line list of node ids. Note that the full log outputs and stored embeddings can be 5 10Gb in size (on the full data when running with the unsupervised variant). Using the output of the unsupervised models The unsupervised variants of GraphSage will output embeddings to the logging directory as described above. These embeddings can then be used in downstream machine learning applications. The eval_scripts directory contains examples of feeding the embeddings into simple logistic classifiers. Acknowledgements The original version of this code base was originally forked from and we owe many thanks to Thomas Kipf for making his code available. We also thank Yuanfang Li and Xin Li who contributed to a course project that was based on this work. 
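Referring back to the input files listed above (G.json, id_map.json, class_map.json, feats.npy), here is a rough sketch of how they could be produced for a toy graph. It is an illustration under assumptions, not a utility shipped with GraphSage: the toy- file prefix, the feature dimensionality, and the use of a recent networkx API are all arbitrary choices.
python
import json
import numpy as np
import networkx as nx
from networkx.readwrite import json_graph

def dump_json(obj, path):
    with open(path, 'w') as f:
        json.dump(obj, f)

G = nx.karate_club_graph()
for i, n in enumerate(G.nodes()):
    G.nodes[n]['val'] = (i % 10 == 0)   # mark a few nodes for validation
    G.nodes[n]['test'] = (i % 10 == 1)  # mark a few nodes for testing

dump_json(json_graph.node_link_data(G), 'toy-G.json')
dump_json({str(n): i for i, n in enumerate(G.nodes())}, 'toy-id_map.json')
dump_json({str(n): int(n) % 2 for n in G.nodes()}, 'toy-class_map.json')
np.save('toy-feats.npy', np.random.rand(G.number_of_nodes(), 16).astype(np.float32))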
Please see the paper for funding details and additional (non code related) acknowledgements.",Node Classification,Graphs 2745,Graphs,Graphs,Other,"Graph ConvNets in PyTorch October 15, 2017 Xavier Bresson Description Prototype implementation in PyTorch of the NIPS'16 paper: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering M Defferrard, X Bresson, P Vandergheynst Advances in Neural Information Processing Systems, 3844 3852, 2016 ArXiv preprint: arXiv:1606.09375 Code objective The code provides a simple example of graph ConvNets for the MNIST classification task. The graph is an 8 nearest neighbor graph of a 2D grid. The signals on the graph are the MNIST images vectorized as $28^2 \times 1$ vectors. Installation sh
git clone
cd graph_convnets_pytorch
pip install -r requirements.txt  # installation for python 3.6.2
python check_install.py
jupyter notebook  # run the 2 notebooks
Results GPU Quadro M4000 Standard ConvNets: 01_standard_convnet_lenet5_mnist_pytorch.ipynb , accuracy 99.31, speed 6.9 sec/epoch. Graph ConvNets: 02_graph_convnet_lenet5_mnist_pytorch.ipynb , accuracy 99.19, speed 100.8 sec/epoch Note PyTorch has not yet implemented the function torch.mm(sparse, dense) for variables. It will certainly be implemented, but in the meantime I defined a new autograd function for sparse variables, called my_sparse_mm , by subclassing torch.autograd.Function and implementing the forward and backward passes. python
class my_sparse_mm(torch.autograd.Function):
    # Implementation of a new autograd function for sparse variables, called
    # my_sparse_mm , by subclassing torch.autograd.Function and implementing
    # the forward and backward passes.
    def forward(self, W, x):  # W is SPARSE
        self.save_for_backward(W, x)
        y = torch.mm(W, x)
        return y

    def backward(self, grad_output):
        W, x = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input_dL_dW = torch.mm(grad_input, x.t())
        grad_input_dL_dx = torch.mm(W.t(), grad_input)
        return grad_input_dL_dW, grad_input_dL_dx
When to use this algorithm? Any problem that can be cast as analyzing a set of signals on a fixed graph, and you want to use ConvNets for this analysis.",Node Classification,Graphs 2786,Graphs,Graphs,Other,"RELEARN Code for the paper .... KDD 2019 Command To train a RELEARN model with the default settings, please see run.sh Key Parameters sample_mode : there are 4 types of loss, each one corresponding to one component of sample_mode. n is the node feature reconstruction loss, l is the link prediction loss, dc is the diffusion content reconstruction loss, ds is the diffusion structure (link) prediction loss. diffusion_threshold : filter out diffusions that contain fewer nodes than this threshold. neighbor_sample_size : how many neighbors to aggregate in the GCN layer. sample_size : how much data to use in one epoch for each sample mode. Note that for the two link prediction losses, the sample size is the sum of the positive and negative sample sizes. negative_sample_size : the ratio of negative samples to positive samples. sample_embed : the dimension of the hidden state, and also the dimension of the learned embedding. relation : number of relations to be used in variational inference. use_superv : whether to add supervision in training. superv_ratio : how much supervision to add, used in label efficiency experiments. a, b, c, d : weights for the different losses, the main hyperparameters to tune in practice.",Node Classification,Graphs 2836,Graphs,Graphs,Other,"Graph Convolutional Networks in PyTorch PyTorch implementation of Graph Convolutional Networks (GCNs) for semi supervised classification 1 .
For a high level introduction to GCNs, see: Thomas Kipf, Graph Convolutional Networks (2016) ! Graph Convolutional Networks (figure.png) Note: There are subtle differences between the TensorFlow implementation in and this PyTorch re implementation. This re implementation serves as a proof of concept and is not intended for reproduction of the results reported in 1 . This implementation makes use of the Cora dataset from 2 . Installation python setup.py install Requirements PyTorch 0.4 or 0.5 Python 2.7 or 3.6 Usage python train.py References 1 Kipf & Welling, Semi Supervised Classification with Graph Convolutional Networks, 2016 2 Sen et al., Collective Classification in Network Data, AI Magazine 2008 Cite Please cite our paper if you use this code in your own work: @article{kipf2016semi, title {Semi Supervised Classification with Graph Convolutional Networks}, author {Kipf, Thomas N and Welling, Max}, journal {arXiv preprint arXiv:1609.02907}, year {2016} }",Node Classification,Graphs 2848,Graphs,Graphs,Other,"PWC M3GM October 29: all model code is here, documented, and validated. Bonus content is here. All done! This repository contains code for Max Margin Markov Graph Models (M3GMs) as described in the paper : Predicting Semantic Relations using Global Graph Properties . Full citation format below. Code Requirements The project was written and tested in Python 3.6. Some packages needed to run it include: dynet 2.0 scipy tqdm nltk with the wordnet corpus available Write me, or open an issue, if you find more blocking dependencies! Workflow The eventual goal of training an M3GM model and replicating the results from the paper runs through a number of intermediate steps. Here is the hopefully full linearized flowchart, with some detailed descriptions in following sections: 1. Create a pickled WordNet prediction dataset in sparse matrix format, using create_wn18_data.py (create_wn18_data.py). To use our exact dataset, obtain the distibution of WN18RR here and point the script at the text version. 1. Obtain synset embeddings. These can be AutoExtend based ones, which map directly to synsets, or any downloadable word embeddings which can then be averaged across synset lexemes, such as those from FastText . 1. If your embeddings are word level, synsetify them using embed_from_words.py (embed_from_words.py). Run it without parameters to see usage. 1. Train an association model (for baseline results or for training an M3GM on top) using pretrain_assoc.py (pretrain_assoc.py). Demo command (for the result from the paper) given below. 1. Train an M3GM using predict_wn18.py (predict_wn18.py). Demo command (for results from the paper ) given below. 1. If so inclined, tune the alpha_r parameters using optimize_alpha_per_relation.py (optimize_alpha_per_relation.py). You will need to do some math later to translate into results comparable to those in the paper. Disclaimer: some of the code here takes a while to run. Any suggestions for improving any of the calculations, or for getting things to run on GPUs for that matter, will be most appreciated. Association Models This script trains a local association model using one of several models (see paper for details): Bilinear, TransE, Diag R1 ( diagonal + rank 1 matrix ), DistMult. Be sure to keep record of the embedding dimension used (no need to provide the dimension as an argument if initializing from an a pre trained file) and of the association algorithm ( assoc mode ), as these will be necessary for downstream M3GM training. 
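Going back to steps 2 and 3 of the workflow above, the averaging of word vectors across a synset's lexemes can be pictured with the rough sketch below. This is only an illustration of the idea, not the repository's embed_from_words.py; the synset_embedding helper and the toy vectors are assumptions, and it requires the nltk WordNet corpus listed in the requirements.
python
# Rough sketch: average word vectors over a synset's lemmas (illustrative only).
import numpy as np
from nltk.corpus import wordnet as wn

def synset_embedding(synset, word_vectors, dim=300):
    # word_vectors: dict mapping lower-cased words to numpy arrays of size dim.
    vecs = [word_vectors[l.name().lower()] for l in synset.lemmas()
            if l.name().lower() in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

toy_vectors = {'dog': np.ones(300), 'domestic_dog': 2 * np.ones(300)}
print(synset_embedding(wn.synset('dog.n.01'), toy_vectors)[:3])  # [1.5 1.5 1.5]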
One parameter you may want to add depending on your target setup is rule override , which trains modules for all relations, including the four symmetric ones (in WordNet). It would also evaluate on trained modules in symmetric relations, rather than with a (high accuracy) rule based system. The default behavior, without this parameter, is training said modules once every five epochs, as it helps with synset embeddings tuning. The early stopping method used is: for each dev epoch, if its MRR score is lower than both of the last two epochs, halt and return the best model so far. Outputs the auto generated log file (avoid using no log ) will output many, many scores and their components for every single instance encountered. model out is readable both by this code for test mode, and by downstream M3GM trainer ( model param). Demo command python pretrain_assoc.py input data/wn18rr.pkl embeddings data/ft embs all lower.vec model out models/pret_transE nll assoc mode transE neg samp 10 early stopping eval dev Max Margin Markov Graph Models The most powerful use case for M3GM is when we've trained a good association model, and augment it with weights for combinatorial graph features by way of M3GM training. It is best if the association weights, as well as the word embeddings, are frozen from this point on, using the no assoc bp parameter. If we believe some of them to be bad, they can later be weighted down using the optimize_alpha_per_relation.py (optimize_alpha_per_relation.py) post processor, which computes a best performing association component weight for each relation. model only init is a related parameter, which ensures that the M3GM component is trained over the data (makes more sense when considering that there's also an ergm model input parameter which can be used for picking up training from a saved point). A prerequesite for this code to run in the common mode is that both emb size and assoc mode are set to the same values that the association model was trained with. Outputs the auto generated log file (avoid using no log ) will output ERGM scores for all instances and negative samples in training phase, and all cases of re ranking in the development data traversals. model out will save the model in a four file format that can later be read by both this script and the test mode code ( TODO ). rerank out provides an input file for optimize_alpha_per_relation.py (optimize_alpha_per_relation.py). It includes all to be reranked lists from the dev set and scores from both association and graph components, as well as flags for the true instances. Demo command python predict_wn18.py input data/wn18rr.pkl emb size 300 model models/pret_transE ep 14 model only init assoc mode transE eval dev no assoc bp epochs 3 neg samp 10 regularize 0.01 rand all skip symmetrics model out models/from_pret_trE 3eps rerank out from_pret_trE 3eps.txt Model Development A good entry point to try and play with the ERGM features underlying M3GM would be ergm_feats.py (ergm_feats.py). Be sure to enter them into the cache and feature set in model.py (model.py) so they can have weights trained for them. Running the dataset creation code with the no symmetrics flag would result in a dataset we called WN18RSR when working on this research. It contains only the seven asymmetric, nonreciprocal relations. All model results on it are abysmal, but you're welcome to try :) Repo level TODOs X Add exploration Notebook for WordNet (WN) structure X Add mapping of Synset codes from WN 1.7.1 all the way to 3.0. 
Move non script code into lib directory Remove dy.parameter() calls (deprecated in dynet 2.0.4 ) Turn any remaining TODOs from here into repo issues Citation @InProceedings{pinter eisenstein:2018:EMNLP, author {Pinter, Yuval and Eisenstein, Jacob}, title {{Predicting Semantic Relations using Global Graph Properties}}, booktitle {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, month {October November}, year {2018}, address {Brussels, Belgium}, publisher {Association for Computational Linguistics}, pages {1741 1751}, url { } Contact uvp@gatech.edu .",Link Prediction,Graphs 2860,Graphs,Graphs,Other,"Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering The code in this repository implements an efficient generalization of the popular Convolutional Neural Networks (CNNs) to arbitrary graphs, presented in our paper: Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering arXiv , Neural Information Processing Systems (NIPS), 2016. Additional material: NIPS2016 spotlight video video , 2016 11 22. Deep Learning on Graphs slides_ntds , a lecture for EPFL's master course A Network Tour of Data Science ntds , 2016 12 21. Deep Learning on Graphs slides_dlid , an invited talk at the Deep Learning on Irregular Domains dlid workshop of BMVC, 2017 09 17. video : slides_ntds : ntds : slides_dlid : dlid : There is also implementations of the filters used in: Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun, Spectral Networks and Locally Connected Networks on Graphs bruna , International Conference on Learning Representations (ICLR), 2014. Mikael Henaff, Joan Bruna and Yann LeCun, Deep Convolutional Networks on Graph Structured Data henaff , arXiv, 2015. arXiv : bruna : henaff : Installation 1. Clone this repository. sh git clone cd cnn_graph 2. Install the dependencies. The code should run with TensorFlow 1.0 and newer. sh pip install r requirements.txt or make install 3. Play with the Jupyter notebooks. sh jupyter notebook Reproducing our results Run all the notebooks to reproduce the experiments on MNIST (nips2016/mnist.ipynb) and 20NEWS (nips2016/20news.ipynb) presented in the paper. sh cd nips2016 make Using the model To use our graph ConvNet on your data, you need: 1. a data matrix where each row is a sample and each column is a feature, 2. a target vector, 3. optionally, an adjacency matrix which encodes the structure as a graph. See the usage notebook usage for a simple example with fabricated data. Please get in touch if you are unsure about applying the model to a different setting. usage : License & co The code in this repository is released under the terms of the MIT license (LICENSE.txt). Please cite our paper arXiv if you use it. @inproceedings{cnn_graph, title {Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering}, author {Defferrard, Micha\ el and Bresson, Xavier and Vandergheynst, Pierre}, booktitle {Advances in Neural Information Processing Systems}, year {2016}, url { }",Node Classification,Graphs 1818,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Enhancing Sentence Embedding with Generalized Pooling Source code for Enhancing Sentence Embedding with Generalized Pooling based on Theano. If you use this code as part of any published research, please acknowledge the following paper. Enhancing Sentence Embedding with Generalized Pooling Qian Chen, Zhen Hua Ling, Xiaodan Zhu. 
_COLING (2018)_ @InProceedings{Chen Qian:2018:COLING, author {Chen, Qian and Ling, Zhen Hua and Zhu, Xiaodan}, title {Enhancing Sentence Embedding with Generalized Pooling}, booktitle {Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)}, month {August}, year {2018}, address {Santa Fe, USA}, publisher {ACL} } Homepage of the Qian Chen, Dependencies To run it perfectly, you will need (recommend using Ananconda to set up environment): Python 2.7.13 Theano 0.9.0 Running the Script 1. Download and preprocess cd data bash fetch_and_preprocess.sh 2. Train and test model cd scripts/generalized pooling/ bash train.sh The result is in scripts/generalized pooling/log.txt file. 3. Analysis the result for dev/test set (optional) bash test.sh",Natural Language Inference,Natural Language Inference 1971,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Multi Task Deep Neural Networks for Natural Language Understanding This PyTorch package implements the Multi Task Deep Neural Networks (MT DNN) for Natural Language Understanding, as described in: Xiaodong Liu\ , Pengcheng He\ , Weizhu Chen and Jianfeng Gao Multi Task Deep Neural Networks for Natural Language Understanding arXiv version \ : Equal contribution Xiaodong Liu, Pengcheng He, Weizhu Chen and Jianfeng Gao Improving Multi Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding arXiv version Quickstart Setup Environment Install via pip: 1. python3.6 2. install requirements > pip install r requirements.txt Use docker: 1. Pull docker > docker pull allenlao/pytorch mt dnn:v0.1 2. Run docker > docker run it rm runtime nvidia allenlao/pytorch mt dnn:v0.1 bash Please refere the following link if you first use docker: Train a toy MT DNN model 1. Download data > sh download.sh Please refer to download GLUE dataset: 2. Preprocess data > python prepro.py 3. Training > python train.py Note that we ran experiments on 4 V100 GPUs for base MT DNN models. You may need to reduce batch size for other GPUs. GLUE Result reproduce 1. MTL refinement: refine MT DNN (shared layers), initialized with the pre trained BERT model, via MTL using all GLUE tasks excluding WNLI to learn a new shared representation. Note that we ran this experiment on 8 V100 GPUs (32G) with a batch size of 32. + Preprocess GLUE data via the aforementioned script + Training: >scripts\run_mt_dnn.sh 2. Finetuning: finetune MT DNN to each of the GLUE tasks to get task specific models. Here, we provide two examples, STS B and RTE. You can use similar scripts to finetune all the GLUE tasks. + Finetune on the STS B task > scripts\run_stsb.sh You should get about 90.5/90.4 on STS B dev in terms of Pearson/Spearman correlation. + Finetune on the RTE task > scripts\run_rte.sh You should get about 83.8 on RTE dev in terms of accuracy. SciTail & SNIL Result reproduce (Domain Adaptation) 1. Domain Adaptation on SciTail >scripts\scitail_domain_adaptation_bash.sh 2. Domain Adaptation on SNLI >scripts\snli_domain_adaptation_bash.sh TODO Release codes/models MT DNN with Knowledge Distillation. Publish pretrained Tensorflow checkpoints. FAQ Do you shared the pretrained mt dnn models? Yes, we released the pretrained shared embedings via MTL which are aligned to BERT base/large models: mt_dnn_base.pt and mt_dnn_large.pt . To obtain the similar models: 1. run the >sh scripts\run_mt_dnn.sh , and then pick the best checkpoint based on the average dev preformance of MNLI/RTE. 2. 
strip the task specific layers via scritps\strip_model.py . Why SciTail/SNLI do not enable SAN? For SciTail/SNLI tasks, the purpose is to test generalization of the learned embedding and how easy it is adapted to a new domain instead of complicated model structures for a direct comparison with BERT. Thus, we use a linear projection on the all domain adaptation settings. What is the difference between V1 and V2 The difference is in the QNLI dataset. Please refere to the GLUE official homepage for more details. Do you fine tune single task for your GLUE leaderboard submission? We can use the multi task refinement model to run the prediction and produce a reasonable result. But to achieve a better result, it requires a fine tuneing on each task. It is worthing noting the paper in arxiv is a littled out dated and on the old GLUE dataset. We will update the paper as we mentioned below. Notes and Acknowledgments BERT pytorch is from: BERT: We also used some code from: How do I cite MT DNN? For now, please cite arXiv version : @article{liu2019mt dnn, title {Multi Task Deep Neural Networks for Natural Language Understanding}, author {Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng}, journal {arXiv preprint arXiv:1901.11504}, year {2019} } and a new version of the paper will be shared later. @article{liu2019mt dnn kd, title {Improving Multi Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding}, author {Liu, Xiaodong and He, Pengcheng and Chen, Weizhu and Gao, Jianfeng}, journal {arXiv preprint arXiv:1904.09482}, year {2019} } Typo: there is no activation fuction in Equation 2. Contact Information For help or issues using MT DNN, please submit a GitHub issue. For personal communication related to MT DNN, please contact Xiaodong Liu ( xiaodl@microsoft.com ), Pengcheng He ( penhe@microsoft.com ), Weizhu Chen ( wzchen@microsoft.com ) or Jianfeng Gao ( jfgao@microsoft.com ).",Natural Language Inference,Natural Language Inference 2039,Natural Language Processing,Natural Language Processing,Natural Language Processing,"GLUE Baselines This repo contains the code for baselines for the Generalized Language Understanding Evaluation (GLUE) benchmark. See our paper for more details about GLUE or the baselines. Deprecation Warning Use this code to reproduce our baselines. If you want code to use as a starting point for new development, though, we strongly recommend using jiant instead—it's a much more extensive and much better documented toolkit built around the same goals. Dependencies Make sure you have installed the packages listed in environment.yml. When listed, specific particular package versions are required. If you use conda, you can create an environment from this package with the following command: conda env create f environment.yml Note: The version of AllenNLP available on pip may not be compatible with PyTorch 0.4, in which we recommend installing from source . Downloading GLUE We provide a convenience python script for downloading all GLUE data and standard splits. python download_glue_data.py data_dir glue_data tasks all After downloading GLUE, point PATH_PREFIX in src/preprocess.py to the directory containing the data. If you are blocked from s3.amazonaws.com (as may be the case in China), downloading MRPC will fail, instead you can run the command below: git clone python download_glue_data.py data_dir glue_data tasks all path_to_mrpc paraphrase_identification/dataset/msr paraphrase corpus Running To run our baselines, use src/main.py . 
Because preprocessing is expensive (particularly for ELMo) and we often want to run multiple experiments using the same preprocessing, we use an argument exp_dir for sharing preprocessing between experiments. We use argument run_dir to save information specific to a particular run, with run_dir usually nested within exp_dir . python main.py exp_dir EXP_DIR run_dir RUN_DIR train_tasks all word_embs_file PATH_TO_GLOVE NB: The version of AllenNLP used has issues with tensorboard. You may need to substitute calls from tensorboard import SummaryWriter to from tensorboardX import SummaryWriter in your AllenNLP source files. GloVe, CoVe, and ELMo Many of our models make use of GloVe pretrained word embeddings , in particular the 300 dimensional, 840B version. To use GloVe vectors, download and extract the relevant files and set word_embs_file to the GloVe file. To learn embeddings from scratch, set glove to 0. We use the CoVe implementation provided here . To use CoVe, clone the repo and fill in PATH_TO_COVE in src/models.py and set cove to 1. We use the ELMo implementation provided by AllenNLP . To use ELMo, set elmo to 1. To use ELMo without GloVe, additionally set elmo_no_glove to 1. Reference If you use this code or GLUE, please consider citing us. @unpublished{wang2018glue title {{GLUE}: A Multi Task Benchmark and Analysis Platform for Natural Language Understanding} author {Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.} note {arXiv preprint 1804.07461} year {2018} } Feel free to contact alexwang _at_ nyu.edu with any questions or comments.",Natural Language Inference,Natural Language Inference 2040,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Baseline Models for MultiNLI Corpus This is the code we used to establish baselines for the MultiNLI corpus introduced in A Broad Coverage Challenge Corpus for Sentence Understanding through Inference . Data The MultiNLI and SNLI corpora are both distributed in JSON lines and tab separated value files. Both can be downloaded here . Models We present three baseline neural network models. These range from a bare bones model (CBOW), to an elaborate model which has achieved state of the art performance on the SNLI corpus (ESIM), Continuous Bag of Words (CBOW): in this model, each sentence is represented as the sum of the embedding representations of its words. This representation is passed to a deep, 3 layers, MLP. Main code for this model is in cbow.py Bi directional LSTM: in this model, the average of the states of a bidirectional LSTM RNN is used as the sentence representation. Main code for this model is in bilstm.py Enhanced Sequential Inference Model (ESIM): this is our implementation of the Chen et al.'s (2017) ESIM, without ensembling with a TreeLSTM. Main code for this model is in esim.py We use dropout for regularization in all three models. Training and Testing Training settings The models can be trained on three different settings. Each setting has its own training script. To train a model only on SNLI data, Use train_snli.py . Accuracy on SNLI's dev set is used to do early stopping. To train a model on only MultiNLI or on a mixture of MultiNLI and SNLI data, Use train_mnli.py . The optional alpha flag determines what percentage of SNLI data is used in training. The default value for alpha is 0.0, which means the model will be only trained on MultiNLI data. 
If alpha is set to a value greater than 0 (and less than 1), an alpha percentage of SNLI training data is randomly sampled at the beginning of each epoch. When using SNLI training data in this setting, we set alpha 0.15. Accuracy on MultiNLI's matched dev set is used to do early stopping. To train a model on a single MultiNLI genre, use train_genre.py . To use this training setting, you must call the genre flag and set it to a valid training genre ( travel , fiction , slate , telephone , government , or snli ). Accuracy on the dev set for the chosen genre is used to do early stopping. Additionally, logs created with this training setting contain evaluation statistics by genre. You can also train a model on SNLI with this script if you desire genre specific statistics in your logs. Command line flags To start training with any of the training scripts, there are a couple of required command line flags and an array of optional flags. The code concerning all flags can be found in parameters.py . All the parameters set in parameters.py are printed to the log file every time the training script is launched. Required flags, model_type : there are three model types in this repository, cbow , bilstm , and esim . You must state which model you want to use. model_name : this is your experiment name. This name will be used to prefix the log and checkpoint files. Optional flags, datapath : path to your directory with MultiNLI and SNLI data. Default is set to ../data ckptpath : path to your directory where you wish to store checkpoint files. Default is set to ../logs logpath : path to your directory where you wish to store log files. Default is set to ../logs emb_to_load : path to your directory with GloVe data. Default is set to ../data learning_rate : the learning rate you wish to use during training. Default value is set to 0.0004 keep_rate : the hyper parameter for the dropout rate, where keep_rate equals 1 minus the dropout rate. The default value is set to 0.5. seq_length : the maximum sequence length you wish to use. Default value is set to 50. Sentences shorter than seq_length are padded to the right. Sentences longer than seq_length are truncated. emb_train : boolean flag that determines if the model updates word embeddings during training. If called, the word embeddings are updated. alpha : only used during the train_mnli scheme. Determines what percentage of SNLI training data to use in each epoch of training. Default value is set to 0.0 (which makes the model train on MultiNLI only). genre : only used during the train_genre scheme. Use this flag to set which single genre you wish to train on. Valid genres are travel , fiction , slate , telephone , government , or snli . test : boolean used to test a trained model. Call this flag if you wish to load a trained model and test it on the MultiNLI dev sets and SNLI test set. When called, the best checkpoint will be used (see the section on checkpoints for more details). Dev sets are currently used for testing on MultiNLI since the test sets have not been released. Other parameters Remaining parameters like the size of hidden layers, word embeddings, and minibatch can be changed directly in parameters.py . The default hidden embedding and word embedding size is set to 300, and the minibatch size ( batch_size in the code) is set to 32. Sample commands To execute all of the following sample commands, you must be in the python folder. To train on SNLI data only, here is a sample command, PYTHONPATH $PYTHONPATH:.
python train_snli.py cbow petModel 0 keep_rate 0.9 seq_length 25 emb_train where the model_type flag is set to cbow and can be swapped for bilstm or esim , and the model_name flag is set to petModel 0 and can be changed to whatever you please. Similarly, to train on a mixture MultiNLI and SNLI data, here is a sample command, PYTHONPATH $PYTHONPATH:. python train_mnli.py bilstm petModel 1 keep_rate 0.9 alpha 0.15 emb_train where 15% of SNLI training data is randomly sampled at the beginning of each epoch. To train on just the travel genre in MultiNLI data, PYTHONPATH $PYTHONPATH:. python train_genre.py esim petModel 2 genre travel emb_train Testing models On dev set, To test a trained model, simply add the test flag to the command used for training. The best checkpoint will be loaded and used to evaluate the model's performance on the MultiNLI dev sets, SNLI test set, and the dev set for each genre in MultiNLI. For example, PYTHONPATH $PYTHONPATH:. python train_genre.py esim petModel 2 genre travel emb_train test With the test flag, the train_mnli.py script will also generate a CSV of predictions for the unlabaled matched and mismatched test sets. Results for unlabeled test sets, To get a CSV of predicted results for unlabeled test sets use predictions.py . This script requires the same flags as the training scripts. You must enter the model_type and model_name , and the path to the saved checkpoint and log files if they are different from the default (the default is set to ../logs for both paths). Here is a sample command, PYTHONPATH $PYTHONPATH:. python predictions.py esim petModel 1 alpha 0.15 emb_train logpath ../logs_keep ckptpath ../logs_keep This script will create a CSV with two columns: pairID and gold_label. Checkpoints We maintain two checkpoints: the most recent checkpoint and the best checkpoint. Every 500 steps, the most recent checkpoint is updated, and we test to see if the dev set accuracy has improved by at least 0.04%. If the accuracy has gone up by at least 0.04%, then the best checkpoint is updated. Annotation Tags The script which was used to determine the percentage of annotation tags is available in this repository, within the subfolder python under the name autotags.py . It takes a parsed corpus file (e.g., a dev set file) and reports the percentages of annotation tags in that file. You should also update your paths in the script to reflect your local file organization. License Copyright 2018, New York University Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software ), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",Natural Language Inference,Natural Language Inference 2057,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Status: Archive (code is provided as is, no updates expected) finetune transformer lm Code and model for the paper Improving Language Understanding by Generative Pre Training Currently this code implements the ROCStories Cloze Test result reported in the paper by running: python train.py dataset rocstories desc rocstories submit analysis data_dir path to data here Note: The code is currently non deterministic due to various GPU ops. The median accuracy of 10 runs with this codebase (using default hyperparameters) is 85.8% slightly lower than the reported single run of 86.5% from the paper. The ROCStories dataset can be downloaded from the associated website .",Natural Language Inference,Natural Language Inference 2221,Natural Language Processing,Natural Language Processing,Natural Language Processing,"THANOS THANOS is a sequence model which captures the two level hierarchy within a document. The first is word level hierarchy and the second is the sentence level hierarchy. The THANOS architecture consists of a tree LSTM based word level encoder in order to obtain embedding for each sentence in the dataset, a GRU based sentence level encoder and a sentence level attention layer. However, the limitation of tree LSTM is that, it does not directly support batched computation. Therefore SPINN is used to implement Tree LSTM at word level to create sentence vectors from the word embedding. _Data explanation is as following_ 1. Yelp review dataset (raw_150k.csv) consisting of 150k reviews is in path Data/Yelp Raw Data. 2. Text review column (Data/Input Data) from raw_150k.csv is used to create parse trees (Data/Binary Tree Output) using the jar file in the path Data/Binary Tree Jar File. The jar file is created using Stanford tree parser with some NLP preprocessing tasks. 3. Binary tree output is used to create 3 pickle files (Data/Pickle File) named yelp_unk150k.pkl, yelp_parsedtree150k and vocab.pkl. yelp_parsedtree150k.pkl consits of parsed trees for the reviews. yelp_unk150k.pkl consists of list of tokens for each repective tree. The token list consist of the words in the tree in sequential order. the words with less than 5 frequency in the vocab list of the dataset is replace by 'unk' token. For example : ( ( ( ( i ( ( excepted ( a lot ) ) ( from ( this movie ) ) ) ) , ) and ) ( it ( did deliver ) ) ) . ) token list for above tree will be: 'i', 'expected', 'a', 'lot', 'from', 'this', 'movie', ',', 'and', 'it', 'did', 'deliver' vocab.pkl file consists of all the unique words in the tokens created from trees which will be used to create the dictionary of words in our dataset. _After preparing the tree, token , and vocab file we are ready to feed this data to train our model. Below are the steps_ 1. python notebook creating_train_test_dev_files.ipynb is used to create train, dev and test pickle files from yelp_parsedtree150k.pkl and yelp_unk150k.pkl files. 2. python notebook run_model.ipynb consists of commands to create the vocab json file using python file build_vocab.py and train the model using python file train.py. 
The commands are as below: _How to execute the model_ Open the jupyter notebook and run all the cells of python notebook creating_train_test_dev_files.ipynb. %run build_vocab.py data_dir Data/Pickle File (from jupyter notebook) %run train.py data_dir Data model_dir experiments/base_model (from jupyter notebook)",Natural Language Inference,Natural Language Inference 2243,Natural Language Processing,Natural Language Processing,Natural Language Processing,"dataset: SNLI DATA pre train model > GLOVE Model: glove.840B.300d.txt. from > Reference paper: 1. Enhanced LSTM for Natural Language Inference, > 2. Learning Natural Language Inference with LSTM, >chrome extension://oemmndcbldboiebfnladdacbdfmadadm/ 3. An overview of Natural Language Inference Data Collection, >chrome extension://oemmndcbldboiebfnladdacbdfmadadm/ 4. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, > precomputed_glove.weights.npy, >",Natural Language Inference,Natural Language Inference 2405,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Folders: Raw (raw data) final_matches.csv is matches collected from early groupsolver studies (SQL) documentdb_matches.csv is matches collected from studies done more recently (Azure documentdb) Data 1. In order to get data ready for training run get_clean_data.R. Output file train.csv 2. To get word embeddings for words in the training data, run get_word2vec.py. Output file embeddings.p 3. To get features that are used in training run get_features.py (works in runtime) Output file features.csv Uses functions from scripts folder 4. To get indico sentiment features: a. Run get_indico_sentiment_hq.py to get sentiment lookup table for all statements in training data Output file: indico_sentiment_hq_key.csv (function currently broken) b. Run get_sentiment_features.py to get sentiment features for every row in the training data Output file: sentiment_features.py 5. stopwords.json contains a list of stopwords (words that we want to remove before training because the statement does not add to the sentence meaning) A couple of other considerations include using corrected data (some matches are incorrectly labeled. CST has fixed a few statements, but not all), data that has split statements removed, and the universal sentence encoder. 5. For corrections: a. Batches of pairs to correct folder includes scripts for producing data for CST that has not already been corrected b. corrections.csv are the current corrections we have c. Run data_corrector.R to get training data with corrections. Output file corrected_train.csv 6. For split statements removed: a. Run statement_splitter.py in order to get split data for the training data. Output file statement_splits.json b. Run get_split_corrected_data.py to get data that was not split Output file train_splits_removed.csv 7. For the universal sentence encoder (better than simple average sentence embeddings): a. Run get_use_embedding_features.py to get a dictionary of the embeddings for each sentence in the training set. Output file use_embeddings.p b. Run get_use_embedding_features.py to get the features using use instead of word2vec embeddings. Output file use_features.csv Analysis: Python notebooks. Does a basic analysis on the raw documentdb data and another analysis on the training sets with splits removed vs. not removed. Source 4 general model frameworks considered. Used logistic regression to do a simple evaluation of the features Feed forward network is what we use now. 
(There's actually a lot of room for improvement using the other models, especially when focusing on log loss) use_feed_forward is a feed forward network using the universal sentence encoder. Outperforms the regular feed forward (which uses word2vec) Decomposable Attention model (from a paper by google This one uses word2vec with a more complicated network. Also outperforms regular feed forward network Logistic regression (with regularization to improve performance) 4 tests 1. train.py runs vanilla logistic regression with regularization 2. train_use.py uses the universal sentence encoder (improved performance) 3. train_non_split_data.py trains using the training set with split statements removed (improvement) 4. train_test_sentiment.py Tests using sentiment as well as other features (marginal improvement) Feed forward network Since there are many hyperparameters in neural networks, folder is designed to test different hyperparameters 1. generate_hpyerparameters.py genderates a random set of hyperparameters to use for testing 2. feed_forward_nn.py is the neural network code 3. train.py tests the neural network using the hyperparameters from hyperparameters.json. 4. test_results.csv shows how well each set of hyperparameters did use_feed_forward Same as feed forward but uses universal sentence encoded embeddings Didn't go through the work to look at hyperparameters (used a set of default hyperparameters) Decomposable Attention decomposable_attention.py is the code for the decomposable_attention network Also didn't go through the work for hyperparameters Scripts Helper functions commonly used in other scripts 1. calc_features.py (didn't use as much) 2. docdb_query: used to get data from document_db 3. function.py: Several functions uses for preprocessing and training models, decriptions of function in comments in file 4. fuzzy_features.py: types of text features called fuzzy features that seem to improve model performance 5. sentiment.py: Used to help get sentiment features 6. simple_features.py: Used to calculate simple text features (e.g. text length differences, common word count, etc.) 7. use_embeddings.py Loads univeral sentence encoder model 8. wmd.py: calculates word mover's distance 9. word2vec_features.py: Used to calculate word embedding features (can be used for universal sentence encoded sentences as well)",Natural Language Inference,Natural Language Inference 2439,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NOTE: This codebase is under active development. To exactly reproduce the experiments published in ACL 2016, use this release 7 . For the most recent version, see the NYU fork . Stack augmented Parser Interpreter Neural Network This repository contains the source code described in our paper A Fast Unified Model for Sentence Parsing and Understanding 1 . For a more informal introduction to the ideas behind the model, see this Stanford NLP blog post 8 . There are three separate implementations available: A Python/Theano implementation of SPINN using a naïve stack representation (named fat stack ) A Python/Theano implementation of SPINN using the thin stack representation described in our paper A C++/CUDA implementation of the SPINN feedforward, used for performance testing Python code The Python code lives, quite intuitively, in the python folder. We used this code to train and test the SPINN models before publication. 
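As background for the stack terminology used below: SPINN encodes a sentence by following its binary parse with shift and reduce operations over a stack. The following toy sketch is written for this document, not taken from the repository; it illustrates only the control flow, with placeholder embed and compose functions standing in for the learned word embedding and the learned composition.

def spinn_feedforward(tokens, transitions, embed, compose):
    # tokens: words in sentence order; transitions: "shift"/"reduce" operations read off the binary parse.
    buf = [embed(t) for t in reversed(tokens)]  # pop() from the end reads the words left to right
    stack = []
    for op in transitions:
        if op == "shift":                 # push the next word representation onto the stack
            stack.append(buf.pop())
        else:                             # "reduce": merge the two topmost entries into one phrase vector
            right, left = stack.pop(), stack.pop()
            stack.append(compose(left, right))
    return stack[-1]                      # a single vector encoding the whole sentence

# Toy usage with trivial placeholders; the real models use learned embeddings and a TreeLSTM style composition.
vec = spinn_feedforward(["the", "cat", "sat"], ["shift", "shift", "reduce", "shift", "reduce"],
                        embed=lambda w: [float(len(w))], compose=lambda l, r: [l[0] + r[0]])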
There is one enormous difference in the fat and thin stack implementations: fat stack uses Theano's automatically generated symbolic backpropagation graphs, while thin stack generates its own optimal backpropagation graph. This makes thin stack oodles faster than its brother, but we have not yet implemented all SPINN variants to support this custom backpropagation. Installation Requirements: Python 2.7 CUDA > 7.0 CuDNN v4 (v5 is not compatible with our Theano fork) Install all required Python dependencies using the command below. ( WARNING: This installs our custom Theano fork. We recommend installing in a virtual environment in order to avoid overwriting any stock Theano install you already have.) pip install r python/requirements.txt We use a modified version of Theano 3 in order to support fast forward and backward prop in thin stack . While it isn't absolutely necessary to use this hacked Theano, it greatly improves thin stack performance. Alternatively, you can use a custom Docker image that we've prepared, as discussed in this CodaLab worksheet . Running the code The easiest way to launch a train/test run is to use one of the checkpoints directory . The Bash scripts in this directory will download the necessary data and launch train/test runs of all models reported in our paper. You can run any of the following scripts: ./checkpoints/spinn.sh ./checkpoints/spinn_pi.sh ./checkpoints/spinn_pi_nt.sh ./checkpoints/rnn.sh All of the above scripts will by default launch a training run beginning with the recorded parameters of our best models. You can change their behavior using the arguments below: $ ./checkpoints/spinn.sh h spinn.sh h e t s run a train or test run of a SPINN model where: h show this help text e run in eval only mode (evaluates on dev set by default) t evaluate on test set s skip the checkpoint loading; run with a randomly initialized model To evaluate our best SPINN PI NT model on the test set, for example, run $ ./checkpoints/spinn_pi_nt.sh e t Running command: python m spinn.models.fat_classifier data_type snli embedding_data_path ../glove/glove.840B.300d.txt log_path ../logs training_data_path ../snli_1.0/snli_1.0_train.jsonl experiment_name spinn_pi_nt expanded_eval_only eval_data_path ../snli_1.0/snli_1.0_test.jsonl ckpt_path spinn_pi_nt.ckpt_best batch_size 32 embedding_keep_rate 0.828528124124 eval_seq_length 50 init_range 0.005 l2_lambda 3.45058959758e 06 learning_rate 0.000297682444894 model_dim 600 model_type Model0 noconnect_tracking_comp num_sentence_pair_combination_layers 2 semantic_classifier_keep_rate 0.9437038157 seq_length 50 tracking_lstm_hidden_dim 57 use_tracking_lstm word_embedding_dim 300 ... 1 Checkpointed model was trained for 156500 steps. 1 Building forward pass. 1 Writing eval output for ../snli_1.0/snli_1.0_test.jsonl. 1 Written gold parses in spinn_pi_nt snli_1.0_test.jsonl parse.gld 1 Written predicted parses in spinn_pi_nt snli_1.0_test.jsonl parse.tst 1 Step: 156500 Eval acc: 0.808734 0.000000 ../snli_1.0/snli_1.0_test.jsonl Custom model configurations The main executable for the SNLI experiments in the paper is fat_classifier.py , whose flags specify the hyperparameters of the model. You may also need to set Theano flags through the THEANO_FLAGS environment variable, which specifies compilation mode (set it to fast_compile during development, and delete it to use the default state for longer runs), device , which can be set to cpu or gpu , and cuda.root , which specifies the location of CUDA when running on GPU. 
floatX should always be set to float32 . Here's a sample command that runs a fast, low dimensional CPU training run, training and testing only on the dev set. It assumes that you have a copy of SNLI available locally. PYTHONPATH spinn/python \ THEANO_FLAGS optimizer fast_compile,device cpu,floatX float32 \ python2.7 m spinn.models.fat_classifier data_type snli \ training_data_path snli_1.0/snli_1.0_dev.jsonl \ eval_data_path snli_1.0/snli_1.0_dev.jsonl \ embedding_data_path spinn/python/spinn/tests/test_embedding_matrix.5d.txt \ word_embedding_dim 5 model_dim 10 For full runs, you'll also need a copy of the 840B word 300D GloVe word vectors . C++ code The C++ code lives in the cpp folder. This code implements a basic SPINN feedforward. (This implementation corresponds to the bare SPINN PI NT, parsed input / no tracking model, described in the paper.) It has been verified to produce the exact same output as a recursive neural network with the same weights and inputs. (We used a simplified version of Ozan Irsoy's deep recursive project 5 as a comparison.) The main binary, stacktest , simply generates random input data and runs a feedforward. It outputs the total feedforward time elapsed and the numerical result of the feedforward. Dependencies The only external dependency of the C++ code is CUDA > 7.0. The tests depend on the googletest library 4 , included in this repository as a Git submodule. Installation First install CUDA > 7.0 and ensure that nvcc is on your PATH . Then: From project root cd cpp Pull down Git submodules (libraries) git submodule update init Compile C++ code make stacktest make rnntest This should generate a binary in cpp/bin/stacktest . Running The binary cpp/bin/stacktest runs on random input data. You can time the feedforward yourself by running the following commands: From project root cd cpp BATCH_SIZE 512 ./bin/stacktest You can of course set BATCH_SIZE to whatever integer you desire. The other model architecture parameters are fixed in the code, but you can easily change them as well on this line 6 if you desire. Baseline RNN The binary cpp/bin/rnntest runs a vanilla RNN (ReLU activations) with random input data. You can run this performance test script as follows: From project root cd cpp BATCH_SIZE 512 ./bin/rnntest License Copyright 2018, Stanford University Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the Software ), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED AS IS , WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 :",Natural Language Inference,Natural Language Inference 2509,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Directional Self Attention Network This repo is the codes of DiSAN: Directional Self Attention Network for RNN/CNN free Language Understanding . This is python based codes implementation under tensorflow 1.2 DL framework. The leaderboard of Stanford Natural Language Inference is available here . Please contact Tao Shen (Tao.Shen@student.uts.edu.au) or open an issue for questions/suggestions. Cite this paper using BibTex: @inproceedings{shen2018disan, Author {Shen, Tao and Zhou, Tianyi and Long, Guodong and Jiang, Jing and Pan, Shirui and Zhang, Chengqi}, Booktitle {AAAI Conference on Artificial Intelligence}, Title {DISAN: Directional self attention network for rnn/cnn free language understanding}, Year {2018} } Overall Requirements Python3 (verified on 3.5.2, or Anaconda3 4.2.0) tensorflow> 1.2 Python Packages: numpy This repo includes three part as follows: 1. Directionnal Self Attention Network independent file > file disan.py 2. DiSAN implementation for Stanford Natural Language Inference > dir SNLI_disan 3. DiSAN implementation for Stanford Sentiment Classification > dir SST_disan __The Usage of disan.py will be introduced below, and as for the implementation of SNLI and SST, please enter corresponding folder for further introduction.__ __And, Code for the other experiments (e.g. SICK, MPQA, CR etc.) appeared in the paper is under preparation.__ Usage of disan.py Parameters: param rep\_tensor : 3D tensorflow dense float tensor batch\_size, max\_len, dim param rep\_mask : 2D tensorflow bool tensor as mask for rep\_tensor, batch\_size, max\_len param scope : tensorflow variable scope param keep\_prob : float, dropout keep probability param is\_train : tensorflow bool scalar param wd : if wd>0, add related tensor to tf collectoion reg_vars for further l2 decay param activation : disan activation function elu relu selu param tensor\_dict : a dict to record disan internal attention result (insignificance) param name : record name suffix (insignificance) Output: 2D tensorflow dense float tensor, which shape is batch\_size, dim as the encoding result for each sentence. Acknowledgements Some basic neural networks are copied from Minjoon's Repo , including RNN cell, dropout able dynamic RNN etc.",Natural Language Inference,Natural Language Inference 2647,Natural Language Processing,Natural Language Processing,Natural Language Processing,BiMPM_PyTorch Implementation of Bilateral Multi Perspective Matching (BiMPM) in PyTorch.,Natural Language Inference,Natural Language Inference 2706,Natural Language Processing,Natural Language Processing,Natural Language Processing,"BREXIT APPLICATION The application analyses the sentiment of tweets with the hashtag brexit . The crucial step is an appropriate encoding of the sentences. The encoder is taken from facebookresearch. Please see the reference below. Once the encoder has produced sentence embeddings a simple log. regression predicts the sentiment of any given tweet. Accuracy is at about 75%. Could potentially be much higher with a better but mainly larger labelled dataset. FACEBOOK RESEARCH Sentence Encoder that is used to produce sentence embeddings: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data (EMNLP 2017, Outstanding Paper Award) A. Conneau, D. Kiela, H. Schwenk, L. Barrault, A. 
Bordes, Supervised Learning of Universal Sentence Representations from Natural Language Inference Data @article{conneau2017supervised, title {Supervised Learning of Universal Sentence Representations from Natural Language Inference Data}, author {Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine}, journal {arXiv preprint arXiv:1705.02364}, year {2017} } Contact: aconneau@fb.com (mailto:aconneau@fb.com)",Natural Language Inference,Natural Language Inference 2779,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Multi task Learning with Sample Re weighting for Machine Reading Comprehension This PyTorch package implements the Multi Task Stochastic Answer Network (MT SAN) for Machine Reading Comprehension, as described in: Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu and Jianfeng Gao Multi task Learning with Sample Re weighting for Machine Reading Comprehension North American Chapter of the Association for Computational Linguistics (NAACL), 2019 arXiv version Please cite the above paper if you use this code. Results We report single model results produced by this package as follows. Dataset EM/F1 SQuAD v1.1 (Rajpurkar et al., 2016) 81.6 / 88.2 (development set) NewsQA (Trischler et al., 2016) 59.9 / 72.6 (vs. 46.5/69.4 Human Performance) Quickstart Install via pip: 1. python3.6 2. install requirements > pip install r requirements.txt Use docker: 1. pull docker: We can use the same docker as mt dnn > docker pull allenlao/pytorch mt dnn:v0.1 2. run docker > docker run it rm runtime nvidia allenlao/pytorch mt dnn:v0.1 bash Please refer to the following link if you first use docker: Train a MT SAN Model 1. prepare data > ./prepare_data.sh 2. train a model: See example codes in run.sh Pretrained Models You can download pretrained models with the best SQuAD performance here . Notes and Acknowledgments The code is developed based on the original SAN code: Related: MT DNN by yichongx@cs.cmu.edu",Natural Language Inference,Natural Language Inference 1721,Computer Vision,Computer Vision,Computer Vision,"Pedestrian Alignment Network for Person Re identification This repo is for our IEEE TCSVT paper . The main idea is to align the pedestrian within the bboxes, and reduce the noisy factors, i.e., scale and pose variances. Network Structure ! For more details, you can see this png file . But it is low solution now, and I may replace it recently. Installation 1.Clone this repo. git clone cd Pedestrian_Alignment mkdir data 2.Download the pre trained model. Put it into './data'. cd data wget 3.Compile Matconvnet (Note that I have included my Matconvnet in this repo, so you do not need to download it again. I have changed some codes comparing with the original version. For example, one of the difference is in /matlab/+dagnn/@DagNN/initParams.m . If one layer has params, I will not initialize it again, especially for pretrained model.) You just need to uncomment and modify some lines in gpu_compile.m and run it in Matlab. Try it (The code does not support cudnn 6.0. You may just turn off the Enablecudnn or try cudnn5.1) If you fail in compilation, you may refer to Dataset Download Market1501 Dataset For training CUHK03, we follow the new evaluation protocol in the CVPR2017 paper . It conducts a multi shot person re ID evaluation and only needs to run one time. Train 1. Add your dataset path into prepare_data.m and run it. Make sure the code outputs the right image path. 2. 
uncomment Run train_id_net_res_market_new.m to pretrain the base branch. 3. comment Run train_id_net_res_market_align.m to finetune the whole net. Test 1. Run test/test_gallery_stn_base.m and test/test_gallery_stn_align.m to extract the image features from the base branch and the alignment branch. Note that you need to change the dir path in the code. They will be stored in a .mat file. Then you can use them to do the evaluation. 2. Evaluate the features on Market 1501. Run evaluation/zzd_evaluation_res_faster.m . You should get a single query result close to the following. Methods Rank@1 mAP Ours 82.81% 63.35% You may find our trained model at GoogleDrive Visualize Results We conduct an extra interesting experiment: when zooming in on the input image (adding scale variance), how does our alignment network react? We can observe a robust transform on the output image (focusing on the human body and keeping the scale). The left image is the input; the right image is the output of our network. ! ! ! ! ! ! Citation Please cite this paper in your publications if it helps your research: @article{zheng2017pedestrian, title {Pedestrian Alignment Network for Large scale Person Re identification}, author {Zheng, Zhedong and Zheng, Liang and Yang, Yi}, doi {10.1109/TCSVT.2018.2873599}, journal {IEEE Transactions on Circuits and Systems for Video Technology}, year {2018} } Acknowledge Thanks for the suggestions from Qiule Sun.",Person Re-Identification,Person Re-Identification 1722,Computer Vision,Computer Vision,Computer Vision,"Person reID_GAN This repository contains the code for our paper Unlabeled Samples Generated by GAN Improve the Person Re identification Baseline in vitro . ! 1.Unsupervised Learning (GAN) The first stage is to generate fake images by DCGAN. We used the code provided in and modified some hyper parameters at You can directly use my forked code. For more reference, you can find our modified training code and generating code in ./DCGAN . We wrote a detailed README . If you still have any questions, feel free to contact me (zdzheng12@gmail.com). 2.Semi supervised Learning The second stage is to combine the original data and generated data to train the network. This repo includes the baseline code and the three different methods in the paper. Models Reference resnet52_market.m ResNet50 baseline resnet52_market_K_1.m One extra class for generated images resnet52_market_lsro.m The proposed method, uniform probability resnet52_market_pseudo.m Give the most likely label for generated images You can find more detailed code for the proposed loss in forward code backward code . (We write the label smooth loss first and then extend it to LSRO. Here we also provide a brief illustration .) Pseudo label is realized in Compile Matconvnet (Note that I have included my Matconvnet in this repo, so you do not need to download it again. I have changed some code compared with the original version. For example, one of the differences is in /matlab/+dagnn/@DagNN/initParams.m . If one layer has params, I will not initialize it again, especially for the pretrained model.) You just need to uncomment and modify some lines in gpu_compile.m and run it in Matlab. Try it (The code does not support cudnn 6.0. You may just turn off the Enablecudnn or try cudnn5.1) If you fail in compilation, you may refer to Dataset Download Market1501 Dataset We take Market1501 as an example in this repo and you can easily extend it to other datasets. ImageNet Pretrained model 1. Make a dir called data by typing mkdir ./data . 2.
Download ResNet 50 model pretrained on Imagenet. Put it in the data dir. Train the Baseline code 1. Add your dataset path into prepare_data.m and run it. Make sure the code outputs the right image path. 2. Run train_id_net_res_market_new.m . Train with generated data 1. Add your generated data path into prepare_data_gan.m and run it. It will add generated image path into the original image database. 2. Run train_id_net_res_market_K_1.m for training extra class method. Or run train_id_net_res_market_lsro.m for training the proposed method. Or run train_id_net_res_market_pseudo.m for training the pseudo label method. (What's new: I also include train_id_net_res_2stream_gan.m for training the code with the method proposed in my another paper. I do not import all files, and you may find the missing code in ) Test 1. Run test/test_gallery_query_crazy.m to extract the features of images in the gallery and query set. They will store in a .mat file. Then you can use it to do evaluation. 2. Evaluate feature on the Market 1501. Run evaluation/zzd_evaluation_res_faster.m . Citation Please cite this paper in your publications if it helps your research: @inproceedings{zheng2017unlabeled, title {Unlabeled Samples Generated by GAN Improve the Person Re identification Baseline in vitro}, author {Zheng, Zhedong and Zheng, Liang and Yang, Yi}, booktitle {Proceedings of the IEEE International Conference on Computer Vision}, year {2017} } Related Repos 1. 2stream Person re ID 2. Pedestrian Alignment Network 3. MpRL Person re ID",Person Re-Identification,Person Re-Identification 1723,Computer Vision,Computer Vision,Computer Vision,"Person_reID_baseline_pytorch Language grade: Python Build Status Total alerts License: MIT A tiny, friendly, strong baseline code for Person reID (based on pytorch ). Strong. It is consistent with the new baseline result in several top conference works, e.g., Beyond Part Models: Person Retrieval with Refined Part Pooling(ECCV18) and Camera Style Adaptation for Person Re identification(CVPR18) . We arrived Rank@1 88.24%, mAP 70.68% only with softmax loss. Small. With fp16, our baseline could be trained with only 2GB GPU memory. Friendly. You may use the off the shelf options to apply many state of the art tricks in one line. Besides, if you are new to person re ID, you may check out our Tutorial first (8 min read) :+1: . ! Table of contents Features ( features) Some News ( some news) Trained Model ( trained model) Prerequisites ( prerequisites) Getting Started ( getting started) Installation ( installation) Dataset Preparation ( dataset preparation) Train ( train) Test ( test) Evaluation ( evaluation) Tips for training with other datasets ( tips) Citation ( citation) Related Repos ( related repos) Features Now we have supported: Float16 to save GPU memory based on apex Part based Convolutional Baseline(PCB) Multiple Query Evaluation Re Ranking Random Erasing ResNet/DenseNet Visualize Training Curves Visualize Ranking Result Here we provide hyperparameters and architectures, that were used to generate the result. Some of them (i.e. learning rate) are far from optimal. Do not hesitate to change them and see the effect. P.S. With similar structure, we arrived Rank@1 87.74% mAP 69.46% with Matconvnet . (batchsize 8, dropout 0.75) You may refer to Here . Different framework need to be tuned in a different way. Some News What's new: FP16 has been added. It can be used by simply added fp16 . You need to install apex and update your pytorch to 1.0. 
Float16 could save about 50% GPU memory usage without an accuracy drop. Our baseline could be trained with only 2GB GPU memory. bash python train.py fp16 What's new: Visualizing the ranking result is added. bash python prepare.py python train.py python test.py python demo.py query_index 777 What's new: Multiple query Evaluation is added. The multiple query result is about Rank@1 91.95% mAP 78.06% . bash python prepare.py python train.py python test.py multi python evaluate_gpu.py What's new: PCB is added. You may use ' PCB' to use this model. It can achieve around Rank@1 92.73% mAP 78.16% . I used a GPU (P40) with 24GB Memory. You may try applying a smaller batchsize and choosing a smaller learning rate (for stability) to run. (For example, batchsize 32 lr 0.01 PCB ) bash python train.py PCB batchsize 64 name PCB 64 python test.py PCB name PCB 64 What's new: You may try evaluate_gpu.py to conduct a faster evaluation with GPU. What's new: You may apply ' use_dense' to use DenseNet 121 . It can achieve around Rank@1 89.91% mAP 73.58%. What's new: Re ranking is added to evaluation. The re ranked result is about Rank@1 90.20% mAP 84.76% . What's new: Random Erasing is added to train. What's new: I added some code to generate training curves. The figure will be saved into the model folder when training. ! Trained Model I re trained several models, and the results may be different from the original ones. Just for a quick reference, you may directly use these models. The download link is Here . Methods Rank@1 mAP Reference ResNet 50 88.84% 71.59% python train.py train_all DenseNet 121 90.17% 74.02% python train.py name ft_net_dense use_dense train_all PCB 92.64% 77.47% python train.py name PCB PCB train_all lr 0.02 ResNet 50 (fp16) 88.03% 71.40% python train.py name fp16 fp16 train_all Model Structure You may learn more from model.py . We add one linear layer (bottleneck), one batchnorm layer and a relu. Prerequisites Python 3.6 GPU Memory > 6G Numpy Pytorch 0.3+ Optional apex (for float16) Optional pretrainedmodels (Some reports found that updating numpy can recover the right accuracy. If you only get 50 80 Top1 accuracy, just try it.) We have successfully run the code based on numpy 1.12.1 and 1.13.1 . Getting started Installation Install Pytorch from Install Torchvision from the source git clone cd vision python setup.py install Optional You may skip it. Install apex from the source git clone cd apex python setup.py install cuda_ext cpp_ext Because pytorch and torchvision are ongoing projects, note that our code is tested based on Pytorch 0.3.0/0.4.0/0.5.0/1.0.0 and Torchvision 0.2.0/0.2.1 . Dataset & Preparation Download Market1501 Dataset Preparation: Put the images with the same id in one folder. You may use bash python prepare.py Remember to change the dataset path to your own path. Furthermore, you can also test our code on the DukeMTMC reID Dataset . Our baseline code does not reach such a high accuracy on DukeMTMC reID Rank@1 64.23%, mAP 43.92% ; hyperparameters need to be tuned. Train Train a model by bash python train.py gpu_ids 0 name ft_ResNet50 train_all batchsize 32 data_dir your_data_path gpu_ids which gpu to run. name the name of the model. data_dir the path of the training data. train_all using all images to train. batchsize batch size. erasing_p random erasing probability (a sketch of this augmentation is given right below).
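For reference, the random erasing augmentation behind erasing_p blanks out a randomly sized and positioned rectangle of each training image with probability p. The snippet below is a simplified illustration written for this document (the class name and default ranges are assumptions), not the transform shipped in this repository.

import math
import random

class RandomErasing(object):
    # Erase a random rectangle of a C x H x W image tensor with probability p.
    def __init__(self, p=0.5, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3), fill=0.0):
        self.p, self.area_range, self.aspect_range, self.fill = p, area_range, aspect_range, fill

    def __call__(self, img):
        if random.random() > self.p:
            return img
        _, h, w = img.shape
        for _ in range(100):                      # retry until the sampled rectangle fits inside the image
            area = random.uniform(*self.area_range) * h * w
            aspect = random.uniform(*self.aspect_range)
            eh, ew = int(round(math.sqrt(area * aspect))), int(round(math.sqrt(area / aspect)))
            if eh < h and ew < w:
                top, left = random.randint(0, h - eh), random.randint(0, w - ew)
                img[:, top:top + eh, left:left + ew] = self.fill
                return img
        return img

In this codebase the same idea is simply switched on from the command line through erasing_p, as in the next command.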
Train a model with random erasing by bash python train.py gpu_ids 0 name ft_ResNet50 train_all batchsize 32 data_dir your_data_path erasing_p 0.5 Test Use trained model to extract feature by bash python test.py gpu_ids 0 name ft_ResNet50 test_dir your_data_path batchsize 32 which_epoch 59 gpu_ids which gpu to run. batchsize batch size. name the dir name of trained model. which_epoch select the i th model. data_dir the path of the testing data. Evaluation bash python evaluate.py It will output Rank@1, Rank@5, Rank@10 and mAP results. You may also try evaluate_gpu.py to conduct a faster evaluation with GPU. For mAP calculation, you also can refer to the C++ code for Oxford Building . We use the triangle mAP calculation (consistent with the Market1501 original code). re ranking bash python evaluate_rerank.py It may take more than 10G Memory to run. So run it on a powerful machine if possible. It will output Rank@1, Rank@5, Rank@10 and mAP results. Tips Notes the format of the camera id and the number of cameras. For some dataset, e.g., MSMT17, there are more than 10 cameras. You need to modify the prepare.py and test.py to read the double digit camera ID. For some vehicle re ID datasets. e.g. VeRi, you also need to modify the prepare.py and test.py . It has different naming rules. (Sorry. It is in Chinese) Citation As far as I know, the following papers may be the first two to use the bottleneck baseline. You may cite them in your paper. @article{DBLP:journals/corr/SunZDW17, author {Yifan Sun and Liang Zheng and Weijian Deng and Shengjin Wang}, title {SVDNet for Pedestrian Retrieval}, booktitle {ICCV}, year {2017}, } @article{hermans2017defense, title {In Defense of the Triplet Loss for Person Re Identification}, author {Hermans, Alexander and Beyer, Lucas and Leibe, Bastian}, journal {arXiv preprint arXiv:1703.07737}, year {2017} } Related Repos 1. Pedestrian Alignment Network 2. 2stream Person re ID 3. Pedestrian GAN 4. Language Person Search",Person Re-Identification,Person Re-Identification 1823,Computer Vision,Computer Vision,Computer Vision,"NetVLAD pytorch Pytorch implementation of NetVLAD & Online Hardest Triplet Loss. In NetVLAD, broadcasting is used to calculate residuals of clusters and it makes whole calculation time much faster. NetVLAD: In Defense of the Triplet Loss for Person Re Identification: Usage import torch import torch.nn as nn from torch.autograd import Variable from netvlad import NetVLAD from netvlad import EmbedNet from hard_triplet_loss import HardTripletLoss from torchvision.models import resnet18 Discard layers at the end of base network encoder resnet18(pretrained True) base_model nn.Sequential( encoder.conv1, encoder.bn1, encoder.relu, encoder.maxpool, encoder.layer1, encoder.layer2, encoder.layer3, encoder.layer4, ) dim list(base_model.parameters()) 1 .shape 0 last channels (512) Define model for embedding net_vlad NetVLAD(num_clusters 32, dim dim, alpha 1.0) model EmbedNet(base_model, net_vlad).cuda() Define loss criterion HardTripletLoss(margin 0.1).cuda() This is just toy example. Typically, the number of samples in each classes are 4. labels torch.randint(0, 10, (40, )).long() x torch.rand(40, 3, 128, 128).cuda() output model(x) triplet_loss criterion(output, labels)",Person Re-Identification,Person Re-Identification 2139,Computer Vision,Computer Vision,Computer Vision,"Incremental Learning in Person Re Identification This repository contains code for our research. Paper can be found here, arXiv Getting started 1. cd /PATH_NAME 2. Run git clone 3. 
Install the specified dependencies, to install use pip3 install requirements.txt 4. Follow the below mentioned steps for preparation of dataset and performing training Prerequisites: OS: Linux/MacOS Pytorch> 0.3 Install Dependencies Installing Pytorch (pytorch.org) Datasets Market 1501 Duke MTMC Dataset structure This is the recommended file structure which was used For preparation of Market1501 + Market1501 + bounding_box_test + bounding_box_train ....... For preparation of Duke MTMC + dukemtmc reid + DukeMTMC reID + bounding_box_test + bounding_box_train ............. Covariance loss metric has been added to all the modules. You're required to change the flags as per phase as described in paper Create a directory named as data/ and use the standard directory structure. For training on Market1501: $ python covariance_market1501.py For training on Duke MTMC. $ python covariance_duke.py To use ensembling and training, use $ python covariance_ensembling.py In this case, you'll have to specify amongst which pipelines do you want to perform ensembling. If you get better results, please file a PR. To perform training: While executing make sure to correctly carry out training (Phase 1 and Phase 2) properly as mentioned to achieve incremental learning When training, log file would be created in the /log directory. Results: No. Dataset Rank 1 Rank 20 maP : : : : 1 Market1501 89.3% 98.3% 71.8% 2 DukeMTMC 80.0% 93.7% 60.2% 3 Market1501 70.2% 92.4% 41.2% Takes around 8 9 hours to train the model for 950 epochs (convergence is usually achieved) Models We used a ResNet50 along with different architecture of pipelines. We have used hybrid_convnet2 . You are required to change the dimensions of the FC layer as per number of classes manually. To resume training $ mkdir saved_models Then specify this as per dir structure in the main module SAVED_MODEL_PATH 'saved_models/p1.pth.tar' checkpoint torch.load(SAVED_MODEL_PATH) model.load_state_dict(checkpoint 'state_dict' ) For evaluation $ python evaluation.py Make sure to set the dataset and path of the models correctly, and also which pipeline to use for evaluation Citation: Please cite this, if you use our work @misc{bhargava2018incremental, title {Incremental Learning in Person Re Identification}, author {Prajjwal Bhargava}, year {2018}, eprint {1808.06281}, archivePrefix {arXiv}, primaryClass {cs.CV} }",Person Re-Identification,Person Re-Identification 2226,Computer Vision,Computer Vision,Computer Vision,"Random Erasing Data Augmentation This code has the source code for the paper Random Erasing Data Augmentation . If you find this code useful in your research, please consider citing: @article{zhong2017random, title {Random Erasing Data Augmentation}, author {Zhong, Zhun and Zheng, Liang and Kang, Guoliang and Li, Shaozi and Yang, Yi}, journal {arXiv preprint arXiv:1708.04896}, year {2017} } Thanks for Marcus D. Bloice , Marcus D. Bloice reproduces our method in Augmentor . Augmentor is an image augmentation library in Python for machine learning. Original image Random Erasing ! Original ! 
Original Other re implementations \ Python Augmentor\ \ CamStyle\ \ Keras\ \ Person_reID_baseline + Random Erasing + Re ranking\ Installation Requirements for Pytorch (see Pytorch installation instructions) Examples: CIFAR10 ResNet 20 baseline on CIFAR10: python cifar.py dataset cifar10 arch resnet depth 20 ResNet 20 + Random Erasing on CIFAR10: python cifar.py dataset cifar10 arch resnet depth 20 p 0.5 CIFAR100 ResNet 20 baseline on CIFAR100: python cifar.py dataset cifar100 arch resnet depth 20 ResNet 20 + Random Erasing on CIFAR100: python cifar.py dataset cifar100 arch resnet depth 20 p 0.5 Fashion MNIST ResNet 20 baseline on Fashion MNIST: python fashionmnist.py dataset fashionmnist arch resnet depth 20 ResNet 20 + Random Erasing on Fashion MNIST: python fashionmnist.py dataset fashionmnist arch resnet depth 20 p 0.5 Other architectures For ResNet: arch resnet depth (20, 32, 44, 56, 110) For WRN: arch wrn depth 28 widen factor 10 Our results You can reproduce the results in our paper: CIFAR10 CIFAR10 CIFAR100 CIFAR100 Fashion MNIST Fashion MNIST Models Base. +RE Base. +RE Base. +RE ResNet 20 7.21 6.73 30.84 29.97 4.39 4.02 ResNet 32 6.41 5.66 28.50 27.18 4.16 3.80 ResNet 44 5.53 5.13 25.27 24.29 4.41 4.01 ResNet 56 5.31 4.89 24.82 23.69 4.39 4.13 ResNet 110 5.10 4.61 23.73 22.10 4.40 4.01 WRN 28 10 3.80 3.08 18.49 17.73 4.01 3.65 NOTE THAT, if you use the latest released Fashion MNIST, the performance will slightly lower than the results reported in our paper. Please refer to the issue . If you have any questions about this code, please do not hesitate to contact us. Zhun Zhong Liang Zheng",Person Re-Identification,Person Re-Identification 2312,Computer Vision,Computer Vision,Computer Vision,"Parameter Free Spatial Attention Network for Person Re Identification This is the implementation of the arxiv paper Parameter Free Spatial Attention Network for Person Re Identification . We propose a modification to the global average pooling called spatial attention which shows a consistent improvement in the generic classfication tasks. Currently the experiments are only conducted on the Person ReID tasks (which is formulated into a fine grained classification problem). Our code is mainly based on PCB . Network ! The proposed architecture formulates the task as a classification . It consists of four components. The yellow region represents the backbone feature extractor. The red region represents the deeply supervised branches (DS). The blue region represents six part classifiers (P). The two green region represents two sets of spatial attention layers (SA), SA1 is not used for the main results. It only appears in the ablation study. Then the total loss is the summation over all deep supervision losses, six part losses and the loss from the backbone. Note that the spatial attention is only added before GAP. Preparation Prerequisite: Python 2.7 and Pytorch 0.4.0(we run the code under version 0.4.0, maybe versions < 0.4.0 also work.) Dataset Market 1501 (password: 1ri5) Training&Testing if you are going to train on the dataset of market 1501, run training: python2 main.py d market b 48 j 4 epochs 100 log logs/market/ combine trainval step size 40 data dir Market 1501 also, you can just download a trained weight file from BaiduYun (password: wwjv), and put it into model folder, which should be like 'model/checkpoint.pth.tar', then run testing: python2 main.py d market b 48 j 4 log logs/market/ combine trainval step size 40 data dir Market 1501 resume ./model/checkpoint.pth.tar evaluate Results ! 
We achieved the state of the art on four benchmarks as is shown in Table 1 (11. Nov. 2018). ! Here we show 6 examples to compare the class activation maps ( CAM ) of plain GAP and GAP with SA. From left to right are the original image, the CAM from plain GAP and the CAM from GAP with SA. We see that the highlighted area from plain GAP is always concentrated to some parts of the object, which may suffer from the absence of that feature due to some occlusion and view point changing. With the help of the spatial attention, the focus of the model is distributed all over the image, providing the classifier more details of the object, which increases the model robustness. Ablation Study In order to demonstrate the effectiveness of the spatial attention layer. We are now working on more examples for the ablation study. Each example inside the folder ablation is independent of the rest of the snippets. Person Re ID: Besides the ones in the paper, we uploaded another example for the ablation of the SA for the backbone model on Market 1501. Random erasing is cut off for the simplicity. The training epoch is set as 60. python2 main.py d market b 48 j 4 epochs 60 log logs/market/ feature 256 height 384 width 128 combine trainval step size 40 data dir Market 1501 Classification: Cifar 100: Citiaion Please cite the paper if it helps your research: @article{wang2018parameter, title {Parameter Free Spatial Attention Network for Person Re Identification}, author {Wang, Haoran and Fan, Yue and Wang, Zexin and Jiao, Licheng and Schiele, Bernt}, journal {arXiv preprint arXiv:1811.12150}, year {2018} }",Person Re-Identification,Person Re-Identification 2321,Computer Vision,Computer Vision,Computer Vision,"Multiple Granularity Network Reproduction of paper: Learning Discriminative Features with Multiple Granularities for Person Re Identification Dependencies Python > 3.5 PyTorch > 0.4.0 TorchVision Matplotlib Argparse Sklearn Pillow Numpy Scipy Tqdm Train Prepare training data Download Market1501 training data. here Begin to train In the demo.sh file, add the Market1501 directory to datadir run sh demo.sh Result mAP rank1 rank3 rank5 rank10 : : : : : : : : : : : : 2018 7 22 92.17 94.60 96.53 97.06 98.01 2018 7 24 93.53 95.34 97.06 97.68 98.49 last 93.83 95.78 97.21 97.83 98.43 Download model file in here The architecture of Multiple Granularity Network (MGN) ! Multiple Granularity Network Figure . Multiple Granularity Network architecture. text @ARTICLE{2018arXiv180401438W, author {{Wang}, G. and {Yuan}, Y. and {Chen}, X. and {Li}, J. and {Zhou}, X.}, title {Learning Discriminative Features with Multiple Granularities for Person Re Identification} , journal {ArXiv e prints}, archivePrefix arXiv , eprint {1804.01438}, primaryClass cs.CV , keywords {Computer Science Computer Vision and Pattern Recognition}, year 2018, month apr, adsurl { adsnote {Provided by the SAO/NASA Astrophysics Data System} }",Person Re-Identification,Person Re-Identification 2479,Computer Vision,Computer Vision,Computer Vision,"Part based Convolutional Baseline for Person Retrieval and the Refined Part Pooling Code for the paper Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline) . This code is ONLY released for academic use. Preparation Prerequisite: Python 2.7 and Pytorch 0.3+ 1. Install Pytorch 2. Download dataset a. Market 1501 BaiduYun b. DukeMTMC reID BaiduYun (password:bhbh) c. 
Move them to /datasets/Market 1501/(DukeMTMC reID) train PCB sh train_PCB.sh With Pytorch 0.4.0, we shall get about 93.0% rank 1 accuracy and 78.0% mAP on Market 1501. train RPP sh train_RPP.sh With Pytorch 0.4.0, we shall get about 93.5% rank 1 accuracy and 81.5% mAP on Market 1501. Citiaion Please cite this paper in your publications if it helps your research: @inproceedings{sun2018PCB, author {Yifan Sun and Liang Zheng and Yi Yang and Qi Tian and Shengjin Wang}, title {Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)}, booktitle {ECCV}, year {2018}, }",Person Re-Identification,Person Re-Identification 2567,Computer Vision,Computer Vision,Computer Vision,"Triplet based Person Re Identification Modification based on In Defense of the Triplet Loss for Person Re Identification and some modification has been made suitable for DIVA project. Code for reproducing the results of our In Defense of the Triplet Loss for Person Re Identification paper. The original author provided the following things: The exact pre trained weights for the TriNet model as used in the paper, including some rudimentary example code for using it to compute embeddings. See section Pretrained models ( pretrained models). A clean re implementation of the training code that can be used for training your own models/data. See section Training your own models ( training your own models). A script for evaluation which computes the CMC and mAP of embeddings in an HDF5 ( new .mat ) file. See section Evaluating embeddings ( evaluating embeddings). A list of independent re implementations ( independent re implementations). If you use any of the provided code, please cite: @article{HermansBeyer2017Arxiv, title {{In Defense of the Triplet Loss for Person Re Identification}}, author {Hermans , Alexander and Beyer , Lucas and Leibe, Bastian}, journal {arXiv preprint arXiv:1703.07737}, year {2017} } Pretrained TensorFlow models For convenience, we provide the pretrained weights for our TriNet TensorFlow model, trained on Market 1501 using the code from this repository and the settings form our paper. The TensorFlow checkpoint can be downloaded in the release section . Training your own models If you want more flexibility, we now provide code for training your own models. This is not the code that was used in the paper (which became a unusable mess), but rather a clean re implementation of it in TensorFlow , achieving about the same performance. This repository requires at least version 1.4 of TensorFlow. The TensorFlow code is Python 3 only and won't work in Python 2! :boom: :fire: :exclamation: If you train on a very different dataset, don't forget to tune the learning rate and schedule :exclamation: :fire: :boom: If the dataset is much larger, or much smaller, you might need to train much longer or much shorter. Market1501, MARS (in tracklets) and DukeMTMC are all roughly similar in size, hence the same schedule works well for all. CARS196, for example, is much smaller and thus needs a much shorter schedule. Defining a dataset A dataset consists of two things: 1. An image_root folder which contains all images, possibly in sub folders. 2. A dataset .csv file describing the dataset. 
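In other words, all supervision lives in a plain CSV of identity and relative image path pairs. As a hedged illustration only (this helper is written for this document and is not the repository's loader), such a file could be read back and grouped by identity like this; the exact row format is spelled out next.

import csv
import os
from collections import defaultdict

def load_dataset(csv_path, image_root):
    # Each row is "identity,relative_path/to/image.jpg"; group image paths by identity (PID).
    images_by_pid = defaultdict(list)
    with open(csv_path, newline="") as f:
        for pid, rel_path in csv.reader(f):
            images_by_pid[pid].append(os.path.join(image_root, rel_path))
    return images_by_pid

# e.g. load_dataset("data/market1501_train.csv", "/absolute/image/root")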
To create a dataset, you simply create a new .csv file for it of the following form: identity,relative_path/to/image.jpg Where the identity is also often called PID ( P erson ID entity) and corresponds to the class name , it can be any arbitrary string, but should be the same for images belonging to the same identity. The relative_path/to/image.jpg is relative to aforementioned image_root . Training Given the dataset file, and the image_root , you can already train a model. The minimal way of training a model is to just call train.py in the following way: python train.py \ train_set data/market1501_train.csv \ image_root /absolute/image/root \ experiment_root /experiments/my_experiment This will start training with all default parameters. We recommend writing a script file similar to market1501_train.sh where you define all kinds of parameters, it is highly recommended you tune hyperparameters such as net_input_{height,width} , learning_rate , decay_start_iteration , and many more. See the top of train.py for a list of all parameters. As a convenience, we store all the parameters that were used for a run in experiment_root/args.json . Pre trained initialization If you want to initialize the model using pre trained weights, such as done for TriNet, you need to specify the location of the checkpoint file through initial_checkpoint . For most common models, you can download the checkpoints provided by Google here . For example, that's where we get our ResNet50 pre trained weights from, and what you should pass as second parameter to market1501_train.sh . Example training log This is what a healthy training on Market1501 looks like, using the provided script: ! Screenshot of tensorboard of a healthy Market1501 run (healthy market run.png) The Histograms tab in tensorboard also shows some interesting logs. Interrupting and resuming training Since training can take quite a while, interrupting and resuming training is important. You can interrupt training at any time by hitting Ctrl+C or sending SIGINT (2) or SIGTERM (15) to the training process; it will finish the current batch, store the model and optimizer state, and then terminate cleanly. Because of the args.json file, you can later resume that run simply by running: python train.py experiment_root /experiments/my_experiment resume The last checkpoint is determined automatically by TensorFlow using the contents of the checkpoint file.",Person Re-Identification,Person Re-Identification 2777,Computer Vision,Computer Vision,Computer Vision,"Overview This codebase is for training/deploying models in pytorch (onnx), currently it provides basic protocols for model training, evaluation and deploying. Features Distillation. Rank1:0.942993/map:0.831319 taught by resnet 101 model Spectral feature transform (rank1: 0.945071, map: 0.827155 w\o post processing). . PCB structure ; Improved training strategy GAN related person generator(unstable) AM softmax & triplet loss step wise LR warm up Install Dependency This code depends on pytorch v0.4 and torchvision, run the following command to install pytorch: pip install user torch 0.4 torchvision 0.2.1 tensorflow 1.8 tensorboardX lmdb i Model Training To train a model, clone the repo, modify params.json as you need, and run train.py. cd pytorch reid lite Modify params.json specify your own working dir. 
sub_working_dir is optional python train.py operation start_train config_path params.json sub_working_dir SUB\_WORKING\_DIR\_NAME On the Fly Evaluation: You can enable on the fly automatic evaluation by setting type under evaluation_params key in params.json (default is None ). If set, after each epoch the code will run your evaluation, and only saves the best performing model. The code currently supports market\_evaluate for person reid and classification\_evaluate for image classification, but it is easy to extend this to support other evalutiaons (like LFW). All you need to do is create a new file say lfw\_evaluate.py in the evaluate folder, and expose a run\_eval method which takes in your training config and returns your evaluation result. See evaluate/market\_evalute.py for an example. Offline evaluation python evaluator.py eval_params.json Tensorboard visualization You can visualize your training progress with tensorboardX (a pytorch integration of Tensorboard for Tensorflow), the code generates an event file in your sub working dir, to run tensorboard, do so as you would when using Tensorflow: cd /.local/bin ./tensorboard logdir YOUR_SUB_WORKING_DIR port YOUR_PORT Tips summary Which benefits: PCB structure PCB randomly update batchnorm random erasing, zero paddding crop warm up learning rate global branch small batchsize Which might helps: feature erasing feature mask tri loss balanced sampling multi gpu training (differs in BN layer) Not working: adam am softmax bias in FC layer or BN Baselines backbone imgSize PCB rank1 map aug. batchsize comments resnet 50 384 128 1536/6 0.628266 0.346756 mirro 64 1 classifier no bias, 60 epoch, decay per 40 resnet 50 384 128 1536/6 0.683492 0.411627 mirro 64 1 weight_decay from 4e 5 to 5e 4 resnet 50 384 128 1536/6 0.837886 0.620621 mirro 64 1 add dropout before PCB resnet 50 384 128 1536/6 0.856888 0.640600 mirro 64 1 last_conv_stride 1 resnet 50 384 128 1536/6 0.920724 0.755717 mirro 64 1 add BN to pcb stripe resnet 50 384 128 1536/6 0.921318 0.765050 mirro,RE 64 1 add BN to pcb stripe resnet 50 384 128 1536/6 0.927553 0.776928 mirro,RE 64 1 add global branch resnet 50 384 128 1536/6 0.926366 0.784323 mirro,RE 64 1 random erase 1 branch, wp resnet 50 384 128 1536/6 0.928147 0.785333 mirro,RE 64 1 random erase 5 branch, wp resnet 50 384 128 1536/6 0.929929 0.790466 mirro,RE 64 1 random erase 6 branch, wp resnet 50 384 128 1536/6 0.929038 0.787618 mirro,RE 64 1 random erase 6 branch, wp, 32X2 resnet 50 384 128 1536/6 0.927850 0.782085 mirro,RE 64 1 random erase 6 branch, wp, 16X4 resnet 50 384 128 1536/6 0.928741 0.771841 mirro,RE 64 1 global branch m 0.1 resnet 50 384 128 1536/6 0.926960 0.777564 mirro,RE 64 1 global branch m 0.3, warm up resnet 50 384 128 1536/6 0.926069 0.764451 mirro,RE 64 1 global branch m 0.4, warm up resnet 50 384 128 1536/6 0.924287 0.777912 mirro,RE 64 1 global branch m 0.4, warm up, mask resnet 50 384 128 1536/6 0.920428 0.775502 mirro,RE 64 1 mask@global branch resnet 50 384 128 1536/6 0.930523 0.783172 mirro,RE 64 1 change hue resnet 50 384 128 1536/6 0.920724 0.768056 mirro 32 1 120 epoch, decay per 40, hue resnet 50 256 128 1024/4 0.907957 0.731270 mirro 32 1 120 epoch, decay per 40 resnet 50 256 128 1024/4 0.907957 0.750186 mirro,RE 32 1 120 epoch, decay per 40 For following settings PCB branchs 6 batch_size 64 image size h x w 384 x 128 GPU memory usage: 9529MiB for last_conv_stride 1 (130 example/sec) 7155MiB for last_conv_stride 2 (170 example/sec) add global branchs at resnet stage 4(start from no relu and 
dropout, adaptiveMaxPool) backbone imgSize PCB rank1 map aug. bs comments resnet 50 384 128 1536+256 0.935273 0.802506 mirro,RE 64 1 no relu & dropout, global f erasing(RE) resnet 50 384 128 1536+256 0.940321 0.818069 mirro,RE 64 1 padcrop_10 resnet 50 384 128 1536+256 0.935570 0.820962 mirro,RE 64 1 random erase 6 branch(RB) resnet 50 384 128 1536+256 0.935570 0.821505 mirro,RE 64 1 dropout, without feature erasing resnet 50 384 128 1536+256 0.937055 0.818202 mirro,RE 64 1 dropout, no_pcbRE, no f_RE resnet 50 384 128 1536+256 0.937945 0.815731 mirro,RE 64 1 pcbFE0.3, no_pcbRE, no f_RE resnet 50 384 128 1536+256 0.927257 0.793121 mirro,RE 64 1 mask@ all bracnchs, pcbRE resnet 50 384 128 1536+256 0.940024 0.818851 mirro,RE 64 1 no f_RE, update max loss branch resnet 50 384 128 1536+256 0.925178 0.807808 mirro,RE 32 2 pcb_s_triloss, no_mask, no_pcbRE resnet 50 384 128 1536+256 0.932304 0.819861 mirro,RE 32 2 pcb_s_triloss m 0.16 resnet 50 384 128 1536+256 0.940618 0.826704 mirro,RE 32 2 g_triloss + pcb_g_triloss(soft) resnet 50 384 128 1536+256 0.940618 0.831889 mirro,RE 32 2 g_tri + pcb_g_tri, m 0.16 resnet 50 384 128 1536+256 0.941211 0.835557 mirro,RE 32 2 g_tri + pcb_g_tri, pcbRB6 resnet 50 384 128 1536+256 0.943290 0.834388 mirro,RE 32 2 g_tri_0.16, pcbRB6 resnet 50 384 128 1536+256 0.939133 0.826529 mirro,RE 32 2 g_tri_0.16, pcbRB6+mask resnet 50 384 128 1536+256 0.883314 0.739106 mirro,RE 32 2 g_tri_0.16, pcbRB6+am0.3s15 resnet 50 384 128 1536+256 0.917458 0.791989 mirro,RE 32 2 g_tri_0.16, pcbRB6+am0.3s0 resnet 50 384 128 1536+256 0.940024 0.828990 mirro,RE 32 2 g_tri_0.16, pcbRB6, no additional stage 4 resnet 50 384 128 1536+256 0.939727 0.830806 mirro,RE 48 3 g_tri_0.16, pcbRB6 resnet 50 384 128 1536+256 0.939133 0.829087 mirro,RE 32 2 g_tri_0.16, pcbRB6, BN_nobias Conclusions: 1. global branch after stage 4 helps 2. AM softmax still cause overfitting 3. Tri loss only used in global features 4. Update each PCB branches randomly backbone imgSize PCB rank1 map aug. 
batchsize comments resnet 50 256 128 256 1 0.802553 0.601922 mirro 128 1 last_stride 1 resnet 50 256 128 256 1 0.869062 0.685709 mirro 128 1 add BN, Dropout after feature layer resnet 50 256 128 256 1 0.867874 0.685979 mirro 128 1 cls no bias (not use) resnet 50 256 128 256 1 0.893112 0.740011 mirro 32 1 add BN, Dropout after feature layer resnet 50 256 128 256 1 0.898753 0.749818 mirro,RE 32 1 120 epoch, decay per 40 resnet 50 256 128 256 1 0.907660 0.763313 mirro,RE 32 1 warm up before 20 epoch resnet 50 256 128 256 1 0.923100 0.782874 mirro,RE 8 4 700+ epochs resnet 50 256 128 256 1 0.931116 0.819774 mirro,RE 8 4 pad_zero_crop, no dropout resnet 50 256 128 256 1 0.945071 0.827155 mirro,RE 8 4 st0.3, fine tuned resnet 50 256 128 256 1 0.948931 0.873448 mirro,RE 8 4 st0.3, post proce 0.5/top50 resnet 50 256 128 256 1 0.900831 0.774981 mirro,RE 16 8 spectral st_0.5_norm, pad_6 resnet 50 256 128 256 1 0.922506 0.811486 mirro,RE 8 4 tri_m 0.16, pad_6 resnet 50 256 128 256 1 0.921912 0.801184 mirro,RE 16 2 tri_m 0.16, pad_6 resnet 50 256 128 256 1 0.905879 0.756945 mirro,RE 32 1 am 0.0 resnet 50 256 128 256 1 0.898753 0.756945 mirro,RE 32 1 am 0.0(w normalized) resnet 50 256 128 256 1 0.895190 0.756697 mirro,RE 32 1 am 0.1 resnet 50 256 128 256 1 0.906473 0.774181 mirro,RE 32 1 Add feature mask resnet 50 256 128 256 1 0.914786 0.788952 mirro,RE 32 1 Change hue(with mask) resnet 50 256 128 256 1 0.896081 0.738212 mirro,RE 32 1 Crop 288 144 resnet 50 256 128 256 1 0.849169 0.673918 mirro 32 1 adam, epoch 20 lr decay resnet 50 256 128 256 1 0.864014 0.679649 mirro 32 1 adam, epoch 40 lr decay resnet 50 256 128 256 1 0.867874 0.704566 mirro 32 1 global_pool 2048d as feature For following settings PCB branchs 0 batch_size 128 64 causes divergence (w\o BN and dropout) image size h x w 256 x 128 GPU memory usage: 10343MiB for last_conv_stride 1 (215 example/sec) Parameters The params.json file contains the settings you need to run your model, here is a brief documentation of what they are about: 1. batch\_size : The batch\_size PER GPU. 2. batches\_dir : The path to your dataset generated by the open platform. 3. data\_augmentation contains the params related to data\_augmentation. 4. epoch : How many epochs to train your model. 5. imagenet\_pretrain : Whether to initialze your model with ImageNet pretrained network. Note that some networks might not support this. 6. img\_h and img\_w : Size of the input image. 7. lr contains the params related to learning rate setting, where base\_lr denotes the initial learning rate for the base network and fc\_lr denotes the initial learning rate for the fc layers. Also note that decay_step here refers to training epochs. 8. model\_params contains the setting of network structure. 9. optimizer : Which opitimization algorithm to use, default is is SGD. 10. parallels : The GPU(s) to train your model on. 11. pretrain\_snapshot : Path to pretrained model. 12. weight\_decay : The l2 regularization parameter. 13. fine\_tune : If set to true , train only the final classification layer and freeze all layers before. 14. evaluation\_params : Run different types of evaluation accordingly, now supports market\_evaluate and classificaton\_evaluate . 15. working\_dir : Where your model will be stored on disk. 19. tri\_loss\_margin : If set, the model will be trained with the Triplet loss with batch hard mining, set to soft\_margin to use the soft margin setting, and set to 0 to disbale. 20. 
tri\_loss\_lambda\_cls : If set, the model will be trained with the Triplet loss and The Classicfication loss(softmax/AM softmax) together, set to 0 to disbale. 21. batch\_sampling\_params : If class\_balanced is set to true, then the code will sample each batch by first randomly selecting P classes and then randomly selecting K images for each class (batch\_size P K); set class\_balanced to false to use random sampling. Also note that if class\_balanced is set to true, the lr decay step will be counted as each iteration, as opposed to epoch for random sampling.",Person Re-Identification,Person Re-Identification 2887,Computer Vision,Computer Vision,Computer Vision,"Random Erasing Data Augmentation ! Examples (all_examples page 001.jpg) This code has the source code for the paper Random Erasing Data Augmentation . If you find this code useful in your research, please consider citing: @article{zhong2017random, title {Random Erasing Data Augmentation}, author {Zhong, Zhun and Zheng, Liang and Kang, Guoliang and Li, Shaozi and Yang, Yi}, journal {arXiv preprint arXiv:1708.04896}, year {2017} } Thanks for Marcus D. Bloice , Marcus D. Bloice reproduces our method in Augmentor . Augmentor is an image augmentation library in Python for machine learning. Original image Random Erasing ! Original ! Original Other re implementations \ Python Augmentor\ \ CamStyle\ \ Keras\ \ Person_reID_baseline + Random Erasing + Re ranking\ Installation Requirements for Pytorch (see Pytorch installation instructions) Examples: CIFAR10 ResNet 20 baseline on CIFAR10: python cifar.py dataset cifar10 arch resnet depth 20 ResNet 20 + Random Erasing on CIFAR10: python cifar.py dataset cifar10 arch resnet depth 20 p 0.5 CIFAR100 ResNet 20 baseline on CIFAR100: python cifar.py dataset cifar100 arch resnet depth 20 ResNet 20 + Random Erasing on CIFAR100: python cifar.py dataset cifar100 arch resnet depth 20 p 0.5 Fashion MNIST ResNet 20 baseline on Fashion MNIST: python fashionmnist.py dataset fashionmnist arch resnet depth 20 ResNet 20 + Random Erasing on Fashion MNIST: python fashionmnist.py dataset fashionmnist arch resnet depth 20 p 0.5 Other architectures For ResNet: arch resnet depth (20, 32, 44, 56, 110) For WRN: arch wrn depth 28 widen factor 10 Our results You can reproduce the results in our paper: CIFAR10 CIFAR10 CIFAR100 CIFAR100 Fashion MNIST Fashion MNIST Models Base. +RE Base. +RE Base. +RE ResNet 20 7.21 6.73 30.84 29.97 4.39 4.02 ResNet 32 6.41 5.66 28.50 27.18 4.16 3.80 ResNet 44 5.53 5.13 25.27 24.29 4.41 4.01 ResNet 56 5.31 4.89 24.82 23.69 4.39 4.13 ResNet 110 5.10 4.61 23.73 22.10 4.40 4.01 WRN 28 10 3.80 3.08 18.49 17.73 4.01 3.65 NOTE THAT, if you use the latest released Fashion MNIST, the performance of Baseline and RE will slightly lower than the results reported in our paper. Please refer to the issue . If you have any questions about this code, please do not hesitate to contact us. Zhun Zhong Liang Zheng",Person Re-Identification,Person Re-Identification 2901,Computer Vision,Computer Vision,Computer Vision,"Multiple Granularity Network Reproduction of paper: Learning Discriminative Features with Multiple Granularities for Person Re Identification About This is a non official pytorch re production of paper: Learning Discriminative Features with Multiple Granularities for Person Re Identification . Still Work In Progress . Please cite and refer to: text @ARTICLE{2018arXiv180401438W, author {{Wang}, G. and {Yuan}, Y. and {Chen}, X. and {Li}, J. 
and {Zhou}, X.}, title {Learning Discriminative Features with Multiple Granularities for Person Re Identification} , journal {ArXiv e prints}, archivePrefix arXiv , eprint {1804.01438}, primaryClass cs.CV , keywords {Computer Science Computer Vision and Pattern Recognition}, year 2018, month apr, adsurl { adsnote {Provided by the SAO/NASA Astrophysics Data System} } Implementation Figure: Multiple Granularity Network (/architecture.png) mgn/mgn.py (/mgn/mgn.py): reproduction of the Multiple Granularity Network. mgn/ide.py (/mgn/ide.py): baseline ResNet 50 based model, rewritten from Person reID baseline pytorch ( mgn/triplet.py (/mgn/triplet.py): triplet semi hard sample mining loss. mgn/market1501.py (/mgn/market1501.py): Market 1501 dataset. Market 1501 v15.09.15/ : Market 1501 dataset root directory. Current Progress 2018 04 28: mAP 0.579464, r@1 0.798694, r@5 0.909739, r@10 0.938539",Person Re-Identification,Person Re-Identification 2247,Computer Vision,Computer Vision,Computer Vision,"Learning to Reason: End to End Module Networks for Visual Question Answering This repository contains the code for the following paper: R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko, Learning to Reason: End to End Module Networks for Visual Question Answering . In ICCV, 2017. ( PDF ) @inproceedings{hu2017learning, title {Learning to Reason: End to End Module Networks for Visual Question Answering}, author {Hu, Ronghang and Andreas, Jacob and Rohrbach, Marcus and Darrell, Trevor and Saenko, Kate}, booktitle {Proceedings of the IEEE International Conference on Computer Vision (ICCV)}, year {2017} } Project Page: Installation 1. Install Python 3 (Anaconda recommended: 2. Install TensorFlow v1.0.0 (note: newer or older versions of TensorFlow may fail to work due to incompatibility with TensorFlow Fold): pip install tensorflow gpu 1.0.0 3. Install TensorFlow Fold (which is needed to run the dynamic graph): pip install 4. Download this repository or clone it with Git, and then enter the root directory of the repository: git clone && cd n2nmn Train and evaluate on the CLEVR dataset Download and preprocess the data 1. Download the CLEVR dataset from and symlink it to exp_clevr/clevr dataset . After this step, the file structure should look like exp_clevr/clevr dataset/ images/ train/ CLEVR_train_000000.png ... val/ test/ questions/ CLEVR_train_questions.json CLEVR_val_questions.json CLEVR_test_questions.json ... 2. Extract visual features from the images and store them on disk. In our experiments, we keep the original 480 x 320 image size in CLEVR, and use the pool5 layer output of shape (1, 10, 15, 512) from the VGG 16 network (feature stored as a numpy array in HxWxC format). Then, construct the expert layout from the ground truth functional programs, and build image collections (imdb) for CLEVR. These procedures can be done as follows. ./exp_clevr/tfmodel/vgg_net/download_vgg_net.sh VGG 16 converted to TF cd ./exp_clevr/data/ python extract_visual_features_vgg_pool5.py feature extraction python get_ground_truth_layout.py construct expert policy python build_clevr_imdb.py build image collections cd ../../ The saved features will take up approximately 29GB of disk space (for all images in CLEVR train, val and test). Training 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1.
Train with ground truth layout (cloning expert + policy search after cloning) Step a (cloning expert): python exp_clevr/train_clevr_gt_layout.py Step b (policy search after cloning): python exp_clevr/train_clevr_rl_gt_layout.py which is by default initialized from exp_clevr/tfmodel/clevr_gt_layout/00050000 (the 50000 iteration snapshot in Step a). If you want to initialize from another snapshot, use the pretrained_model flag to specify the snapshot path. 2. Train without ground truth layout (policy search from scratch) python exp_clevr/train_clevr_scratch.py Note: By default, the above scripts use GPU 0. To train on a different GPU, set the gpu_id flag. During training, the script will write TensorBoard events to exp_clevr/tb/ and save the snapshots under exp_clevr/tfmodel/ . Pre trained models (TensorFlow snapshots) on CLEVR dataset can be downloaded from: clevr_gt_layout (cloning expert): clevr_rl_gt_layout (policy search after cloning): clevr_scratch (policy search from scratch): The downloaded snapshots should be placed under exp_clevr/tfmodel/clevr_gt_layout , exp_clevr/tfmodel/clevr_rl_gt_layout and exp_clevr/tfmodel/clevr_scratch respectively. You may evaluate their performance using the test code below. Test 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Evaluate clevr_gt_layout (cloning expert): python exp_clevr/eval_clevr.py exp_name clevr_gt_layout snapshot_name 00050000 test_split val Expected accuracy: 78.9% (on val split). 2. Evaluate clevr_rl_gt_layout (policy search after cloning): python exp_clevr/eval_clevr.py exp_name clevr_rl_gt_layout snapshot_name 00050000 test_split val Expected accuracy: 83.6% (on val split). 3. Evaluate clevr_scratch (policy search from scratch): python exp_clevr/eval_clevr.py exp_name train_clevr_scratch snapshot_name 00100000 test_split val Expected accuracy: 69.1% (on val split). Note: The above evaluation scripts will print out the accuracy (only for val split) and also save it under exp_clevr/results/ . It will also save a prediction output file under exp_clevr/eval_outputs/ . By default, the above scripts use GPU 0, and evaluate on the validation split of CLEVR. To evaluate on a different GPU, set the gpu_id flag. To evaluate on the test split, use test_split tst instead. As there is no ground truth answers for test split in the downloaded CLEVR data, the evaluation script above will print out zero accuracy on the test split. You may email the prediction outputs in exp_clevr/eval_outputs/ to the CLEVR dataset authors for the test split accuracy. Train and evaluate on the VQA dataset Download and preprocess the data 1. Download the VQA dataset annotations from and symbol link it to exp_vqa/vqa dataset . After this step, the file structure should look like exp_vqa/vqa dataset/ Questions/ OpenEnded_mscoco_train2014_questions.json OpenEnded_mscoco_val2014_questions.json OpenEnded_mscoco_test dev2015_questions.json OpenEnded_mscoco_test2015_questions.json Annotations/ mscoco_train2014_annotations.json mscoco_val2014_annotations.json 2. Download the COCO images from extract features from the images, and store them under exp_vqa/data/resnet_res5c/ . In our experiments, we resize all the COCO images to 448 x 448, and use the res5c layer output of shape (1, 14, 14, 2048) from the ResNet 152 network pretrained on ImageNET classification (feature stored as numpy array in HxWxC format). 
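The repository's own extraction uses the same ResNet 152 res5c features as MCB; the torchvision based sketch below is only meant to illustrate the expected (1, 14, 14, 2048) HxWxC layout and per image .npy storage for a 448 x 448 input, not to reproduce the exact weights or preprocessing used by the authors. The input image path is a placeholder.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Keep ResNet-152 up to the last conv stage (the res5c-equivalent output).
resnet = models.resnet152(pretrained=True)
res5c = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),   # COCO images are resized to 448 x 448
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("some_coco_image.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = res5c(img)                                        # (1, 2048, 14, 14), NCHW
feat = feat.permute(0, 2, 3, 1).numpy().astype(np.float32)   # (1, 14, 14, 2048), HxWxC
np.save("COCO_train2014_000000000009.npy", feat)
```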
In our experiments, we use the same ResNet 152 res5c features as in MCB , except that the extracted features are stored in NHWC format (instead of NCHW format used in MCB). The saved features will take up approximately 307GB disk space (for all images in COCO train2014, val2014 and test2015). After feature extraction, the file structure for the features should look like exp_vqa/data/resnet_res5c/ train2014/ COCO_train2014_000000000009.npy ... val2014/ COCO_val2014_000000000042.npy ... test2015/ COCO_test2015_000000000001.npy ... where each of the .npy file contains COCO image feature extracted from the res5c layer of the ResNet 152 network, which is a numpy array of shape (1, 14, 14, 2048) and float32 type, stored in HxWxC format. 3. Build image collections (imdb) for VQA: cd ./exp_vqa/data/ python build_vqa_imdb.py cd ../../ Note: this repository already contains the parsing results from Stanford Parser for the VQA questions under exp_vqa/data/parse/new_parse (parsed using this script ), with the converted ground truth (expert) layouts under exp_vqa/data/gt_layout_ _new_parse.npy (converted using notebook exp_vqa/data/convert_new_parse_to_gt_layout.ipynb ). Training Train with ground truth layout: 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Step a (cloning expert): python exp_vqa/train_vqa_gt_layout.py 2. Step b (policy search after cloning): python exp_vqa/train_vqa_rl_gt_layout.py Note: By default, the above scripts use GPU 0, and train on the union of train2014 and val2014 splits. To train on a different GPU, set the gpu_id flag. During training, the script will write TensorBoard events to exp_vqa/tb/ and save the snapshots under exp_vqa/tfmodel/ . Pre trained models (TensorFlow snapshots) on VQA dataset can be downloaded from: vqa_gt_layout (cloning expert): vqa_rl_gt_layout (policy search after cloning): The downloaded snapshots should be placed under exp_vqa/tfmodel/vqa_gt_layout and exp_vqa/tfmodel/vqa_rl_gt_layout . You may evaluate their performance using the test code below. Test 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Evaluate on vqa_gt_layout (cloning expert): (on test dev2015 split): python exp_vqa/eval_vqa.py exp_name vqa_gt_layout snapshot_name 00040000 test_split test dev2015 (on test2015 split): python exp_vqa/eval_vqa.py exp_name vqa_gt_layout snapshot_name 00040000 test_split test2015 2. Evaluate on vqa_rl_gt_layout (policy search after cloning): (on test dev2015 split): python exp_vqa/eval_vqa.py exp_name vqa_rl_gt_layout snapshot_name 00040000 test_split test dev2015 (on test2015 split): python exp_vqa/eval_vqa.py exp_name vqa_rl_gt_layout snapshot_name 00040000 test_split test2015 Note: the above evaluation scripts will not print out the accuracy, but will write the prediction outputs to exp_vqa/eval_outputs/ , which can be uploaded to the evaluation sever for evaluation. The expected accuacy of vqa_rl_gt_layout on test dev2015 split is 64.9%. Train and evaluate on the VQAv2 dataset Download and preprocess the data 1. Download the VQAv2 dataset annotations from and symbol link it to exp_vqa/vqa dataset . 
After this step, the file structure should look like exp_vqa/vqa dataset/ Questions/ v2_OpenEnded_mscoco_train2014_questions.json v2_OpenEnded_mscoco_val2014_questions.json v2_OpenEnded_mscoco_test dev2015_questions.jso v2_OpenEnded_mscoco_test2015_questions.json Annotations/ v2_mscoco_train2014_annotations.json v2_mscoco_val2014_annotations.json v2_mscoco_train2014_complementary_pairs.json v2_mscoco_val2014_complementary_pairs.json 2. Download the COCO images from extract features from the images, and store them under exp_vqa/data/resnet_res5c/ . In our experiments, we resize all the COCO images to 448 x 448, and use the res5c layer output of shape (1, 14, 14, 2048) from the ResNet 152 network pretrained on ImageNET classification (feature stored as numpy array in HxWxC format). In our experiments, we use the same ResNet 152 res5c features as in MCB , except that the extracted features are stored in NHWC format (instead of NCHW format used in MCB). The saved features will take up approximately 307GB disk space (for all images in COCO train2014, val2014 and test2015). After feature extraction, the file structure for the features should look like exp_vqa/data/resnet_res5c/ train2014/ COCO_train2014_000000000009.npy ... val2014/ COCO_val2014_000000000042.npy ... test2015/ COCO_test2015_000000000001.npy ... where each of the .npy file contains COCO image feature extracted from the res5c layer of the ResNet 152 network, which is a numpy array of shape (1, 14, 14, 2048) and float32 type, stored in HxWxC format. 3. Build image collections (imdb) for VQAv2: cd ./exp_vqa/data/ python build_vqa_v2_imdb.py cd ../../ Note: this repository already contains the parsing results from Stanford Parser for the VQAv2 questions under exp_vqa/data/parse/new_parse_vqa_v2 (parsed using this script ), with the converted ground truth (expert) layouts under exp_vqa/data/v2_gt_layout_ _new_parse.npy . Training Train with ground truth layout: 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Step a (cloning expert): python exp_vqa/train_vqa2_gt_layout.py 2. Step b (policy search after cloning): python exp_vqa/train_vqa2_rl_gt_layout.py Note: By default, the above scripts use GPU 0, and train on the union of train2014 and val2014 splits. To train on a different GPU, set the gpu_id flag. During training, the script will write TensorBoard events to exp_vqa/tb/ and save the snapshots under exp_vqa/tfmodel/ . Pre trained models (TensorFlow snapshots) on VQAv2 dataset can be downloaded from: vqa2_gt_layout (cloning expert): vqa2_rl_gt_layout (policy search after cloning): The downloaded snapshots should be placed under exp_vqa/tfmodel/vqa2_gt_layout and exp_vqa/tfmodel/vqa2_rl_gt_layout . You may evaluate their performance using the test code below. Test 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Evaluate on vqa2_gt_layout (cloning expert): (on test dev2015 split): python exp_vqa/eval_vqa2.py exp_name vqa2_gt_layout snapshot_name 00080000 test_split test dev2015 (on test2015 split): python exp_vqa/eval_vqa2.py exp_name vqa2_gt_layout snapshot_name 00080000 test_split test2015 2. 
Evaluate on vqa2_rl_gt_layout (policy search after cloning): (on test dev2015 split): python exp_vqa/eval_vqa2.py exp_name vqa2_rl_gt_layout snapshot_name 00080000 test_split test dev2015 (on test2015 split): python exp_vqa/eval_vqa2.py exp_name vqa2_rl_gt_layout snapshot_name 00080000 test_split test2015 Note: the above evaluation scripts will not print out the accuracy, but will write the prediction outputs to exp_vqa/eval_outputs/ , which can be uploaded to the evaluation sever for evaluation. The expected accuacy of vqa2_rl_gt_layout on test dev2015 split is 63.3%. Train and evaluate on the SHAPES dataset A copy of the SHAPES dataset is contained in this repository under exp_shapes/shapes_dataset . The ground truth module layouts (expert layouts) we use in our experiments are also provided under exp_shapes/data/ _symbols.json . The script to obtain the expert layouts from the annotations is in exp_shapes/data/get_ground_truth_layout.ipynb . Training 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Train with ground truth layout (behavioral cloning from expert): python exp_shapes/train_shapes_gt_layout.py 2. Train without ground truth layout (policy search from scratch): python exp_shapes/train_shapes_scratch.py Note: by default, the above scripts use GPU 0. To train on a different GPU, set the gpu_id flag. During training, the script will write TensorBoard events to exp_shapes/tb/ and save the snapshots under exp_shapes/tfmodel/ . Test 0. Add the root of this repository to PYTHONPATH: export PYTHONPATH .:$PYTHONPATH 1. Evaluate shapes_gt_layout (behavioral cloning from expert): python exp_shapes/eval_shapes.py exp_name shapes_gt_layout snapshot_name 00040000 test_split test 2. Evaluate shapes_scratch (policy search from scratch): python exp_shapes/eval_shapes.py exp_name shapes_scratch snapshot_name 00400000 test_split test Note: the above evaluation scripts will print out the accuracy and also save it under exp_shapes/results/ . By default, the above scripts use GPU 0, and evaluate on the test split of SHAPES. To evaluate on a different GPU, set the gpu_id flag. To evaluate on the validation split, use test_split val instead.",Visual Question Answering,Visual Question Answering 2254,Computer Vision,Computer Vision,Computer Vision,"VisualReasoning_MMnet Models in Pytorch for visual reasoning task on Clevr dataset. Stack attention : Module network : Yes, but what's new? 
Try to achieve the same performance in an end to end differentiable architecture: Module memory network new Module memory network end2end differentiable new Try to achieve weak supervision: ( Work in progress ) Set up Step 1: Download the data mkdir data wget O data/CLEVR_v1.0.zip unzip data/CLEVR_v1.0.zip d data Step 2: Extract Image Features python scripts/extract_features.py \ input_image_dir data/CLEVR_v1.0/images/train \ output_h5_file data/train_features.h5 Step 3: Preprocess Questions python scripts/preprocess_questions.py \ input_questions_json data/CLEVR_v1.0/questions/CLEVR_train_questions.json \ output_h5_file data/train_questions.h5 \ output_vocab_json data/vocab.json Test sample Train python train.py args arguments: model Model to train: SAN, SAN_wbw, PG, PG_memory, PG_endtoend question_size Number of words in the question dictionary stem dim Number of feature maps n channel Number of feature channels batch_size Mini batch size min_grad Minimum value for gradient clipping max_grad Maximum value for gradient clipping load_model_path Load pre trained model (path) load_model_mode Load model mode: Execution engine (EE), Program Generator (PG), Both (PG+EE) save_model Save model? (bool) clevr_dataset Clevr dataset data (path) clevr_val_images Clevr dataset validation images (path) num_iterations Number of iterations per epoch num_val_samples Number of validation samples batch_multiplier Virtual batch (minimum value: 1) train_mode Train mode: Execution engine (EE), Program Generator (PG), Both (PG+EE) decoder_mode Program generator mode: Backpropagation (soft, gumbel) or Reinforce (hard, hard+penalty) use_curriculum Use curriculum to train the program generator (bool) Module memory network (Pg_memory) Module memory network end2end (Pg_endtoend) Models Stack Attention (SAN) Stack Attention word2word (SAN_wbw) Module Network (PG) Module Memory Network (PG_memory) Module Memory Network end2end (PG_endtoend)",Visual Question Answering,Visual Question Answering 2260,Computer Vision,Computer Vision,Computer Vision,"Discriminability objective for training descriptive captions This is the implementation of the paper Discriminability objective for training descriptive captions . Requirements Python 2.7 (because there is no coco caption version for python 3) PyTorch 1.0 (along with torchvision) java 1.8 (for coco caption) Downloads Clone the repository: git clone recursive Data split In this paper we use the data split from Context aware Captions from Context agnostic Supervision . It is different from the standard Karpathy split, so we need to download different files. Download link: Google drive link To train on your own, you only need to download dataset_coco.json , but it is also suggested to download cocotalk.json and cocotalk_label.h5 as well. If you want to run the pretrained model, you have to download all three files. coco caption bash cd coco caption bash ./get_stanford_models.sh cd annotations Download captions_val2014.json from the google drive link above to this folder cd ../../ The reason why we need to replace captions_val2014.json is that the original file can only evaluate images from the val2014 set, and we are using rama's split. Pre computed feature In this paper, for the retrieval model, we use the outputs of the last layer of ResNet 101. For the captioning model, we use the bottom up feature from . The features can be downloaded from the same link, and you need to decompress them to data/cocotalk_fc and data/cocobu_att respectively. Pretrained models Download the pretrained models from the link . Decompress them into the root folder.
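Before running the evaluation or training commands that follow, it can help to confirm that the files named in the instructions above actually ended up where the scripts expect them. The checklist below is only a convenience sketch; the paths are the ones mentioned in this README, and anything you placed elsewhere should be adjusted accordingly.

```python
import os

# Files and folders referenced in the instructions above.
expected = [
    "data/dataset_coco.json",   # split file used by the prepro scripts
    "data/cocotalk.json",       # preprocessed captions / vocabulary
    "data/cocotalk_label.h5",   # discretized caption data
    "data/cocotalk_fc",         # ResNet-101 last-layer features (retrieval model)
    "data/cocobu_att",          # bottom-up features (captioning model)
]

missing = [p for p in expected if not os.path.exists(p)]
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All expected data files are in place.")
```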
To evaluate on a pretrained model, run: bash eval.sh att_d1 test The pretrained models can match the results shown in the paper. Train on your own Preprocessing Preprocess the captions (skip if you already have 'cocotalk.json' and 'cocotalk_label.h5'): bash $ python scripts/prepro_labels.py input_json data/dataset_coco.json output_json data/cocotalk.json output_h5 data/cocotalk Preprocess for self critical training: $ python scripts/prepro_ngrams.py input_json data/dataset_coco.json dict_json data/cocotalk.json output_pkl data/coco train split train Start training First train a retrieval model: bash run_fc_con.sh Second, pretrain the captioning model: bash run_att.sh Third, finetune the captioning model with cider+discriminability optimization: bash run_att_d.sh 1 (1 is the discriminability weight, and can be changed to other values) Evaluate bash bash eval.sh att_d1 test Citation If you found this useful, please consider citing: @InProceedings{Luo_2018_CVPR, author {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory}, title {Discriminability Objective for Training Descriptive Captions}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {June}, year {2018} } Acknowledgements The code is based on ImageCaptioning.pytorch",Visual Question Answering,Visual Question Answering 2261,Computer Vision,Computer Vision,Computer Vision,"Transformer for captioning Note: This repository is deprecated, and the code has been merged into self critical.pytorch . The same training script should work for self critical too. This is an experiment in using a transformer model to do captioning. Most of the code is copied from the Harvard detailed tutorial for the transformer . Also note that this repository is a fork of my self critical.pytorch repository; most of the code is shared. The additions to self critical.pytorch are the following: transformer model Add warmup Adam for training the transformer (important) Add reduce_on_plateau (not really useful) A training script that can achieve 1.25 on the validation set without beam search. bash id transformer ckpt_path log_ $id if ! d $ckpt_path ; then mkdir $ckpt_path fi if ! f $ckpt_path /infos_ $id .pkl ; then start_from else start_from start_from $ckpt_path fi python train.py id $id caption_model transformer noamopt noamopt_warmup 20000 label_smoothing 0.0 input_json data/cocotalk.json input_label_h5 data/cocotalk_label.h5 input_fc_dir data/cocobu_fc input_att_dir data/cocobu_att seq_per_img 5 batch_size 10 beam_size 1 learning_rate 5e 4 num_layers 6 input_encoding_size 512 rnn_size 2048 learning_rate_decay_start 0 scheduled_sampling_start 0 checkpoint_path $ckpt_path $start_from save_checkpoint_every 3000 language_eval 1 val_images_use 5000 max_epochs 15 python train.py id $id caption_model transformer reduce_on_plateau input_json data/cocotalk.json input_label_h5 data/cocotalk_label.h5 input_fc_dir data/cocobu_fc input_att_dir data/cocobu_att input_box_dir data/cocobu_box seq_per_img 5 batch_size 10 beam_size 1 learning_rate 1e 5 num_layers 6 input_encoding_size 512 rnn_size 2048 checkpoint_path $ckpt_path $start_from save_checkpoint_every 3000 language_eval 1 val_images_use 5000 self_critical_after 10 Notice : because I'm too lazy, I reuse the option names for RNNs to set the hyperparameters of the transformer: N num_layers d_model input_encoding_size d_ff rnn_size h is always 8 Self critical Sequence Training for Image Captioning (+ misc.)
This repository includes the unofficial implementation Self critical Sequence Training for Image Captioning and Bottom Up and Top Down Attention for Image Captioning and Visual Question Answering . The author of SCST helped me a lot when I tried to replicate the result. Great thanks. The att2in2 model can achieve more than 1.20 Cider score on Karpathy's test split (with self critical training, bottom up feature, large rnn hidden size, without ensemble) This is based on my ImageCaptioning.pytorch repository. The modifications is: Self critical training. Bottom up feature support from ref . (Evaluation on arbitrary images is not supported.) Ensemble Multi GPU training Requirements Python 2.7 (because there is no coco caption version for python 3) PyTorch 0.4 (along with torchvision) cider (already been added as a submodule) ( Skip if you are using bottom up feature ): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from here , and should be placed in data/imagenet_weights . Pretrained models (using resnet101 feature) Pretrained models are provided here . And the performances of each model will be maintained in this issue . If you want to do evaluation only, you can then follow this section ( generate image captions) after downloading the pretrained models (and also the pretrained resnet101). Train your own network on COCO Download COCO captions and preprocess them Download preprocessed coco captions from link from Karpathy's homepage. Extract dataset_coco.json from the zip file and copy it in to data/ . This file provides preprocessed captions and also standard train val test splits. Then do: bash $ python scripts/prepro_labels.py input_json data/dataset_coco.json output_json data/cocotalk.json output_h5 data/cocotalk prepro_labels.py will map all words that occur < 5 times to a special UNK token, and create a vocabulary for all the remaining words. The image information and vocabulary are dumped into data/cocotalk.json and discretized caption data are dumped into data/cocotalk_label.h5 . Download COCO dataset and pre extract the image features (Skip if you are using bottom up feature) Download the coco images from link . We need 2014 training images and 2014 val. images. You should put the train2014/ and val2014/ in the same directory, denoted as $IMAGE_ROOT . Then: $ python scripts/prepro_feats.py input_json data/dataset_coco.json output_dir data/cocotalk images_root $IMAGE_ROOT prepro_feats.py extract the resnet101 features (both fc feature and last conv feature) of each image. The features are saved in data/cocotalk_fc and data/cocotalk_att , and resulting files are about 200GB. (Check the prepro scripts for more options, like other resnet models or other attention sizes.) Warning : the prepro script will fail with the default MSCOCO data because one of their images is corrupted. See this issue for the fix, it involves manually replacing one image in the dataset. Download Bottom up features (Skip if you are using resnet features) Download pre extracted feature from link . You can either download adaptive one or fixed one. For example: mkdir data/bu_data; cd data/bu_data wget unzip trainval.zip Then: bash python script/make_bu_data.py output_dir data/cocobu This will create data/cocobu_fc , data/cocobu_att and data/cocobu_box . If you want to use bottom up feature, you can just follow the following steps and replace all cocotalk with cocobu. 
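prepro_labels.py itself is not reproduced here; purely to illustrate the thresholding rule just described (words seen fewer than 5 times become UNK, the rest form the vocabulary), a stripped down version of the idea could look like the following. The function name and the toy input are hypothetical.

```python
from collections import Counter

def build_vocab(tokenized_captions, min_count=5):
    # Count every word across all captions.
    counts = Counter(w for caption in tokenized_captions for w in caption)
    # Words seen fewer than min_count times are mapped to the special UNK token.
    keep = {w for w, c in counts.items() if c >= min_count}
    mapped = [[w if w in keep else "UNK" for w in caption]
              for caption in tokenized_captions]
    return sorted(keep) + ["UNK"], mapped

vocab, mapped = build_vocab([["a", "dog", "runs"], ["a", "cat", "sleeps"]], min_count=2)
# vocab == ["a", "UNK"]; every other word is replaced by "UNK" in `mapped`
```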
Start training bash $ python train.py id fc caption_model fc input_json data/cocotalk.json input_fc_dir data/cocotalk_fc input_att_dir data/cocotalk_att input_label_h5 data/cocotalk_label.h5 batch_size 10 learning_rate 5e 4 learning_rate_decay_start 0 scheduled_sampling_start 0 checkpoint_path log_fc save_checkpoint_every 6000 val_images_use 5000 max_epochs 30 The train script will dump checkpoints into the folder specified by checkpoint_path (default save/ ). We only save the best performing checkpoint on validation and the latest checkpoint to save disk space. To resume training, you can specify start_from option to be the path saving infos.pkl and model.pth (usually you could just set start_from and checkpoint_path to be the same). If you have tensorflow, the loss histories are automatically dumped into checkpoint_path , and can be visualized using tensorboard. The current command use scheduled sampling, you can also set scheduled_sampling_start to 1 to turn off scheduled sampling. If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to validation cross entropy loss, use language_eval 1 option, but don't forget to download the coco caption code into coco caption directory. For more options, see opts.py . A few notes on training. To give you an idea, with the default settings one epoch of MS COCO images is about 11000 iterations. After 1 epoch of training results in validation loss 2.5 and CIDEr score of 0.68. By iteration 60,000 CIDEr climbs up to about 0.84 (validation loss at about 2.4 (under scheduled sampling)). Train using self critical First you should preprocess the dataset and get the cache for calculating cider score: $ python scripts/prepro_ngrams.py input_json .../dataset_coco.json dict_json data/cocotalk.json output_pkl data/coco train split train Then, copy the model from the pretrained model using cross entropy. (It's not mandatory to copy the model, just for back up) $ bash scripts/copy_model.sh fc fc_rl Then bash $ python train.py id fc_rl caption_model fc input_json data/cocotalk.json input_fc_dir data/cocotalk_fc input_att_dir data/cocotalk_att input_label_h5 data/cocotalk_label.h5 batch_size 10 learning_rate 5e 5 start_from log_fc_rl checkpoint_path log_fc_rl save_checkpoint_every 6000 language_eval 1 val_images_use 5000 self_critical_after 30 You will see a huge boost on Cider score, : ). A few notes on training. Starting self critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs pertraining). Caption images after training Generate image captions Evaluate on raw images Now place all your images of interest into a folder, e.g. blah , and run the eval script: bash $ python eval.py model model.pth infos_path infos.pkl image_folder blah num_images 10 This tells the eval script to run up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size . Use num_images 1 to process all images. The eval script will create an vis.json file inside the vis folder, which can then be visualized with the provided HTML interface: bash $ cd vis $ python m SimpleHTTPServer Now visit localhost:8000 in your browser and you should see your predicted captions. Evaluate on Karpathy's test split bash $ python eval.py dump_images 0 num_images 5000 model model.pth infos_path infos.pkl language_eval 1 The defualt split to evaluate is test. The default inference method is greedy decoding ( sample_max 1 ), to sample from the posterior, set sample_max 0 . 
Beam Search . Beam search can increase the performance of the search for greedy decoding sequence by 5%. However, this is a little more expensive. To turn on the beam search, use beam_size N , N should be greater than 1. Miscellanea Using cpu . The code is currently defaultly using gpu; there is even no option for switching. If someone highly needs a cpu model, please open an issue; I can potentially create a cpu checkpoint and modify the eval.py to run the model on cpu. However, there's no point using cpu to train the model. Train on other dataset . It should be trivial to port if you can create a file like dataset_coco.json for your own dataset. Live demo . Not supported now. Welcome pull request. For more advanced features: Checkout ADVANCED.md . Reference If you find this repo useful, please consider citing (no obligation at all): @article{luo2018discriminability, title {Discriminability objective for training descriptive captions}, author {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory}, journal {arXiv preprint arXiv:1803.04376}, year {2018} } Of course, please cite the original paper of models you are using (You can find references in the model files). Acknowledgements Thanks the original neuraltalk2 and awesome PyTorch team.",Visual Question Answering,Visual Question Answering 2263,Computer Vision,Computer Vision,Computer Vision,"Self critical Sequence Training for Image Captioning (+ misc.) This repository includes the unofficial implementation Self critical Sequence Training for Image Captioning and Bottom Up and Top Down Attention for Image Captioning and Visual Question Answering . The author of SCST helped me a lot when I tried to replicate the result. Great thanks. The att2in2 model can achieve more than 1.20 Cider score on Karpathy's test split (with self critical training, bottom up feature, large rnn hidden size, without ensemble) This is based on my ImageCaptioning.pytorch repository. The modifications is: Self critical training. Bottom up feature support from ref . (Evaluation on arbitrary images is not supported.) Ensemble Multi GPU training Add transformer (merged from Transformer_captioning ) Requirements Python 2.7 (because there is no coco caption version for python 3) PyTorch 0.4 (along with torchvision) cider (already been added as a submodule) ( Skip if you are using bottom up feature ): If you want to use resnet to extract image features, you need to download pretrained resnet model for both training and evaluation. The models can be downloaded from here , and should be placed in data/imagenet_weights . Pretrained models (using resnet101 feature) Pretrained models are provided here . And the performances of each model will be maintained in this issue . If you want to do evaluation only, you can then follow this section ( generate image captions) after downloading the pretrained models (and also the pretrained resnet101). Train your own network on COCO Download COCO captions and preprocess them Download preprocessed coco captions from link from Karpathy's homepage. Extract dataset_coco.json from the zip file and copy it in to data/ . This file provides preprocessed captions and also standard train val test splits. Then do: bash $ python scripts/prepro_labels.py input_json data/dataset_coco.json output_json data/cocotalk.json output_h5 data/cocotalk prepro_labels.py will map all words that occur < 5 times to a special UNK token, and create a vocabulary for all the remaining words. 
The image information and vocabulary are dumped into data/cocotalk.json and discretized caption data are dumped into data/cocotalk_label.h5 . Download COCO dataset and pre extract the image features (Skip if you are using bottom up feature) Download the coco images from link . We need 2014 training images and 2014 val. images. You should put the train2014/ and val2014/ in the same directory, denoted as $IMAGE_ROOT . Then: $ python scripts/prepro_feats.py input_json data/dataset_coco.json output_dir data/cocotalk images_root $IMAGE_ROOT prepro_feats.py extract the resnet101 features (both fc feature and last conv feature) of each image. The features are saved in data/cocotalk_fc and data/cocotalk_att , and resulting files are about 200GB. (Check the prepro scripts for more options, like other resnet models or other attention sizes.) Warning : the prepro script will fail with the default MSCOCO data because one of their images is corrupted. See this issue for the fix, it involves manually replacing one image in the dataset. Download Bottom up features (Skip if you are using resnet features) Download pre extracted feature from link . You can either download adaptive one or fixed one. For example: mkdir data/bu_data; cd data/bu_data wget unzip trainval.zip Then: bash python script/make_bu_data.py output_dir data/cocobu This will create data/cocobu_fc , data/cocobu_att and data/cocobu_box . If you want to use bottom up feature, you can just follow the following steps and replace all cocotalk with cocobu. Start training bash $ python train.py id fc caption_model fc input_json data/cocotalk.json input_fc_dir data/cocotalk_fc input_att_dir data/cocotalk_att input_label_h5 data/cocotalk_label.h5 batch_size 10 learning_rate 5e 4 learning_rate_decay_start 0 scheduled_sampling_start 0 checkpoint_path log_fc save_checkpoint_every 6000 val_images_use 5000 max_epochs 30 The train script will dump checkpoints into the folder specified by checkpoint_path (default save/ ). We only save the best performing checkpoint on validation and the latest checkpoint to save disk space. To resume training, you can specify start_from option to be the path saving infos.pkl and model.pth (usually you could just set start_from and checkpoint_path to be the same). If you have tensorflow, the loss histories are automatically dumped into checkpoint_path , and can be visualized using tensorboard. The current command use scheduled sampling, you can also set scheduled_sampling_start to 1 to turn off scheduled sampling. If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to validation cross entropy loss, use language_eval 1 option, but don't forget to download the coco caption code into coco caption directory. For more options, see opts.py . A few notes on training. To give you an idea, with the default settings one epoch of MS COCO images is about 11000 iterations. After 1 epoch of training results in validation loss 2.5 and CIDEr score of 0.68. By iteration 60,000 CIDEr climbs up to about 0.84 (validation loss at about 2.4 (under scheduled sampling)). Train using self critical First you should preprocess the dataset and get the cache for calculating cider score: $ python scripts/prepro_ngrams.py input_json .../dataset_coco.json dict_json data/cocotalk.json output_pkl data/coco train split train Then, copy the model from the pretrained model using cross entropy. 
(It's not mandatory to copy the model, just for back up) $ bash scripts/copy_model.sh fc fc_rl Then bash $ python train.py id fc_rl caption_model fc input_json data/cocotalk.json input_fc_dir data/cocotalk_fc input_att_dir data/cocotalk_att input_label_h5 data/cocotalk_label.h5 batch_size 10 learning_rate 5e 5 start_from log_fc_rl checkpoint_path log_fc_rl save_checkpoint_every 6000 language_eval 1 val_images_use 5000 self_critical_after 30 You will see a huge boost on Cider score, : ). A few notes on training. Starting self critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs pertraining). Caption images after training Generate image captions Evaluate on raw images Now place all your images of interest into a folder, e.g. blah , and run the eval script: bash $ python eval.py model model.pth infos_path infos.pkl image_folder blah num_images 10 This tells the eval script to run up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size . Use num_images 1 to process all images. The eval script will create an vis.json file inside the vis folder, which can then be visualized with the provided HTML interface: bash $ cd vis $ python m SimpleHTTPServer Now visit localhost:8000 in your browser and you should see your predicted captions. Evaluate on Karpathy's test split bash $ python eval.py dump_images 0 num_images 5000 model model.pth infos_path infos.pkl language_eval 1 The defualt split to evaluate is test. The default inference method is greedy decoding ( sample_max 1 ), to sample from the posterior, set sample_max 0 . Beam Search . Beam search can increase the performance of the search for greedy decoding sequence by 5%. However, this is a little more expensive. To turn on the beam search, use beam_size N , N should be greater than 1. Miscellanea Using cpu . The code is currently defaultly using gpu; there is even no option for switching. If someone highly needs a cpu model, please open an issue; I can potentially create a cpu checkpoint and modify the eval.py to run the model on cpu. However, there's no point using cpu to train the model. Train on other dataset . It should be trivial to port if you can create a file like dataset_coco.json for your own dataset. Live demo . Not supported now. Welcome pull request. For more advanced features: Checkout ADVANCED.md . Reference If you find this repo useful, please consider citing (no obligation at all): @article{luo2018discriminability, title {Discriminability objective for training descriptive captions}, author {Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory}, journal {arXiv preprint arXiv:1803.04376}, year {2018} } Of course, please cite the original paper of models you are using (You can find references in the model files). Acknowledgements Thanks the original neuraltalk2 and awesome PyTorch team.",Visual Question Answering,Visual Question Answering 2380,Computer Vision,Computer Vision,Computer Vision,"NBSVM Since I still receive a good number of emails about this project 4 years later, I decided to put this code on github and write the instructions better. The code itself is unchanged, in matlab, and not that great. Luckily, there are several other implementations ( other implementations) in various languages, which are better. For example, I used Grégoire Mesnil's implementation on this CodaLab worksheet and got slightly better results than we originally did. 
Running NBSVM Download the data and override the empty data directory in root: for example, you should have ./data/rt10662/unigram_rts.mat if this readme has pass ./README.MD Go to src and run the script master.m to produce the results from the paper Results and details are logged in resultslog.txt and details.txt, respectively A table with all the results is printed, like: AthR XGraph BbCrypt CR IMDB MPQA RT 2k RTs subj 85.13 91.19 99.40 79.97 86.59 86.27 85.85 79.03 93.56 MNB bigram 84.99 89.96 99.29 79.76 83.55 85.29 83.45 77.94 92.58 MNB unigram 83.73 86.17 97.68 80.85 89.16 86.72 87.40 77.72 91.74 SVM bigram 82.61 85.14 98.29 79.02 86.95 86.15 86.25 76.23 90.84 SVM unigram 87.66 90.68 99.50 81.75 91.22 86.32 89.45 79.38 93.18 NBSVM bigram 87.94 91.19 99.70 80.45 88.29 85.25 87.80 78.05 92.40 SVM unigram The data data 404.4MB includes all the data data_small 108.5MB data_small data_all large_IMDB For each data set, there is a corresponding folder data/$DatasetName. You can find $FeatureType_$DatasetName.mat in data/$DatasetName, where $FeatureType unigram or bigram . data/$DatasetName/cv_obj.mat determines the standard evaluation for each dataset (how many folds, whats the split, etc.). They are generated by corresponding data processing script in src/misc Other implementations Please consider submitting a pull request or shoot me an email if you used NBSVM in your work! Python implementation by Grégoire Mesnil, It runs on the large IMDB dataset with a single script and the results are described in their ICLR 2015 paper Java implementation by Daniel Pressel, using SGD. Python implementation by Luis Rei, multiclass a Go implementation by tkng, probably imcomplete Perl! unfortunately cant read Japanese It appears to be used in these kaggle entries: Notes The datasets are collected by others, please cite the original sources if you work with them The data structure used kept the order information of the document, instead of converting to bag of words vector right away. This resulted in some unnecessary mess for this work, but might make it easier if you want to try a more complex model. Comments While many experiments have been ran for this task, performance is really all about regularization, and even the simplest model (Naive Bayes) would fit the training set perfectly. As far as I know, there is no good theory for why things even work in this case of non sparse weights and p>>n. It is unclear if any of the complicated deep learning models today are doing significantly more than bag of words on these datasets: As far as I know, none of these results are impressively better (usually about 1%) Available compute power, engineering competence, and software infrastructure are vastly better for deep learning Difference in enthusiasm level: no one seems to try very hard pushing basic models to the available compute power / hardware Bag of words models run in few seconds or less, and behaves predictably for a different test distribution. It is very encouraging for me to see others finding this work helpful and implementing it. Another example of bag of words going strong in 2015. References For technical details see our paper (wang12simple.pdf) and our talk (wang12simple_slides.pdf). @inproceedings{wang12simple, author {Wang, Sida I. 
and Manning, Christopher D.}, booktitle {Proceedings of the ACL}, title {Baselines and Bigrams: Simple, Good Sentiment and Topic Classification}, year {2012}, booktitle {ACL (2)}, pages {90 94} } IMDB comparisons These works compare with the largest dataset of the batch (IMDB), where maybe regularization is not as important. Our result was 91.22% correct. Quoc V. Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. 2014. Got 92.58%, no released code, the paper below reports that the results were not reproduced. Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio. Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews. ICLR 2015 Their implementation of NBSVM actually got better than us at 91.87%, and their best number is 92.57% with some ensembling. Andrew M. Dai, Quoc V. Le. Semi supervised Sequence Learning. NIPS 2015. 92.76% with additional unlabeled data. Stefan Wager, Sida Wang, and Percy Liang. Dropout Training as Adaptive Regularization. NIPS 2013 We got 91.98% using unlabeled data with logistic regression and bigrams. (please submit a pull request if you want something added or changed) MIT license: here (LICENSE.MD)",Visual Question Answering,Visual Question Answering 2382,Computer Vision,Computer Vision,Computer Vision,"What’s in a Question: Using Visual Questions as a Form of Supervision This is the code for the CVPR'17 spotlight paper, What’s in a Question: Using Visual Questions as a Form of Supervision . Arxiv bib Github Project Page ! What’s in a Question: Using Visual Questions as a Form of Supervision Abstract > Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound, and others. We focus on one particular unexplored mode: visual questions that are asked about images. Our work is based on the key observation that the question itself provides useful information about the image (even without the answer being available). For instance, the question “what is the breed of the dog?” informs the computer that the animal in the scene is a dog and that there is only one dog present. We make three contributions: (1) we provide an extensive qualitative and quantitative analysis of the information contained in human visual questions, (2) we propose two simple but surprisingly effective modifications to the standard visual question answering models that allows it to make use of weak supervision in the form of unanswered questions associated with images, and (3) we demonstrate that a simple data augmentation strategy inspired by our insights results in a 7:1% improvement on the standard VQA benchmark. The trained models attain the following scores on the test dev of the MS COCO VQA v1.0 dataset . Model Name Overall Other Number Yes/No iBOWIMG 2x 62.80 53.11 37.94 80.72 There are three tasks described in the paper: 1. Image Descriptions We analyze whether the visual questions contain enough information to provide an accurate description of the image using the Seq2Seq model. See Image Descriptions README for detailed description for each file. 2. Object Classification Visual questions can provide information about the object classes that are present in the image. E.g., asking “what color is the bus?” indicates the presence of a bus in the image. See Object Classification README for detailed description for each file. 
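To make the weak-supervision idea above concrete, here is a purely illustrative sketch (not the project's code) of deriving weak object-class labels from question text by matching MS COCO class names; the class list shown is a small hypothetical subset of the full 80 categories.

```python
# Illustrative only: derive weak object-class labels from question text by keyword matching.
COCO_CLASSES = ["person", "bus", "dog", "cat", "chair", "bottle", "tv"]  # hypothetical subset

def weak_labels_from_questions(questions):
    """Return the set of class names mentioned across the questions asked about one image."""
    labels = set()
    for q in questions:
        tokens = q.lower().replace("?", " ").split()
        for cls in COCO_CLASSES:
            if cls in tokens:
                labels.add(cls)
    return labels

print(weak_labels_from_questions(["What color is the bus?", "How many people are there?"]))
# {'bus'}  (handling plurals such as "people" -> "person" is omitted in this sketch)
```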
Training/fine tuning the image features (caffe) Fine tuning modifies only the last layer of a network to give the application specific number of outputs. For fine tuning we start with the parameters initially learnt on the ImageNet images, and then fine tune with MS COCO images. All caffe related code for fine tuning the models is present in the caffe directory. See caffe README for detailed description for each file. 3. Visual Question Answering Visual Question Answering is the task of providing an accurate natural language answer, given an image and a natural language question about that image. Visual questions focus on different areas of an image, including background details and underlying context. We utilize not just the target question, but also the unanswered questions about a particular image. See Visual Question Answering README for detailed description for each file. Acknowledgements This code is based on Simple Baseline for Visual Question Answering by Bolei Zhou and Yuandong Tian. Please cite us if you use our code: @inproceedings{GanjuCVPR17, author {Siddha Ganju and Olga Russakovsky and Abhinav Gupta}, title {What's in a Question: Using Visual Questions as a Form of Supervision}, booktitle {CVPR}, year {2017} }",Visual Question Answering,Visual Question Answering 2421,Computer Vision,Computer Vision,Computer Vision,"Pythia Pythia is a modular framework for Visual Question Answering research, which formed the basis for the winning entry to the VQA Challenge 2018 from Facebook AI Research (FAIR)’s A STAR team. Please check our paper for more details. (A STAR: Agents that See, Talk, Act, and Reason.) ! Alt text (info/vqa_example.png?raw true vqa examples ) Table of Contents 0. Motivation ( motivation) 0. Citing pythia ( citing pythia) 0. Installing pythia environment ( installing pythia environment) 0. Quick start ( quick start) 0. Preprocess dataset ( preprocess dataset) 0. Test with pretrained models ( test with pretrained models) 0. Ensemble models ( ensemble models) 0. Customize config ( customize config) 0. Docker demo ( docker demo) 0. AWS s3 dataset summary ( aws s3 dataset summary) 0. Acknowledgements ( acknowledgements) 0. References ( references) Motivation The motivation for Pythia comes from the following observation – a majority of today’s Visual Question Answering (VQA) models fit a particular design paradigm, with modules for question encoding, image feature extraction, fusion of the two (typically with attention), and classification over the space of answers (a schematic sketch of this paradigm is given after this section). The long term goal of Pythia is to serve as a platform for easy and modular research & development in VQA and related directions like visual dialog. Why the name _Pythia_? The name Pythia is an homage to the Oracle of Apollo at Delphi, who answered questions in Ancient Greece. See here for more details. 
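As promised in the Motivation above, here is a schematic PyTorch sketch of that common VQA design paradigm (question encoder, image region features, attention fusion, answer classifier). It is not Pythia's actual model code, and every dimension and layer choice is an assumption made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVQAModel(nn.Module):
    """Schematic VQA model: GRU question encoder + attention over region features + classifier."""
    def __init__(self, vocab_size, num_answers, emb_dim=300, hid_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.att = nn.Linear(hid_dim + img_dim, 1)     # one attention score per image region
        self.img_proj = nn.Linear(img_dim, hid_dim)
        self.classifier = nn.Linear(hid_dim, num_answers)

    def forward(self, question_tokens, region_feats):
        # question_tokens: (B, T) word ids; region_feats: (B, R, img_dim) image region features
        _, h = self.gru(self.embed(question_tokens))    # h: (1, B, hid_dim)
        q = h.squeeze(0)                                # question vector (B, hid_dim)
        q_exp = q.unsqueeze(1).expand(-1, region_feats.size(1), -1)
        alpha = F.softmax(self.att(torch.cat([q_exp, region_feats], dim=2)), dim=1)  # (B, R, 1)
        v = (alpha * region_feats).sum(dim=1)           # attended image vector (B, img_dim)
        fused = q * self.img_proj(v)                    # element-wise fusion
        return self.classifier(fused)                   # answer logits
```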
Citing pythia If you use Pythia in your research, please use the following BibTeX entries for reference: The software: @misc{pythia18software, title {Pythia}, author {Yu Jiang and Vivek Natarajan and Xinlei Chen and Marcus Rohrbach and Dhruv Batra and Devi Parikh}, howpublished {\url{ year {2018} } The technical report detailing the description and analysis for our winning entry to the VQA 2018 challenge: @article{pythia18arxiv, title {Pythia v0.1: the Winning Entry to the VQA Challenge 2018}, author {{Yu Jiang } and {Vivek Natarajan } and {Xinlei Chen } and Marcus Rohrbach and Dhruv Batra and Devi Parikh}, journal {arXiv preprint arXiv:1807.09956}, year {2018} } \ Yu Jiang, Vivek Natarajan and Xinlei Chen contributed equally to the winning entry to the VQA 2018 challenge. Installing pythia environment 1. Install Anaconda (Anaconda recommended: 2. Install cudnn v7.0 and cuda.9.0 3. Create environment for pythia bash conda create name vqa python 3.6 source activate vqa pip install demjson pyyaml pip install pip install torchvision pip install tensorboardX Quick start We provide preprocessed data files to directly start training and evaluating. Instead of using the original train2014 and val2014 splits, we split val2014 into val2train2014 and minival2014 , and use train2014 + val2train2014 for training and minival2014 for validation. Download data. This step may take some time. Check the sizes of files at the end of readme. bash git clone git@github.com:facebookresearch/pythia.git cd Pythia mkdir data cd data wget wget wget wget wget wget wget wget gunzip imdb.tar.gz tar xf imdb.tar gunzip rcnn_10_100.tar.gz tar xf rcnn_10_100.tar rm f rcnn_10_100.tar gunzip detectron.tar.gz tar xf detectron.tar rm f detectron.tar Optional command line arguments for train.py bash python train.py h usage: train.py h config CONFIG out_dir OUT_DIR seed SEED config_overwrite CONFIG_OVERWRITE force_restart optional arguments: h, help show this help message and exit config CONFIG config yaml file out_dir OUT_DIR output directory, default is current directory seed SEED random seed, default 1234, set seed to 1 if need a random seed between 1 and 100000 config_overwrite CONFIG_OVERWRITE a json string to update yaml config file force_restart flag to force clean previous result and restart training Run model without finetuning bash cd ../ python train.py If there is a out of memory error, try: bash python train.py config_overwrite '{data:{image_fast_reader:false}}' Run model with features from detectron with finetuning bash python train.py config config/keep/detectron.yaml Check result for the default run bash cd results/default/1234 The results folder contains the following info angular2html results _ default _ 1234 (default seed) _config.yaml _best_model.pth _best_model_predict_test.pkl _best_model_predict_test.json (json file for predicted results on test dataset) _model_00001000.pth (mpdel snapshot at iter 1000) _result_on_val.txt _ ... _(other_cofig_setting) _... _ (other_config_file) The log files for tensorbord are stored under boards/ Preprocess dataset If you want to start from the original VQA dataset and preprocess data by yourself, use the following instructions in data_preprocess.md (data_prep/data_preprocess.md). This part is not necessary if you download all data from quick start. 
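Going back to the config_overwrite option shown in the quick-start commands above: the overwrite string is a small JSON-style object, so it can be convenient to build it programmatically instead of hand-writing it on the command line. A sketch, assuming train.py takes the flags listed in its usage text above (the leading dashes appear to have been stripped in this copy) and that plain JSON is accepted by the config parser:

```python
import json
import subprocess

# Overwrite nested config values without editing the YAML file;
# only the data.image_fast_reader key shown in the quick start is used here.
overwrite = {"data": {"image_fast_reader": False}}

subprocess.run(
    ["python", "train.py",
     "--config", "config/keep/detectron.yaml",
     "--out_dir", "results/detectron_run",      # output directory (any path you like)
     "--config_overwrite", json.dumps(overwrite)],
    check=True,
)
```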
Test with pretrained models Note: all of these models below are trained with validation set included Description performance (test dev) Link detectron_100_resnet_most_data 70.01 baseline 68.05 baseline +VG +VisDal +mirror 68.98 detectron_finetune 68.49 detectron_finetune+VG +VisDal +mirror 69.24 Best Pretrained Model The best pretrained model can be downloaded as follows: bash mkdir pretrained_models/ cd pretrained_models wget gunzip detectron_100_resnet_most_data.tar.gz tar xf detectron_100_resnet_most_data.tar rm f detectron_100_resnet_most_data.tar Get ResNet152 features and Detectron features with fixed 100 bounding boxes bash cd data wget gunzip detectron_fix_100.tar.gz tar xf detectron_fix_100.tar rm f detectron_fix_100.tar wget gunzip resnet152.tar.gz tar xf resnet152.tar rm f resnet152.tar Test the best model on the VQA test2015 dataset bash python run_test.py config pretrained_models/detectron_100_resnet_most_data/1234/config.yaml \ model_path pretrained_models/detectron_100_resnet_most_data/1234/best_model.pth \ out_prefix test_best_model The results will be saved as a json file test_best_model.json , and this file can be uploaded to the evaluation server on EvalAI . Ensemble models Download all the models above bash python ensemble.py res_dirs pretrained_models/ out ensemble_5.json Results will be saved in ensemble_5.json . This ensemble can get accuracy 71.65 on test dev. Ensemble 30 models To run an ensemble of 30 pretrained models, download the models and image features as follows. This gets an accuracy of 72.18 on test dev. bash wget Customize config To change models or adjust hyper parameters, see config_help.md (config_help.md) Docker demo To quickly tryout a model interactively with nvidia docker bash git clone nvidia docker build pythia t pythia:latest nvidia docker run ti net host pythia:latest This will open a jupyter notebook with a demo model to which you can ask questions interactively. AWS s3 dataset summary Here, we listed the size of some large files in our AWS S3 bucket. Description size data/rcnn_10_100.tar.gz 71.0GB data/detectron.tar.gz 106.2 GB data/detectron_fix_100.tar.gz 162.6GB data/resnet152.tar.gz 399.6GB ensembled.tar.gz 462.1GB Acknowledgements We would like to thank Peter Anderson, Abhishek Das, Stefan Lee, Jiasen Lu, Jianwei Yang, Licheng Yu, Luowei Zhou for helpful discussions, Peter Anderson for providing training data for the Visual Genome detector, Deshraj Yadav for responses on EvalAI related questions, Stefan Lee for suggesting the name Pythia , Meet Shah for building the docker demo for Pythia and Abhishek Das, Abhishek Kadian for feedback on our codebase. References Y. Jiang, and V. Natarajan and X. Chen and M. Rohrbach and D. Batra and D. Parikh. Pythia v0.1: The Winning Entry to the VQA Challenge 2018. CoRR, abs/1807.09956, 2018. P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom up and top down attenttion for image captioning and visual question answering. In _CVPR_, 2018. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra,C. Lawrence Zitnick, and D. Parikh. VQA: Visual question answering. In _ICCV_, 2015 A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. Moura, D. Parikh, and D. Batra. Visual Dialog. In _CVPR_, 2017 Y. Goyal, T. Khot, D. Summers Stay, D. Batra, and D. Parikh. Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering. In _CVPR_, 2017. D. Teney, P. Anderson, X. He, and A. van den Hengel. 
Tips and tricks for visual question answering: Learnings from the 2017 challenge. CoRR, abs/1708.02711, 2017. Z. Yu, J. Yu, C. Xiang, J. Fan, and D. Tao. Beyond bilinear: Generalized multimodal factorized high order pooling for visual question answering. _TNNLS_, 2018.",Visual Question Answering,Visual Question Answering 2513,Computer Vision,Computer Vision,Computer Vision,"basic_vqa Pytorch implementation of the paper VQA: Visual Question Answering . Usage 1. Clone the repositories. bash $ git clone 2. Download and unzip the dataset from official url of VQA: bash $ cd basic_vqa/utils $ chmod +x download_and_unzip_datasets.csh $ ./download_and_unzip_datasets.csh 3. Preproccess input data for (images, questions and answers). bash $ python resize_images.py input_dir '../datasets/Images' output_dir '../datasets/Resized_Images' $ python make_vacabs_for_questions_answers.py input_dir '../datasets' $ python build_vqa_inputs.py input_dir '../datasets' output_dir '../datasets' 4. Train model for VQA task. bash $ cd .. $ python train.py Results Comparison Result Model Metric Dataset Accuracy Source Paper Model Open Ended VQA v2 54.08 VQA Challenge My Model Multiple Choice VQA v2 54.72 Loss and Accuracy on VQA datasets v2 ! train1 (./png/train.png) References Paper implementation + Paper: VQA: Visual Question Answering + URL: Pytorch tutorial + URL: + Github: + Github: Preprocessing + Tensorflow implementation of N2NNM + Github:",Visual Question Answering,Visual Question Answering 2548,Computer Vision,Computer Vision,Computer Vision,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Visual Question Answering,Visual Question Answering 2858,Computer Vision,Computer Vision,Computer Vision,"imageqa san Source code for Stacked attention networks for image question answering . Joint collaboration between CMU and MSR. Dependencies The code is in python and uses Theano package. Python 2.7 Theano Numpy h5py Usage Download the data from here and extract them at data_vqa folder. cd src; python san_att_conv_twolayer.py to start training. Reference If you use this code as part of your research, please cite our paper ''Stacked Attention Netowrks for Image Question Answering'' , Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng and Alex Smola. To appear in CVPR 2016. @article{YangHGDS15, author {Zichao Yang and Xiaodong He and Jianfeng Gao and Li Deng and Alexander J. 
Smola}, title {Stacked Attention Networks for Image Question Answering}, journal {CoRR}, volume {abs/1511.02274}, year {2015}, url { }",Visual Question Answering,Visual Question Answering 1827,Methodology,Miscellaneous,Other,"LearningToCompare_ZSL PyTorch code for CVPR 2018 paper: Learning to Compare: Relation Network for Few Shot Learning (Zero Shot Learning part) For Few Shot Learning part, please visit here . Requirements Python 2.7 Pytorch 0.3 Data Download data from here and unzip it unzip data.zip . Run ZSL and GZSL performance evaluated under GBU setting 1 : ResNet feature, GBU split, averaged per class accuracy. AwA1_RN.py will give you ZSL and GZSL performance on AwA1 with attribute under GBU setting 1 . AwA2_RN.py will give you ZSL and GZSL performance on AwA2 with attribute under GBU setting 1 . CUB_RN.py will give you ZSL and GZSL performance on CUB with attribute under GBU setting 1 . Model AwA1 T1 u s H CUB T1 u s H DAP 2 44.1 0.0 88.7 0.0 40.0 1.7 67.9 3.3 CONSE 3 45.6 0.4 88.6 0.8 34.3 1.6 72.2 3.1 SSE 4 60.1 7.0 80.5 12.9 43.9 8.5 46.9 14.4 DEVISE 5 54.2 13.4 68.7 22.4 52.0 23.8 53.0 32.8 SJE 6 65.6 11.3 74.6 19.6 53.9 23.5 59.2 33.6 LATEM 7 55.1 7.3 71.7 13.3 49.3 15.2 57.3 24.0 ESZSL 8 58.2 6.6 75.6 12.1 53.9 12.6 63.8 21.0 ALE 9 59.9 16.8 76.1 27.5 54.9 23.7 62.8 34.4 SYNC 10 54.0 8.9 87.3 16.2 55.6 11.5 70.9 19.8 SAE 11 53.0 1.8 77.1 3.5 33.3 7.8 54.0 13.6 DEM 12 68.4 32.8 84.7 47.3 51.7 19.6 57.9 29.2 RN (OURS) 68.2 31.4 91.3 46.7 55.6 38.1 61.4 47.0 Model AwA2 T1 u s H DAP 2 46.1 0.0 84.7 0.0 CONSE 3 44.5 0.5 90.6 1.0 SSE 4 61.0 8.1 82.5 14.8 DEVISE 5 59.7 17.1 74.7 27.8 SJE 6 61.9 8.0 73.9 14.4 LATEM 7 55.8 11.5 77.3 20.0 ESZSL 8 58.6 5.9 77.8 11.0 ALE 9 62.5 14.0 81.8 23.9 SYNC 10 46.6 10.0 90.5 18.0 SAE 11 54.1 1.1 82.2 2.2 DEM 12 67.1 30.5 86.4 45.1 RN (OURS) 64.2 30.0 93.4 45.3 Citing If you use this code in your research, please use the following BibTeX entry. @inproceedings{sung2018learning, title {Learning to Compare: Relation Network for Few Shot Learning}, author {Sung, Flood and Yang, Yongxin and Zhang, Li and Xiang, Tao and Torr, Philip HS and Hospedales, Timothy M}, booktitle {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition}, year {2018} } References 1 Zero Shot Learning A Comprehensive Evaluation of the Good, the Bad and the Ugly . Yongqin Xian, Christoph H. Lampert, Bernt Schiele, Zeynep Akata. arXiv, 2017. 2 Attribute Based Classification forZero Shot Visual Object Categorization . Christoph H. Lampert, Hannes Nickisch and Stefan Harmeling. PAMI, 2014. 3 Zero Shot Learning by Convex Combination of Semantic Embeddings . Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean. arXiv, 2013. 4 Zero Shot Learning via Semantic Similarity Embedding . Ziming Zhang, Venkatesh Saligrama. ICCV, 2015. 5 DeViSE: A Deep Visual Semantic Embedding Model . Andrea Frome , Greg S. Corrado , Jonathon Shlens , Samy BengioJeffrey Dean, Marc’Aurelio Ranzato, Tomas Mikolov. NIPS, 2013. 6 Evaluation of Output Embeddings for Fine Grained Image Classification . Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, Bernt Schiele. CVPR, 2015. 7 Latent Embeddings for Zero shot Classification . Yongqin Xian, Zeynep Akata, Gaurav Sharma, Quynh Nguyen, Matthias Hein, Bernt Schiele CVPR, 2016. 8 An embarrassingly simple approach to zero shot learning . Bernardino Romera Paredes, Philip H. S. Torr. ICML, 2015. 9 Label Embedding for Image Classification . 
Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid. PAMI, 2016. 10 Synthesized Classifiers for Zero Shot Learning . Soravit Changpinyo, Wei Lun Chao, Boqing Gong, Fei Sha. CVPR, 2016. 11 Semantic Autoencoder for Zero Shot Learning . Elyor Kodirov, Tao Xiang, Shaogang Gong. CVPR, 2017. 12 Learning a Deep Embedding Model for Zero Shot Learning . Li Zhang, Tao Xiang, Shaogang Gong. CVPR, 2017.",Few-Shot Learning,Miscellaneous 2131,Miscellaneous,Miscellaneous,Other,"Final_Project_MachineLearning_in_TensorFlow_Berkeley This is the final project of Berkeley extension's Machine Learning Course in TensorFlow. Project Proposal: The dataset, obtained from Kaggle is a list of 20,000 recipes listed by rating, nutritional information and assigned category that is already parsed. I plan to predict the public rating of recipes based on continuous and categorical features given in the dataset. The feature size is large (678 count) but extremely sparse. Hence, I believe that the dataset would make a great candidate to build a model combining wide linear model and a deep feed forward neural network (using DNNLinearCombinedClassifier). The wide linear model is able to memorize interactions with data but not able to generalize learned interactions on new data. The deep model generalizes well but is unable to learn exceptions within the data. It is intended that the wide and deep model combines the two models and is able to generalize while learning exceptions . The code is in Jupyter notebook. There are two notebooks: 1) Model to predict if food is a dessert: Final_project_DNNClassifier_predict_dessert.ipynb 2) Model to predict rating: Final_project_DNNClassifier_predict_rating.ipynb Instructions for running the model: 1) Download and unzip Epicurious dataset: epi_r.csv. Save it in a folder and use path to refer to in notebook 2) Run packages and functions in the notebook. 3) The model can run three types of model: wide , deep and wide + deep . 4) Define model_type and model_dir and run test_model_accuracy(model_type, model_dir) 5) Go to your define model_dir in terminal and run Tensorboard: tensorboard logdir ./",Click-Through Rate Prediction,Miscellaneous 2357,Miscellaneous,Miscellaneous,Other,"DeepCTR Python Versions Downloads PyPI Version GitHub Issues Activity Documentation Status Build Status Coverage Status Codacy Badge License DeepCTR is a Easy to use , Modular and Extendible package of deep learning based CTR models along with lots of core components layers which can be used to build your own custom model easily.It is implemented by tensorflow.You can use any complex model with model.fit() and model.predict() . Let's Get Started! 
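For a concrete reference point, here is a hedged sketch of the wide-and-deep setup used in the Epicurious recipe project above, which is also one of the models re-implemented in the list that follows. It uses TensorFlow's built-in DNNLinearCombinedClassifier; the feature names are hypothetical stand-ins for the recipe dataset's columns.

```python
import tensorflow as tf

# Hypothetical feature columns standing in for the recipe dataset's fields.
calories = tf.feature_column.numeric_column("calories")
protein = tf.feature_column.numeric_column("protein")
category = tf.feature_column.categorical_column_with_hash_bucket("category", hash_bucket_size=700)

model = tf.estimator.DNNLinearCombinedClassifier(
    model_dir="model_dir",
    # Wide part: memorizes sparse categorical interactions.
    linear_feature_columns=[category],
    # Deep part: generalizes via dense embeddings of the categories plus numeric features.
    dnn_feature_columns=[calories, protein,
                         tf.feature_column.embedding_column(category, dimension=8)],
    dnn_hidden_units=[128, 64],
)
# model.train(input_fn=...) / model.evaluate(input_fn=...) with a tf.data based input_fn.
```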
( Chinese Introduction ) Models List Model Paper : : : Convolutional Click Prediction Model CIKM 2015 A Convolutional Click Prediction Model Factorization supported Neural Network ECIR 2016 Deep Learning over Multi field Categorical Data: A Case Study on User Response Prediction Product based Neural Network ICDM 2016 Product based neural networks for user response prediction Wide & Deep DLRS 2016 Wide & Deep Learning for Recommender Systems DeepFM IJCAI 2017 DeepFM: A Factorization Machine based Neural Network for CTR Prediction Piece wise Linear Model arxiv 2017 Learning Piece wise Linear Models from Large Scale Data for Ad Click Prediction Deep & Cross Network ADKDD 2017 Deep & Cross Network for Ad Click Predictions Attentional Factorization Machine IJCAI 2017 Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks Neural Factorization Machine SIGIR 2017 Neural Factorization Machines for Sparse Predictive Analytics xDeepFM KDD 2018 xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems Deep Interest Network KDD 2018 Deep Interest Network for Click Through Rate Prediction Deep Interest Evolution Network AAAI 2019 Deep Interest Evolution Network for Click Through Rate Prediction AutoInt arxiv 2018 AutoInt: Automatic Feature Interaction Learning via Self Attentive Neural Networks NFFM arxiv 2019 Field aware Neural Factorization Machine for Click Through Rate Prediction (The original NFFM was first used by Yi Yang(yangyi868@gmail.com) in TSA competition in 2017.) FGCNN WWW 2019 Feature Generation by Convolutional Neural Network for Click Through Rate Prediction )",Click-Through Rate Prediction,Miscellaneous 2419,Methodology,Miscellaneous,Other,"March 15, 2019: for our most updated work on model compression and acceleration, please reference: ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19) AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18) HAQ: Hardware Aware Automated Quantization (CVPR’19) Defenstive Quantization: When Efficiency Meets Robustness (ICLR'19) DSD Model Zoo This repo contains pre trained models by Dense Sparse Dense(DSD) training on Imagenet. Compared to conventional training method, dense→sparse→dense (DSD) training yielded higher accuracy with same model architecture. Sparsity is a powerful form of regularization. Our intuition is that, once the network arrives at a local minimum given the sparsity constraint, relaxing the constraint gives the network more freedom to escape the saddle point and arrive at a higher accuracy local minimum. Feel free to use the better accuracy DSD models to help your research. If you find DSD traing useful, please cite the following paper: @article{han2016_DSD, title {DSD: Dense Sparse Dense Training for Deep Neural Networks}, author {Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. 
Dally}, journal {International Conference on Learning Representations (ICLR)}, year {2017} } Download: AlexNet_DSD VGG16_DSD GoogleNet_DSD SqueezeNet_DSD ResNet18_DSD ResNet50_DSD Single crop (224x224) validation error rate: Baseline Top 1 error Top 5 error DSD Top 1 error Top 5 error AlexNet 42.78% 19.73% AlexNet_DSD 41.48% 18.71% VGG16 31.50% 11.32% VGG16_DSD 27.19% 8.67% GoogleNet 31.14% 10.96% GoogleNet_DSD 30.02% 10.34% SqueezeNet 42.56% 19.52% SqueezeNet_DSD 38.24% 16.53% ResNet18 30.43% 10.76% ResNet18_DSD 29.17% 10.13% ResNet50 24.01% 7.02% ResNet50_DSD 22.89% 6.47% The baselines of AlexNet, VGG16, GoogleNet and SqueezeNet are from the Caffe Model Zoo . The baselines of ResNet18 and ResNet50 are from fb.resnet.torch commit 500b698.",Architecture Search,Miscellaneous 2420,Methodology,Miscellaneous,Other,"March 15, 2019: for our most updated work on model compression and acceleration, please reference: ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (ICLR’19) AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV’18) HAQ: Hardware Aware Automated Quantization (CVPR’19) Defensive Quantization: When Efficiency Meets Robustness (ICLR'19) SqueezeNet Residual The repo contains the residual SqueezeNet, which is obtained by adding bypass layers to SqueezeNet_v1.0. Residual SqueezeNet improves the top 1 accuracy of SqueezeNet by 2.9% on ImageNet without changing the model size (only 4.8MB). Related repo and paper SqueezeNet SqueezeNet Deep Compression SqueezeNet Generator SqueezeNet DSD Training SqueezeNet Residual If you find residual SqueezeNet useful in your research, please consider citing the paper: @article{SqueezeNet, title {SqueezeNet: AlexNet level accuracy with 50x fewer parameters and <0.5MB model size}, year {2016} } The building block:",Architecture Search,Miscellaneous 2544,Reasoning,Miscellaneous,Other,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Common Sense Reasoning,Miscellaneous 2677,Methodology,Miscellaneous,Other,"Looking back at Labels: A Class based Domain Adaptation Technique (IDDA) Torch code for the Domain Adaptation model (IDDA). For more information, please refer to the paper. Accepted at IJCNN 2019 (Oral). Project Page Link Paper Link Abstract In this paper, we tackle the problem of Domain Adaptation. 
In a domain adaptation setting, we are given a labeled set of examples from a source dataset with multiple classes present, and a target dataset that has no supervision. In this setting, we propose an adversarial discriminator based approach. While approaches based on an adversarial discriminator have been proposed previously, in this paper we present an informed adversarial discriminator. Our approach relies on the analysis showing that if the discriminator has access to all the information available, including the class structure present in the source dataset, then it can guide the transformation of features of the target set of classes to a more structured adapted space. Using this formulation, we obtain state of the art results for the standard evaluation on benchmark datasets. We further provide a detailed analysis which shows that using all the labeled information results in improved domain adaptation. ! Result Requirements This code is written in Lua and requires Torch . You also need to install the following packages in order to successfully run the code. Torch loadcaffe Download Dataset Office 31 ImageClef Office Home Prepare Datasets Download the dataset Training Steps We have prepared everything for you ;) Clone the repository git clone Dataset prepare Download dataset put all source images inside mydataset/train/ such that folder name is class name mkdir p /path_to_wherever_you_want/mydataset/train/ put all target images inside mydataset/val/ such that folder name is class name mkdir p /path_to_wherever_you_want/mydataset/val/ create softlink of dataset cd DiscriminatorDomainAdaptation/ ln sf /path_to_wherever_you_want/mydataset dataset Pretrained AlexNet model Download the AlexNet pretrained caffe model Link cd DiscriminatorDomainAdaptation/ ln sf /path_to_where_model_is_downloaded/ pretrained_network Train model cd DiscriminatorDomainAdaptation/ ./train.sh Reference If you use this code as part of any published research, please acknowledge the following paper @article{kurmi2019looking, title {Looking back at Labels: A Class based Domain Adaptation Technique}, author {Kurmi, Vinod Kumar and Namboodiri, Vinay P}, journal {arXiv preprint arXiv:1904.01341}, year {2019} } Contributors Vinod K. Kurmi 1 (vinodkk@iitk.ac.in) 1 :",Domain Adaptation,Miscellaneous 2727,Miscellaneous,Miscellaneous,Other,"Deep Learning for Ad CTR Estimation NOTE: we have upgraded the code of this repository here with TensorFlow and more advanced models in our new paper Product based Neural Network for User Response Prediction . This repository hosts the code of several proposed deep learning models for estimating ad click through rates, implemented with Theano . The research paper Deep Learning over Multi field Categorical Data – A Case Study on User Response Prediction has been published at ECIR 2016. Different from traditional deep learning tasks like image or speech recognition, where neural nets work well on continuous dense input features, in the ad click through rate estimation task the input features are almost all categorical and come from multiple fields. For example, the input context feature could be City London , Device Mobile . Such multi field categorical features are always transformed into sparse binary features via one hot encoding, normally of millions of dimensions. Traditional DNNs cannot work well on such input data because of the large dimension and high sparsity. This work tries to address the above problems and the experiment results are promising. 
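To make the City/Device example concrete, here is a small illustrative sketch (not the repo's own pipeline) of turning multi-field categorical records into sparse binary vectors with scikit-learn's DictVectorizer:

```python
from sklearn.feature_extraction import DictVectorizer

# Each impression is a dict of field -> categorical value.
records = [
    {"City": "London", "Device": "Mobile", "AdSlot": "banner_1"},
    {"City": "Paris",  "Device": "PC",     "AdSlot": "banner_2"},
]

vec = DictVectorizer(sparse=True)     # one-hot encodes every field=value pair
X = vec.fit_transform(records)        # scipy CSR matrix, shape (2, n_binary_features)

print(vec.feature_names_)             # e.g. ['AdSlot=banner_1', 'AdSlot=banner_2', 'City=London', ...]
print(X.toarray())                    # each row: a sparse binary indicator vector
```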
The corresponding research paper Deep Learning over Multi Field Categorical Data: A Case Study on User Response Prediction has been accepted and will be published in ECIR 2016. Note that this is just the authors' first attempt of training DNN models to predict ad click through rate. Significant efforts on research and engineering will be made further on this project. More any questions please contact Weinan Zhang (w.zhang@cs.ucl.ac.uk) and Tianming Du (dutianming@quicloud.cn). Code Installation and Running Theano and dependant packages (e.g., numpy and sklearn ) should be pre installed before running the code. After package installation, you can simple run the code with the demo tiny dataset. python FNN.py for FNN python SNN_DAE.py for SNN_DAE python SNN_RBM.py for SNN_RBM The descriptions of the proposed models (FNN, SNN) are available in the above research paper, which will be available soon. Note: directly running above code only checks the success of system installation. The input training/test are very small sample datasets, where the deep models are not effective. For large scale datasets, please refer iPinYou data formalizing repository Cretio 1T dataset etc. Note: In our further practice on very large data, the FM initialisation is not necessary any more to train a good FNN.",Click-Through Rate Prediction,Miscellaneous 2741,Methodology,Miscellaneous,Other,"Hardness Aware Deep Metric Learning Implementation of Hardness Aware Deep Metric Learning (CVPR 2019 Oral) in Tensorflow. HDML: Hardness Aware Deep Metric Learning Work in progress. Please use the citation provided below if it is useful to your research: Wenzhao Zheng, Zhaodong Chen, Jiwen Lu, and Jie Zhou, Hardness Aware Deep Metric Learning, arXiv, abs/1903.05503, 2019. bash @article{zheng2019hardness, title {Hardness Aware Deep Metric Learning}, author {Zheng, Wenzhao and Chen, Zhaodong and Lu, Jiwen and Zhou, Jie}, journal {arXiv preprint arXiv:1903.05503}, year {2019} } Dependencies bash pip install tensorflow 1.10.0 Dataset Stanford Cars Dataset (Cars196) Download from or use datasets/cars196_downloader.py. Convert to hdf5 file using cars196_converter.py. Put it in datasets/data/cars196/cars196.hdf5. Pretrained model GoogleNet V1 pretrained model can be downloaded from Usage For Cars196 dataset: bash python main_npair.py dataSet 'cars196' batch_size 128 Regular_factor 5e 3 init_learning_rate 7e 5 load_formalVal False embedding_size 128 loss_l2_reg 3e 3 init_batch_per_epoch 500 batch_per_epoch 64 max_steps 8000 beta 1e+4 lr_gen 1e 2 num_class 99 _lambda 0.5 s_lr 1e 3 Code Reference deep\_metric\_learning by ronekko for dataset codes.",Metric Learning,Miscellaneous 2784,Methodology,Miscellaneous,Other,"Meta Transfer Learning TensorFlow LICENSE This repository contains the TensorFlow implementation for CVPR 2019 Paper Meta Transfer Learning for Few Shot Learning by Qianru Sun , Yaoyao Liu , Tat Seng Chua and Bernt Schiele . If you have any problems when running this repo, feel free to send me an email or open an issue. I will reply to you as soon as I see them. (Email: liuyaoyao at tju.edu.cn) Summary: Introduction ( introduction) Installation ( installation) Dataset ( Dataset) Repo Architecture ( repo architecture) Usage ( usage) Citation ( citation) Acknowledgements ( acknowledgements) Introduction Meta learning has been proposed as a framework to address the challenging few shot learning setting. 
The key idea is to leverage a large number of similar few shot tasks in order to learn how to adapt a base learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit using a few samples only, meta learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few shot learning method called meta transfer learning (MTL) which learns to adapt a deep NN for few shot learning tasks . Specifically, meta refers to training multiple tasks, and transfer is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5 class, 1 shot) and (5 class, 5 shot) recognition tasks on two challenging few shot learning benchmarks: mini ImageNet and Fewshot CIFAR100. Extensive comparisons to related works validate that our meta transfer learning approach trained with the proposed HT meta batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy. Installation In order to run this repo, we advise you to install python 2.7 and TensorFlow 1.3.0 with Anaconda. You may download Anaconda and read the installation instruction on their official website: Create a new environment and install tensorflow on it: Bash conda create name tensorflow_1.3.0_gpu python 2.7 source activate tensorflow_1.3.0_gpu pip install ignore installed upgrade Clone the repo: Bash git clone cd meta transfer learning tensorflow Requirements: python 2.7 tensorflow 1.3.0 scipy tqdm opencv python Some basic requirements are not listed, you may install them easily with pip . Dataset mini ImageNet The mini ImageNet dataset was proposed by Vinyals et al. for few shot learning evaluation. Its complexity is high due to the use of ImageNet images but requires fewer resources and infrastructure than running on the full ImageNet dataset . In total, there are 100 classes with 600 samples of 84×84 color images per class. These 100 classes are divided into 64, 16, and 20 classes respectively for sampling tasks for meta training, meta validation, and meta test. To generate this dataset, you may use the repo mini ImageNet tools . You may also directly download processed images. \ Download Page\ Fewshot CIFAR100 Fewshot CIFAR100 (FC100) is based on the popular object classification dataset CIFAR100. The splits were proposed by TADAM . It offers a more challenging scenario with lower image resolution and more challenging meta training/test splits that are separated according to object super classes. It contains 100 object classes and each class has 600 samples of 32 × 32 color images. The 100 classes belong to 20 super classes. Meta training data are from 60 classes belonging to 12 super classes. Meta validation and meta test sets contain 20 classes belonging to 4 super classes, respectively. We will release the code for processing FC100 soon. You may directly download processed images. \ Download Page\ tiered ImageNet The tiered ImageNet dataset is a larger subset of ILSVRC 12 with 608 classes (779,165 images) grouped into 34 higher level nodes in the ImageNet human curated hierarchy. To generate this dataset, you may use the repo tiered ImageNet dataset: tiered ImageNet tools . You may also directly download processed images. \ Download Page\ Repo Architecture . 
├── data_generator dataset generator ├── pre_data_generator.py data generator for pre train phase └── meta_data_generator.py data generator for meta train phase ├── docs project website source code ├── models tensorflow model files ├── models.py basic model class ├── pre_model.py pre train model class └── meta_model.py meta train model class ├── trainer tensorflow trainer files ├── pre.py pre train trainer class └── meta.py meta train trainer class ├── utils a series of tools used in this repo └── misc.py miscellaneous tool functions ├── main.py the python file with main function and parameter settings └── run_experiment.py the script to run the whole experiment Usage To run the experiments: bash python run_experiment.py You may edit the run_experiment.py file to change the hyperparameters and default settings. The details for the parameters are in main.py . The pre train phase is included in the current framework. In the default setting, if you run python run_experiment.py , the pretrain process will be conducted before the meta train phase starts. If you want to use the model pretrained by us, you may download the model by the following link and then replace the pretrain model loading directory in trainer/meta.py . Download Pretrain Model ( mini ImageNet): \ Google Drive\ \ Baidu Netdisk (百度网盘)\ (extraction code: efsv) We will release more pre trained models later. Todo x Hard task meta batch. The implementation of the hard task meta batch is not included in the published code. I still need time to rewrite the hard task meta batch code for the current framework. x More network architectures. We will add new backbones to the framework like ResNet18 and ResNet34. x PyTorch version. We will release the code for MTL on pytorch. It may take several months to be completed. Citation Please cite our paper if it is helpful to your work: @inproceedings{sun2019mtl, title {Meta Transfer Learning for Few Shot Learning}, author {Qianru Sun and Yaoyao Liu and Tat{ }Seng Chua and Bernt Schiele}, booktitle {CVPR}, year {2019} } Acknowledgements This repository uses the source code from the following repositories: Model Agnostic Meta Learning Optimization as a Model for Few Shot Learning",Few-Shot Learning,Miscellaneous 2882,Miscellaneous,Miscellaneous,Other,"DeepInterestNetwork Deep Interest Network for Click Through Rate Prediction Introduction This is an implementation of the paper Deep Interest Network for Click Through Rate Prediction Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Han Zhu, Ying Fan, Na Mou, Xiao Ma, Yanghui Yan, Xingya Dai, Junqi Jin, Han Li, Kun Gai Thanks Jinze Bai and Chang Zhou. Bibtex: sh @article{Zhou2017Deep, title {Deep Interest Network for Click Through Rate Prediction}, author {Zhou, Guorui and Song, Chengru and Zhu, Xiaoqiang and Ma, Xiao and Yan, Yanghui and Dai, Xingya and Zhu, Han and Jin, Junqi and Li, Han and Gai, Kun}, year {2017}, } Requirements Python > 2.6.1 NumPy > 1.12.1 Pandas > 0.20.1 TensorFlow > 1.4.0 (Probably earlier versions work too, though I didn't test them) GPU with memory > 10G Download dataset and preprocess Step 1: Download the Amazon product dataset (electronics category), which has 498,196 products and 7,824,482 records, and extract it to the raw_data/ folder. sh mkdir raw_data/; cd utils; bash 0_download_raw.sh; Step 2: Convert the raw data to a pandas dataframe, and remap categorical ids. sh python 1_convert_pd.py; python 2_remap_id.py Training and Evaluation This implementation not only contains the DIN method, but also provides all the competitors' methods, including Wide&Deep, PNN and DeepFM. 
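Before the training steps below, here is a schematic numpy sketch of DIN's central idea: pooling the user's behavior embeddings with weights that depend on the candidate ad. It is a simplified illustration, not the repo's model.py; in particular, the paper's activation unit is a small MLP over the behavior/candidate pair and does not normalize the weights with a softmax.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interest_pooling(behavior_embs, candidate_emb):
    """Weight each historical behavior by its relevance to the candidate ad, then sum.
    Simplified: dot-product scores plus softmax stand in for the paper's activation unit."""
    scores = behavior_embs @ candidate_emb      # (num_behaviors,)
    weights = softmax(scores)
    return weights @ behavior_embs              # adaptive user-interest vector

rng = np.random.default_rng(0)
history = rng.normal(size=(5, 8))               # 5 past clicked items, 8-dim embeddings
candidate = rng.normal(size=8)                  # candidate ad embedding
print(interest_pooling(history, candidate).shape)  # (8,)
```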
The training procedures of all method is as follows: Step 1: Choose a method and enter the folder. cd din; Alternatively, you could also run other competitors's methods directly by cd deepFM cd pnn cd wide_deep , and follow the same instructions below. Step 2: Building the dataset adapted to current method. python build_dataset.py We put a processed data 'dataset.pkl' in DeepInterestNetwork/din. Considering the GitHub's file size limit of 100.00 MB, we split it into 3 file aa ab ac. cat aa ab ac > dataset.pkl Step 3: Start training and evaluating using default arguments in background mode. python train.py >log.txt 2>&1 & Step 4: Check training and evaluating progress. tail f log.txt tensorboard logdir save_path Dice There is also an implementation of Dice in folder 'din', you can try dice following the code annotation in din/model.py or replacing model.py with model\_dice.py",Click-Through Rate Prediction,Miscellaneous 2341,Computer Vision,Computer Vision,Computer Vision,"FaceBoxes: A CPU Real time Face Detector with High Accuracy License (LICENSE) By Shifeng Zhang Introduction We propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Moreover, the speed of FaceBoxes is invariant to the number of faces. You can use the code to train/evaluate the FaceBoxes method for face detection. For more details, please refer to our paper . _Note: The performance of FDDB is the true positive rate (TPR) at 1000 false postives. The speed is for VGA resolution images._ Citing FaceBoxes Please cite our paper in your publications if it helps your research: @inproceedings{zhang2017faceboxes, title {Faceboxes: A CPU Real time Face Detector with High Accuracy}, author {Zhang, Shifeng and Zhu, Xiangyu and Lei, Zhen and Shi, Hailin and Wang, Xiaobo and Li, Stan Z.}, booktitle {IJCB}, year {2017} } Contents 1. Installation ( installation) 2. Training ( training) 3. Evaluation ( evaluation) 4. Others ( others) Installation 1. Get the code. We will call the cloned directory as $FaceBoxes_ROOT . Shell git clone 2. Build the code. Please follow Caffe instruction to install all necessary packages and build it. Shell cd $FaceBoxes_ROOT Modify Makefile.config according to your Caffe installation. Make sure to include $FaceBoxes_ROOT/python to your PYTHONPATH. cp Makefile.config.example Makefile.config make all j && make py Training 1. Download the WIDER FACE dataset, convert it to VOC format and create the LMDB file. Or you can directly download our created LMDB of WIDER FACE to $FaceBoxes_ROOT/examples/ . Shell You can modify create_list.sh and create_data.sh if needed. cd $FaceBoxes_ROOT ./data/WIDER_FACE/create_list.sh ./data/WIDER_FACE/create_data.sh 2. Train your model on WIDER FACE. Shell cd $FaceBoxes_ROOT/models/faceboxes sh train.sh Evaluation 1. Download the images of AFW , PASCAL Face and FDDB to $FaceBoxes_ROOT/examples/images/ . 2. If you do not train the model by yourself, you can download our trained model . 3. Check out test/demo.py on how to detect faces using the FaceBoxes model and how to plot detection results. 4. Evaluate the trained model via test/afw_test.py on AFW. 5. Evaluate the trained model via test/pascal_test.py on PASCAL Face. 6. Evaluate the trained model via test/fddb_test.py on FDDB. 7. Download the eval_tool to show the performance. Others 1. We reimplement the FaceBoxes with PyTorch as FaceBoxes.PyTorch . 2. 
We will release a trained model of the improved version of FaceBoxes, which jointly performs face detection and alignment (5 landmarks). _Note: If you cannot download the created LMDB, the provided images or the trained model through the above links, you can download them through BaiduYun ._",Face Detection,Face Detection 2343,Computer Vision,Computer Vision,Computer Vision,"S³FD: Single Shot Scale invariant Face Detector By Shifeng Zhang Introduction S³FD is a real time face detector, which performs superiorly on various scales of faces with a single deep neural network, especially for small faces. For more details, please refer to our arXiv paper . Contents 1. Preparation ( preparation) 2. Eval ( eval) 3. Train ( train) Preparation 1. Get the SSD code. We will call the directory that you cloned Caffe into $SFD_ROOT Shell git clone cd $SFD_ROOT git checkout ssd 2. Build the code. Please follow the Caffe instructions to install all necessary packages and build it. Shell Modify Makefile.config according to your Caffe installation. cp Makefile.config.example Makefile.config make j8 Make sure to include $CAFFE_ROOT/python to your PYTHONPATH. make py make test j8 (Optional) make runtest j8 3. Download our trained model from GoogleDrive or BaiduYun , and merge it with the folder $SFD_ROOT/models . 4. Download our above sfd_test_code folder and put it in the $SFD_ROOT . 5. Download AFW , PASCAL face , FDDB and WIDER FACE datasets. 6. Download the EVALUATION TOOLBOX for evaluation. Eval 1. Evaluate our model on AFW. Shell cd $SFD_ROOT/sfd_test_code/AFW You must modify the Path in the afw_test.py to your AFW path. It will create sfd_afw_dets.txt and put it in the EVALUATION TOOLBOX to evaluate. python afw_test.py 2. Evaluate our model on PASCAL face. Shell cd $SFD_ROOT/sfd_test_code/PASCAL_face You must modify the Path in the pascal_test.py to your PASCAL_face path. It will create sfd_pascal_dets.txt and put it in the EVALUATION TOOLBOX to evaluate. python pascal_test.py 3. Evaluate our model on FDDB. Shell cd $SFD_ROOT/sfd_test_code/FDDB You must modify the Path in the fddb_test.py to your FDDB path. It will create sfd_fddb_dets.txt. python fddb_test.py Fitting the dets from rectangle box to ellipse box. It will create sfd_fddb_dets_fit.txt and put it in the FDDB evaluation code to evaluate. cd fddb_from_rectangle_to_ellipse matlab nodesktop nosplash nojvm r run fitting.m;quit; If you want to get the results of FDDB in our paper, you should use our 'FDDB_annotation_ellipseList_new.txt' 4. Evaluate our model on WIDER FACE. Shell cd $SFD_ROOT/sfd_test_code/WIDER_FACE You must modify the path in the wider_test.py to your WIDERFACE path. It will create detection results in the eval_tools_old version folder. python wider_test.py If you want to get the results of the val set in our paper, you should use the provided eval_tools_old version . Or you can use the latest eval_tools of WIDER FACE. There is a slight difference between them, since the annotation used for the evaluation was slightly changed around March 2017. Train 1. Follow the instructions of SSD to create the lmdb of WIDER FACE. 2. Modify the data augmentation code of SSD to make sure that it does not change the image ratio. 3. Modify the anchor match code of SSD to implement the 'scale compensation anchor matching strategy'. 4. Train the model.",Face Detection,Face Detection 2344,Computer Vision,Computer Vision,Computer Vision,"Tiny Face Detector in TensorFlow A TensorFlow port (inference only) of the Tiny Face Detector from the authors' MatConvNet code 1 . 
Requirements Codes are written in Python. At first install Anaconda . Then install OpenCV , TensorFlow . Usage Converting a pretrained model matconvnet_hr101_to_pickle reads weights of the MatConvNet pretrained model and write back to a pickle file which is used in a TensorFlow model as initial weights. 1. Download a ResNet101 based pretrained model(hr_res101.mat) from the authors' repo. 2. Convert the model to a pickle file by: python matconvnet_hr101_to_pickle.py matlab_model_path /path/to/pretrained_model weight_file_path /path/to/pickle_file Tesing Tiny Face Detector in TensorFlow 1. Prepare images in a directory. 2. tiny_face_eval.py reads images one by one from the image directory and write images to an output directory with bounding boxes of detected faces. python tiny_face_eval.py weight_file_path /path/to/pickle_file data_dir /path/to/input_image_directory output_dir /path/to/output_directory Neural network diagram This (pdf) is a network diagram of the ResNet101 based model used here for an input image(height: 1150, width: 2048, channel: 3). Examples Though this model is developed to detect tiny faces, I apply this to several types of images including 'faces' as experiments. selfie with many people This is the same image as one in the authors' repo 1 . ! selfie Original image selfie of celebrities ! selfie Original image selfie of celebrities Homer and Meryl Streep are missed. ! selfie Original image zombies ! selfie Original image monkeys ! selfie Original image dogs ! selfie Original image cats ! selfie Original image figure1 from a paper 2 ! selfie figure8 from a paper 2 . Facebook's face detector failed to detect these faces(as of the paper publication date 14 Feb 2016 ). ! selfie figure3 from a paper 2 ! selfie figure6 from a paper 2 ! selfie Acknowledgments Many python codes are borrowed from chinakook's MXNet tiny face detector parula colormap table is borrowed from fake_parula.py . Disclaimer Codes are tested only on CPUs, not GPUs. References 1. Hu, Peiyun and Ramanan, Deva, Finding Tiny Faces, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). project page , arXiv 2. Michael J. Wilber, Vitaly Shmatikov, Serge Belongie, Can we still avoid automatic face detection, 2016. arXiv",Face Detection,Face Detection 2498,Computer Vision,Computer Vision,Computer Vision,"Hyper Face Hyper Face implementation which predicts face/non face, landmarks, pose and gender simultaneously. This is NOT official implementation. This software is released under the MIT License, see LICENSE.txt. Features Chainer implementation Image viewer on web browsers Testing Environments Ubuntu 16.04 Python 2.7 Chainer 1.14.0 OpenCV 2.4.9 Flask 0.11.1 Flask_SocketIO 2.4 Dlib 19.1.0 Arch Linux Python 3.5 Chainer 1.14.0 OpenCV 3.1.0 Flask 0.10.1 Flask_SocketIO 2.2 Dlib 19.1.0 Configuration Important variables are configured by config.json . Set gpu positive number to use GPU, port numbers of web servers and so on. Train Preparation Download AFLW Dataset and AlexNet Caffe Model , expand them and set aflw_sqlite_path , aflw_imgdir_path , and alexnet_caffemodel_path in config.json Pre training Pre training with RCNN_Face model. bash python ./scripts/train.py pretrain Open and with your web browser to see loss graphs, network weights and predictions. Port numbers are configured by config.json . Main training bash python ./scripts/train.py pretrainedmodel result_pretrain/model_epoch_40 Use arbitrary epoch number instead of 40. 
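Since Hyper Face predicts face/non-face, landmarks, pose and gender with one network, training optimizes a weighted combination of the per-task losses. A minimal framework-agnostic sketch follows; the task weights below are placeholders, not the values used in config.json or in the paper.

```python
# Combine per-task losses into one training objective.
# Loss values are assumed to be already computed by the respective network heads.
TASK_WEIGHTS = {"detection": 1.0, "landmark": 5.0, "pose": 5.0, "gender": 2.0}  # placeholder weights

def total_loss(losses, weights=TASK_WEIGHTS):
    """losses: dict task_name -> scalar loss (float or autograd variable)."""
    return sum(weights[name] * value for name, value in losses.items())

print(total_loss({"detection": 0.42, "landmark": 0.10, "pose": 0.07, "gender": 0.31}))
```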
Test To skip training, please use the trained model from here (or here (___Do not expand___ as zip)). AFLW test images bash python ./scripts/use_on_test.py model model_epoch_190 Open to see predictions. Your image file Set your image file with the img argument. The dependencies are fewer than for the other tests and demos. bash python ./scripts/use_on_file.py model model_epoch_190 img sample_images/lena_face.png Input images are contained in the sample_images directory. Demos with post processes Open to see demos. AFLW test images bash python ./scripts/demo_on_test.py model model_epoch_190 Demo using AFLW test images Web camera on your browser bash python ./scripts/demo_live.py model model_epoch_190 ToDo Tune training parameters. Fix pose drawing. x Implement post processes. Tune post processes parameters.",Face Detection,Face Detection 2559,Computer Vision,Computer Vision,Computer Vision,"Face Detection with End to End Integration of a ConvNet and a 3D Model Reproducing all experimental results in the paper Yunzhu Li, Benyuan Sun, Tianfu Wu and Yizhou Wang, Face Detection with End to End Integration of a ConvNet and a 3D Model , ECCV 2016 The code is mainly written by Y.Z. Li (leo.liyunzhu@pku.edu.cn) and B.Y. Sun (sunbenyuan@pku.edu.cn). Please feel free to report issues to them. The code is based on the mxnet package . If you find the code useful in your projects, please consider citing the paper: @inproceedings{FaceDetection ConvNet 3D, author {Yunzhu Li and Benyuan Sun and Tianfu Wu and Yizhou Wang}, title {Face Detection with End to End Integration of a ConvNet and a 3D Model}, booktitle {ECCV}, year {2016} } Compile Please refer to the linked instructions on how to compile Prepare training data Download the AFLW dataset and generate a list for the training data in the form of: ID file_path width height resize_factor number_of_faces followed by a list of information for each face The information for different faces should be separated by spaces and given in the form: x y width height (of bounding box) x y width height (of projected bounding box) number_of_keypoints keypoint_name keypoint_x keypoint_y projected_keypoint_x projected_keypoint_y (for every keypoint) ellipse_x ellipse_y ellipse_radius ellipse_minoraxes ellipse_majoraxes 9 parameters of scale rotation matrix 3 translation parameters Note: projected information is not used now, so it can be replaced by any number (see the parser sketch below) training procedure 1. run Path_To_The_Code/ALFW/vgg16_rpn.py 2. To finetune on FDDB dataset, run Path_To_The_Code/ALFW/fddb_finetune.py prediction procedure AFW: run Path_To_The_Code/afw_predict.py FDDB: run Path_To_The_Code/predict_final.py",Face Detection,Face Detection 2602,Computer Vision,Computer Vision,Computer Vision,"! Demo result Finding Tiny Faces By Peiyun Hu and Deva Ramanan at Carnegie Mellon University. Introduction We develop a face detector (Tiny Face Detector) that can find 800 faces out of 1000 reportedly present, by making use of a novel characterization of scale, resolution, and context to find small objects. Can you confidently identify errors? Tiny Face Detector was initially described in an arXiv tech report . In this repo, we provide a MATLAB implementation of the Tiny Face Detector, including both training and testing code. A demo script is also provided. 
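Going back to the AFLW training-list format spelled out for the ConvNet + 3D model above, here is a small parser sketch that follows that field layout; treating every record as whitespace-separated tokens on a single line is an assumption.

```python
def parse_training_line(line):
    """Parse one AFLW training-list line following the field layout described above."""
    t = line.split()
    rec = {"id": t[0], "path": t[1], "width": float(t[2]), "height": float(t[3]),
           "resize_factor": float(t[4]), "faces": []}
    n_faces, i = int(t[5]), 6
    for _ in range(n_faces):
        face = {"bbox": [float(x) for x in t[i:i + 4]],
                "proj_bbox": [float(x) for x in t[i + 4:i + 8]]}
        i += 8
        n_kp, face["keypoints"] = int(t[i]), []
        i += 1
        for _ in range(n_kp):
            face["keypoints"].append({"name": t[i],
                                      "xy": [float(t[i + 1]), float(t[i + 2])],
                                      "proj_xy": [float(t[i + 3]), float(t[i + 4])]})
            i += 5
        face["ellipse"] = [float(x) for x in t[i:i + 5]]
        i += 5
        face["rotation"] = [float(x) for x in t[i:i + 9]]
        i += 9
        face["translation"] = [float(x) for x in t[i:i + 3]]
        i += 3
        rec["faces"].append(face)
    return rec
```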
Citing us If you find our work useful in your research, please consider citing: latex @InProceedings{Hu_2017_CVPR, author {Hu, Peiyun and Ramanan, Deva}, title {Finding Tiny Faces}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month {July}, year {2017} } Installation Clone the repo recursively so you have my fork of MatConvNet . zsh git clone recursive git@github.com:peiyunh/tiny.git Compile MatConvNet by running following commands in MATLAB (see Installing MatConvNet for more details): Matlab >> cd matconvnet/; >> addpath matlab/; >> vl_compilenn('enableImreadJpeg', true, 'enableGpu', true, 'cudaRoot', cuda_dir ,... 'cudaMethod', 'nvcc', 'enableCudnn', true, 'cudnnRoot', cudnn_dir ); >> vl_testnn('gpu', true); % vl_testnn('gpu', false) for cpu only Compile our MEX function in MATLAB and test if it works as expected: Matlab >> cd utils/; >> compile_mex; >> test_compute_dense_overlap; Download WIDER FACE and unzip data and annotation files to data/widerface such that: zsh $ ls data/widerface wider_face_test.mat wider_face_train.mat wider_face_val.mat WIDER_test/ WIDER_train/ WIDER_val/ Demo We provide a minimal demo tiny_face_detector.m that runs our detector on an single input image and output face detections: Matlab function bboxes tiny_face_detector(image_path, output_path, prob_thresh, nms_thresh, gpu_id) Here is a command you can run to reproduce our detection results on the world's largest selfie: Matlab >> bboxes tiny_face_detector('data/demo/selfie.jpg', './selfie.png', 0.5, 0.1, 1) The demo script will start by downloading an off the shelf ResNet101 based model, if it does not find one. Models based on other architecture are also available below: ResNet101 ResNet50 VGG16 Training To train a ResNet101 based Tiny Face Detector, run following command in MATLAB: Matlab >> hr_res101('train'); % which calls cnn_widerface.m After training, run the following command to test on the validation set: Matlab >> hr_res101('test'); % which calls cnn_widerface_test_AB.m Finally, run the following command to evaluate the trained models: Matlab >> hr_res101('eval'); % which calls cnn_widerface_eval.m Please refer to scripts/hr_res101.m for more details on how training/testing/evaluation is configured. Clustering We derive canonical bounding box shapes by K medoids clustering ( cluster_rects.m ). For reproducibility, we provide our clustering results in data/widerface/RefBox_N25.mat . We also provide the version after template resolution analysis in data/widerface/RefBox_N25_scaled.mat (Fig. 8 in our paper). Evaluation We provide both our own version of evaluation script ( cnn_widerface_eval.m ) and official evaluation script ( eval_tools/ ). Our implementation runs much faster and is easier to customize. However, our version produces slightly lower numbers comparing to the official one. We use our evaluation script only for prototyping. All numbers in the paper are based on the official evaluation script.",Face Detection,Face Detection 2658,Computer Vision,Computer Vision,Computer Vision,"Tiny Face Detector in TensorFlow A TensorFlow port(inference only) of Tiny Face Detector from authors' MatConvNet codes 1 . Requirements Codes are written in Python. At first install Anaconda . Then install OpenCV , TensorFlow . Usage Converting a pretrained model matconvnet_hr101_to_pickle reads weights of the MatConvNet pretrained model and write back to a pickle file which is used in a TensorFlow model as initial weights. 1. 
Download a ResNet101 based pretrained model (hr_res101.mat) from the authors' repo. 2. Convert the model to a pickle file by: python matconvnet_hr101_to_pickle.py matlab_model_path /path/to/pretrained_model weight_file_path /path/to/pickle_file Testing Tiny Face Detector in TensorFlow 1. Prepare images in a directory. 2. tiny_face_eval.py reads images one by one from the image directory and writes images to an output directory with bounding boxes of detected faces. python tiny_face_eval.py weight_file_path /path/to/pickle_file data_dir /path/to/input_image_directory output_dir /path/to/output_directory Neural network diagram This (pdf) is a network diagram of the ResNet101 based model used here for an input image (height: 1150, width: 2048, channels: 3). Examples Though this model was developed to detect tiny faces, I applied it to several types of images, including 'faces', as experiments. selfie with many people This is the same image as one in the authors' repo 1 . ! selfie Original image selfie of celebrities ! selfie Original image selfie of celebrities Homer and Meryl Streep are missed. ! selfie Original image zombies ! selfie Original image monkeys ! selfie Original image dogs ! selfie Original image cats ! selfie Original image figure1 from a paper 2 ! selfie figure8 from a paper 2 . Facebook's face detector failed to detect these faces (as of the paper publication date, 14 Feb 2016 ). ! selfie figure3 from a paper 2 ! selfie figure6 from a paper 2 ! selfie Acknowledgments Much of the Python code is borrowed from chinakook's MXNet tiny face detector ; the parula colormap table is borrowed from fake_parula.py . Disclaimer The code has been tested only on CPUs, not GPUs. References 1. Hu, Peiyun and Ramanan, Deva, Finding Tiny Faces, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). project page , arXiv 2. Michael J. Wilber, Vitaly Shmatikov, Serge Belongie, Can we still avoid automatic face detection, 2016. arXiv tiny_faces_temp",Face Detection,Face Detection 2698,Computer Vision,Computer Vision,Computer Vision,"MTCNN face detection & alignment all in TensorFlow Introduction This is a demo of an MTCNN implementation entirely in the TensorFlow API, to take advantage of GPU computing resources. For more details of MTCNN, please refer to the arXiv paper . Dependencies TensorFlow 1.4.1 TF Slim Python 3.6 Ubuntu 16.04 Cuda 8.0 Usage First you should run 'python npy2ckpt.py' to convert the three npy files (obtained from facenet ) for pnet/rnet/onet into one checkpoint if you do not have the checkpoint file (note: the three npy files and the converted checkpoint file are already in mtcnn_model of this repository). Then replace your pictures in 'examples' and run 'python demo.py'. Result demo_result: References 1. Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao, Joint Face Detection and Alignment using Multi task Cascaded Convolutional Networks, IEEE Signal Processing Letters . 2. facenet",Face Detection,Face Detection 2873,Computer Vision,Computer Vision,Computer Vision,"A Unified Multi scale Deep Convolutional Neural Network for Fast Object Detection by Zhaowei Cai, Quanfu Fan, Rogerio Feris and Nuno Vasconcelos This implementation is written by Zhaowei Cai at UC San Diego. Introduction MS CNN is a unified multi scale object detection framework based on deep convolutional networks, which includes an object proposal sub network and an object detection sub network. The unified network can be trained altogether end to end.
Citations If you use our code/model/data, please cite our paper: @inproceedings{cai16mscnn, author {Zhaowei Cai and Quanfu Fan and Rogerio Feris and Nuno Vasconcelos}, Title {A Unified Multi scale Deep Convolutional Neural Network for Fast Object Detection}, booktitle {ECCV}, Year {2016} } Updates This repository has been merged with the latest Caffe. There are only very minor numerical differences from the old version. By using the latest versions of Caffe, CUDA and cuDNN, the speed can be doubled. If you want to use the old version of the code, you can download it from MSCNN V1.0 . Requirements 1. cuDNN is required to avoid out of memory issues and to obtain the running speed described in our paper. For now, CUDA 8.0 with cuDNN v5 has been tested. Other versions should also work. 2. If you want to use our MATLAB scripts to run the detection demo, the Caffe MATLAB wrapper is required. Please build matcaffe before running the detection demo. 3. This code has been tested on Ubuntu 14.04 with an NVIDIA Titan GPU. Installation 1. Clone the MS CNN repository, and we'll call the directory that you cloned MS CNN into MSCNN_ROOT Shell git clone 2. Build MS CNN Shell cd $MSCNN_ROOT/ Follow the Caffe installation instructions here: If you're experienced with Caffe and have all of the requirements installed and your Makefile.config in place, then simply do: make all j 16 If you want to use the MS CNN detection demo, build the MATLAB wrapper as well: make matcaffe Training MS CNN (KITTI car) 1. Set up the KITTI dataset by yourself. 2. Get the training data for KITTI Shell cd $MSCNN_ROOT/data/ sh get_kitti_data.sh This will download train/val split image lists for the experiments, and window files for training/finetuning MS CNN models. You can also use the provided MATLAB script mscnn_kitti_car_window_file.m under $MSCNN_ROOT/data/kitti/ to generate your own window files. If you use the provided window files, replace /your/KITTI/path/ in the files with your KITTI path. 3. Download VGG16 from the Caffe Model Zoo , and put it into $MSCNN_ROOT/models/VGG/ . 4. Now you can start to train MS CNN models. Multiple shell scripts are provided to train the different models described in our paper. We take mscnn 7s 576 2x as an example. Shell cd $MSCNN_ROOT/examples/kitti_car/mscnn 7s 576 2x/ sh train_mscnn.sh As described in the paper, the training process is split into two steps. Usually the first step can be shared by different models if your modifications are only to the detection sub network. For example, the first training step can be shared by mscnn 7s 576 2x and mscnn 7s 576 . Log files will be generated during training. Pretrained model (KITTI car) Download the pre trained MS CNN models Shell cd $MSCNN_ROOT/examples/kitti_car/ sh fetch_mscnn_car_model.sh This will download the pretrained model for KITTI car into $MSCNN_ROOT/examples/kitti_car/mscnn 8s 768 trainval pretrained/ . You can produce exactly the same results as described in our paper with these pretrained models. Testing Demo (KITTI car) Once the pretrained models, or models trained by yourself, are available, you can use the MATLAB script run_mscnn_detection.m under $MSCNN_ROOT/examples/kitti_car/ to obtain the detection and proposal results. Set the right dataset path and choose the model that you want to test in the demo script. The default setting is to test the pretrained model. The final results will be saved as .txt files. KITTI Evaluation Compile evaluate_object.cpp under $MSCNN_ROOT/examples/kitti_result/eval/ by yourself.
Use writeDetForEval.m under $MSCNN_ROOT/examples/kitti_result/ to transform the detection results into the KITTI data format and evaluate the detection performance. Remember to change the corresponding directories in the evaluation script. Disclaimer 1. The CPU version is not fully tested. The GPU version is strongly recommended. 2. Since some changes have been made after the ECCV submission, you may not get exactly the same results as in the paper by training your own models, but you should obtain equivalent performance. 3. Since the numbers of training samples vary vastly for different classes, the model robustness varies too (car>ped>cyc). 4. Although the final results we submitted were from the model mscnn 8s 768 trainval , our later experiments have shown that mscnn 7s 576 2x trainval can achieve even better performance for car, and 2x faster speed. For ped/cyc however, the performance decreases due to the much smaller number of training instances. 5. If the training does not converge or the performance is very bad, try some other random seeds. You should obtain fair performance after a few tries. Due to the randomness, you can't fully reproduce the same models, but the performance should be close. If you encounter any issue when using our code or model, please let me know.",Face Detection,Face Detection 2892,Computer Vision,Computer Vision,Computer Vision,"FaceBoxes in PyTorch License (LICENSE) By Zisian Wong , Shifeng Zhang A PyTorch implementation of FaceBoxes: A CPU Real time Face Detector with High Accuracy . The official code in Caffe can be found here . Performance Dataset Original Caffe PyTorch Implementation : : : : : AFW 98.98 % 98.47% PASCAL 96.77 % 96.84% FDDB 95.90 % 95.44% Citation Please cite the paper in your publications if it helps your research: @inproceedings{zhang2017faceboxes, title {Faceboxes: A CPU Real time Face Detector with High Accuracy}, author {Zhang, Shifeng and Zhu, Xiangyu and Lei, Zhen and Shi, Hailin and Wang, Xiaobo and Li, Stan Z.}, booktitle {IJCB}, year {2017} } Contents Installation ( installation) Training ( training) Evaluation ( evaluation) References ( references) Installation 1. Install PyTorch > v1.0.0 following the official instructions. 2. Clone this repository. We will call the cloned directory $FaceBoxes_ROOT . Shell git clone 3. Compile the nms: Shell ./make.sh _Note: The code is based on Python 3+._ Training 1. Download the WIDER FACE dataset and place the images under this directory: Shell $FaceBoxes_ROOT/data/WIDER_FACE/images 2. Convert the WIDER FACE annotations to VOC format or download our converted annotations , and place them under this directory: Shell $FaceBoxes_ROOT/data/WIDER_FACE/annotations 3. Train the model using WIDER FACE: Shell cd $FaceBoxes_ROOT/ python3 train.py If you do not wish to train the model, you can download our pre trained model and save it in $FaceBoxes_ROOT/weights . Evaluation 1. Download the images of AFW , PASCAL Face and FDDB to: Shell $FaceBoxes_ROOT/data/AFW/images/ $FaceBoxes_ROOT/data/PASCAL/images/ $FaceBoxes_ROOT/data/FDDB/images/ 2. Evaluate the trained model using: Shell dataset choices 'AFW', 'PASCAL', 'FDDB' python3 test.py dataset FDDB evaluate using cpu python3 test.py cpu 3. Download eval_tool to evaluate the performance.
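For convenience, here is a small hedged Python driver (not part of the original repository) that simply loops over the three benchmarks and calls the documented test.py entry point; the exact flag spellings (dataset and cpu, written below with leading dashes) are an assumption, since this export strips such punctuation:
import subprocess

# Run the FaceBoxes evaluation script once per benchmark, using the
# documented dataset choices; append the cpu flag if no GPU is available.
for dataset in ['AFW', 'PASCAL', 'FDDB']:
    cmd = ['python3', 'test.py', '--dataset', dataset]
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)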
References Official release (Caffe) A huge thank you to SSD ports in PyTorch that have been helpful: ssd.pytorch , RFBNet _Note: If you can not download the converted annotations, the provided images and the trained model through the above links, you can download them through BaiduYun ._",Face Detection,Face Detection 1743,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Code for the paper: Sentence State LSTM for Text Representation. This package consists of the code for both classification and sequence labelling. README files are included in each individual subfolder. Cite @article{zhang2018slstm, title {Sentence State LSTM for Text Representation}, author {Zhang, Yue and Liu, Qi and Song, Linfeng}, booktitle {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL)}, year {2018} }",Part-Of-Speech Tagging,Part-Of-Speech Tagging 1844,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Sequence labeler This is a neural network sequence labeling system. Given a sequence of tokens, it will learn to assign labels to each token. Can be used for named entity recognition, POS tagging, error detection, chunking, CCG supertagging, etc. The main model implements a bidirectional LSTM for sequence tagging. In addition, you can incorporate character level information either by concatenating a character based representation, or by using an attention/gating mechanism for combining it with a word embedding. Run with: python experiment.py config.conf Preferably with Tensorflow set up to use CUDA, so the process can run on a GPU. The script will train the model on the training data, test it on the test data, and print various evaluation metrics. Note: The original sequence labeler was implemented in Theano, but since Theano is soon ending support, I have reimplemented it in TensorFlow. I also used the chance to refactor the code a bit, and it should be better in every way. However, if you need the specific code used in previously published papers, you'll need to refer to older commits. Requirements python (tested with 2.7.12 and 3.5.2) numpy (tested with 1.13.3 and 1.14.0) tensorflow (tested with 1.3.0 and 1.4.1) Data format The training and test data is expected in standard CoNLL type tab separated format. One word per line, separate column for token and label, empty line between sentences. For error detection, this would be something like: I c saws i the c show c The first column is assumed to be the token and the last column is the label. There can be other columns in the middle, which are currently not used. For example: EU NNP I NP S ORG rejects VBZ I VP O German JJ I NP S MISC call NN I NP O to TO I VP O boycott VB I VP O British JJ I NP S MISC lamb NN I NP O . . O O Configuration Edit the values in config.conf as needed: path_train Path to the training data, in CoNLL tab separated format. One word per line, first column is the word, last column is the label. Empty lines between sentences. path_dev Path to the development data, used for choosing the best epoch. path_test Path to the test file. Can contain multiple files, colon separated. conll_eval Whether the standard CoNLL NER evaluation should be run. main_label The output label for which precision/recall/F measure are calculated. Does not affect accuracy or measures from the CoNLL eval. 
model_selector What is measured on the dev set for model selection: dev_conll_f:high for NER and chunking, dev_acc:high for POS tagging, dev_f05:high for error detection. preload_vectors Path to the pretrained word embeddings, in word2vec plain text format. If your embeddings are in binary, you can use convertvec to convert them to plain text. word_embedding_size Size of the word embeddings used in the model. crf_on_top If True, use a CRF as the output layer. If False, use softmax instead. emb_initial_zero Whether word embeddings should have zero initialisation by default. train_embeddings Whether word embeddings should be updated during training. char_embedding_size Size of the character embeddings. word_recurrent_size Size of the word level LSTM hidden layers. char_recurrent_size Size of the char level LSTM hidden layers. hidden_layer_size Size of the extra hidden layer on top of the bi LSTM. char_hidden_layer_size Size of the extra hidden layer on top of the character based component. lowercase Whether words should be lowercased when mapping to word embeddings. replace_digits Whether all digits should be replaced by 0. min_word_freq Minimal frequency of words to be included in the vocabulary. Others will be considered OOV. singletons_prob The probability of mapping words that appear only once to OOV instead during training. allowed_word_length Maximum allowed word length, clipping the rest. Can be necessary if the text contains unreasonably long tokens, eg URLs. max_train_sent_length Discard sentences longer than this limit when training. vocab_include_devtest Load words from dev and test sets also into the vocabulary. If they don't appear in the training set, they will have the default representations from the preloaded embeddings. vocab_only_embedded Whether the vocabulary should contain only words in the pretrained embedding set. initializer The method used to initialize weight matrices in the network. opt_strategy The method used for weight updates. learningrate Learning rate. clip Clip the gradient to a range. batch_equal_size Create batches of sentences with equal length. epochs Maximum number of epochs to run. stop_if_no_improvement_for_epochs Training will be stopped if there has been no improvement for n epochs. learningrate_decay If performance hasn't improved for 3 epochs, multiply the learning rate with this value. dropout_input The probability for applying dropout to the word representations. 0.0 means no dropout. dropout_word_lstm The probability for applying dropout to the LSTM outputs. tf_per_process_gpu_memory_fraction The fraction of GPU memory that the process can use. tf_allow_growth Whether the GPU memory usage can grow dynamically. main_cost Control the weight of the main labeling objective. lmcost_max_vocab_size Maximum vocabulary size for the language modeling loss. The remaining words are mapped to a single entry. lmcost_hidden_layer_size Hidden layer size for the language modeling loss. lmcost_gamma Weight for the language modeling loss. char_integration_method How character information is integrated. Options are: none (not integrated), concat (concatenated), attention (the method proposed in Rei et al. (2016)). save Path to save the model. load Path to load the model. garbage_collection Whether garbage collection is explicitly called. Makes things slower but can operate with bigger models. lstm_use_peepholes Whether to use the LSTM implementation with peepholes. random_seed Random seed for initialisation and data shuffling. 
This can affect results, so for robust conclusions I recommend running multiple experiments with different seeds and averaging the metrics. Printing output There is now a separate script for loading a saved model and using it to print output for a given input file. Use the save option in the config file for saving the model. The input file needs to be in the same format as the training data (one word per line, labels in a separate column). The labels are expected for printing output as well. If you don't know the correct labels, just print any valid label in that field. To print the output, run: python print_output.py labels model_file input_file This will print the input file to standard output, with an extra column at the end that shows the prediction. You can also use: python print_output.py probs model_file input_file This will print the individual probabilities for each of the possible labels. If the model is using CRFs, the probs option will output unnormalised state scores without taking the transitions into account. References The main sequence labeling model is described here: Compositional Sequence Labeling Models for Error Detection in Learner Writing Marek Rei and Helen Yannakoudakis In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016) The character level component is described here: Attending to characters in neural sequence labeling models Marek Rei, Gamal K.O. Crichton and Sampo Pyysalo In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016) The language modeling objective is described here: Semi supervised Multitask Learning for Sequence Labeling Marek Rei In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) The CRF implementation is based on: Neural Architectures for Named Entity Recognition Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami and Chris Dyer In Proceedings of NAACL HLT 2016 The conlleval.py script is from: License The code is distributed under the Affero General Public License 3 (AGPL 3.0) by default. If you wish to use it under a different license, feel free to get in touch. Copyright (c) 2018 Marek Rei This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.",Part-Of-Speech Tagging,Part-Of-Speech Tagging 2485,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Natural language understanding papers A list of recent papers regarding natural language understanding and spoken language understanding. It contains sequence labelling, sentence classification, dialogue act classification, dialogue state tracking and so on. A review about NLU datasets for task oriented dialogue is here . Bookmarks Variant networks ( variant networks) Robustness to ASR error ( robustness to ASR error) Zero shot learning and domain adaptation ( zero shot learning and domain adaptation) Which may inspire us ($which may inspire us) Variant networks Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding . Grégoire Mesnil, et al.. TASLP, 2015. 
Code+data Attention based recurrent neural network models for joint intent detection and slot filling . Bing Liu and Ian Lane. InterSpeech, 2016. Code1 Code2 Encoder decoder with Focus mechanism for Sequence Labelling Based Spoken Language Understanding . Su Zhu and Kai Yu. ICASSP, 2017. Code Neural Models for Sequence Chunking . Fei Zhai, et al. AAAI, 2017. End to end Sequence Labeling via Bi directional LSTM CNNs CRF . Xuezhe Ma, Eduard Hovy. ACL, 2016. A Bi model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling . Yu Wang, et al. NAACL 2018. Improving Slot Filling in Spoken Language Understanding with Joint Pointer and Attention . Lin Zhao and Zhe Feng. ACL, 2018. A Self Attentive Model with Gate Mechanism for Spoken Language Understanding . Changliang Li, et al. EMNLP 2018. from Kingsoft AI Lab Joint Slot Filling and Intent Detection via Capsule Neural Networks . Chenwei Zhang. et al. 2018. ongoing work Robustness to ASR error Discriminative spoken language understanding using word confusion networks . Matthew Henderson, et al.. SLT, 2012. Data Using word confusion networks for slot filling in spoken language understanding . Xiaohao Yang and Jia Liu. Interspeech, 2015. Joint Online Spoken Language Understanding and Language Modeling with Recurrent Neural Networks . Bing Liu and Ian Lane. SIGDIAL, 2016. Code Robust Spoken Language Understanding with unsupervised ASR error adaptation . Su Zhu, et al.. ICASSP, 2018. Neural Confnet Classification: Fully Neural Network Based Spoken Utterance Classification Using Word Confusion Networks . Ryo Masumura, et al.. ICASSP, 2018. Zero shot learning and domain adaptation A model of zero shot learning of spoken language understanding . Majid Yazdani and James Henderson. EMNLP, 2015. Zero shot Learning Of Intent Embeddings For Expansion By Convolutional Deep Structured Semantic Models . Yun Nung Chen, et al.. ICASSP 2016. Online Adaptative Zero shot Learning Spoken Language Understanding Using Word embedding . Emmanuel Ferreira, et al. ICASSP 2015. Towards Zero Shot Frame Semantic Parsing for Domain Scaling . Ankur Bapna, et al. Interspeech, 2017. Domain Attention with an Ensemble of Experts . Young Bum Kim, et al.. ACL, 2017. Adversarial Adaptation of Synthetic or Stale Data . Young Bum Kim, et al.. ACL, 2017. Concept Transfer Learning for Adaptive Language Understanding . Su Zhu and Kai Yu. SIGDIAL, 2018. An End to end Approach for Handling Unknown Slot Values in Dialogue State Tracking . Puyang Xu and Qi Hu. ACL, 2018. Large Scale Multi Domain Belief Tracking with Knowledge Sharing . Osman Ramadan, et al.. ACL, 2018. Data Fast and Scalable Expansion of Natural Language Understanding Functionality for Intelligent Agents . Anuj Goyal, et al. NAACL, 2018. from Amazon Alexa Machine Learning Bag of Experts Architectures for Model Reuse in Conversational Language Understanding . Rahul Jha, et al. NAACL, 2018. from Microsoft Corporation BERT: Pre training of Deep Bidirectional Transformers for Language Understanding . Jacob Devlin, et al. Arxiv 2018. from Google AI Language Which may inspire us Jointly Predicting Predicates and Arguments in Neural Semantic Role Labeling . Luheng He, et al. ACL, 2018. Code Sentence State LSTM for Text Representation . Yue Zhang, et al. ACL, 2018. Code Chinese NER Using Lattice LSTM . Yue Zhang, et al. ACL, 2018. Code+data SoPa: Bridging CNNs, RNNs, and Weighted Finite State Machines . Roy Schwartz, et al. ACL, 2018. Code Coarse to Fine Decoding for Neural Semantic Parsing . 
Li Dong and Mirella Lapata. ACL, 2018. Code Semantic Parsing for Task Oriented Dialog using Hierarchical Representations . Sonal Gupta, et al. EMNLP 2018. from Facebook AI Research Generalize Symbolic Knowledge With Neural Rule Engine . Shen Li, Hengru Xu, Zhengdong Lu. Arxiv 2018. from Deeplycurious.ai",Part-Of-Speech Tagging,Part-Of-Speech Tagging 2621,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NER with Deep Learning This project uses a combination of deep neural networks for the named entity recognition task. The project implements the method proposed by Ma and Hovy (2016). The combination of CNN, BiLSTM and CRF is used as proposed in the paper. The implementation uses the Keras 2.0 library with a TensorFlow backend. The data used for training is the CoNLL 2002 dataset for NER and POS tagging. NOTE: The model is a bit different from the original implementation in the following ways: > A dense layer (100 units) has been added to the model for improved performance. > Hyperparameter optimization has been done for improved results and richer feature learning. > The CoNLL 2002 dataset is used instead of CoNLL 2003 for training and evaluation.",Part-Of-Speech Tagging,Part-Of-Speech Tagging 2724,Natural Language Processing,Natural Language Processing,Natural Language Processing,"JNN (Java Neural Network Toolkit) 2015 09 10 Original writer: Wang Ling This package contains a Java Neural Network Toolkit with implementations of: A word representation model allowing vectors to be generated as a word lookup table, a set of features, and/or the C2W model (words are represented by their sequence of characters) An LSTM based Language Model An LSTM based Part Of Speech Tagger Model The system requires Java 1.8+ to be installed, and approximately 8 16 GB of memory depending on the size and complexity of the network. 0.1 Quick Start Examples for training a part of speech tagger and language models can be found in scripts/run_pos.sh and scripts/run_lm.sh, respectively. These can be run with the following commands: sh scripts/run_pos.sh sh scripts/run_lm.sh These scripts download currently available data for both tasks, and serve as examples of how the code is to be run. The POS tagger is trained on the Ark POS dataset found in . The language models are trained on subsets of wikipedia, which we make available at . 1.1 Language Modeling Sample datasets can be downloaded by running: sh scripts/download_wikidata.sh The LSTM based language model can be trained by calling: java Xmx10g cp jnn.jar:libs/ jnn.functions.nlp.app.lm.LSTMLanguageModel batch_size 10 iterations 1000000 lr 0.1 output_dir sample_lm_model softmax_function word 5000 test_file wiki/wiki.test.en threads 8 train_file wiki/wiki.train.en validation_file wiki/wiki.dev.en validation_interval 10000 word_dim 50 char_dim 50 char_state_dim 150 lm_state_dim 150 word_features characters nd4j_resource_dir nd4j_resources update momentum This command will train a neural language model using the training file wiki/wiki.train.en, validating on wiki/wiki.dev.en, and testing on wiki/wiki.test.en.
Arguments are described below: batch_size number of sentences (lines) processed in each mini batch iterations number of iterations the model is to be trained (each iteration processes one mini batch) lr learning rate output_dir directory to save the model, write the statistics (perplexities), and scores for the test data word_features type of word representation used (options described in 3) softmax_function type of softmax unit used for predicting words (options described in 1.2) train_file training text file validation_file validation text file test_file test text file validation_interval number of mini batches to be run before computing perplexities on the validation set word_dim word vector dimension (In a lookup table, this will generate a vocab word_dim table, while in the C2W model, the character LSTM states will be projected into a vector of size word_dim) char_dim character vector dimension (Always uses a lookup table) char_state_dim LSTM state and cell dimensions used to build lstm states lm_state_dim LSTM state and cell dimensions for the language model nd4j_resource_dir ND4J configuration directories (simply point to nd4j_resources) threads number of threads to be used (sentences in each mini batch will be divided among threads) update sgd method (regular, momentum or adagrad) The following files will be created in the directory specified by output_dir: model.gz The model is stored in this file every time the validation perplexity improves over the previous best value. If this file exists when the command is called, this model will be loaded and training will be carried out from this point. This way if something goes wrong during training (e.g. server crashes), training will resume at the last saved point. model.tmp.gz A backup copy of the model.gz file, this is kept so that if the script fails when model.gz is being written, it is not lost. Thus, if model.gz is incomplete, simply copy model.tmp.gz over it. rep.gz The word representation model, this can be used in order to reuse the word representations trained on this task as initialization for other tasks. test.scores.gz Once the model finishes training, the file specified by test_file is evaluated and sentence level perplexities are computed and stored in this file. (to simply run a model on the test set, make sure the model.gz is created and set iterations to 0) stats Reports statistics during training. In this task, perplexities on the development set are reported. 1.2 Softmax functions The most straightforward way is to predict each word with a softmax over the whole training vocabulary (set softmax_function to word). However, the normalization over the whole vocabulary is expensive. One way around this problem is to prune the vocabulary by replacing less frequent words by an unknown token. This can be done by setting softmax_function to word N, where N is the number of words to consider. Thus, word 5000 will perform a softmax over the top 5000 words and replace the rest of the words with an unknown token. It is also possible to use Noise Contrastive Estimation by setting the softmax_function parameter to word nce. This allows parameters to be estimated for the whole vocabulary, while avoiding the normalization over the whole vocabulary at training time.
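To make the pruning option concrete, here is a short illustrative sketch in Python (JNN itself is written in Java, so this is only a conceptual illustration, not JNN code): keep the N most frequent training words and map everything else to a single unknown token before the softmax is computed.
from collections import Counter

def prune_vocab(train_tokens, n, unk='<unk>'):
    # Count word frequencies on the training data and keep the top n types.
    counts = Counter(train_tokens)
    kept = {w for w, _ in counts.most_common(n)}
    # Every other word is replaced by the unknown token, so the output
    # softmax only has to normalise over n + 1 classes instead of the
    # full vocabulary.
    return [w if w in kept else unk for w in train_tokens]

# Example: keep the 5000 most frequent words, as in softmax_function word 5000.
# pruned = prune_vocab(tokens, 5000)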
2.1 Part of Speech Tagging Sample datasets can be downloaded by running: sh scripts/download_posdata.sh The LSTM based Part of Speech Tagger can be trained by calling: java Xmx10g cp jnn.jar:libs/ jnn.functions.nlp.app.pos.PosTagger lr 0.3 batch_size 100 validation_interval 10 threads 8 train_file twpos data v0.3/oct27.splits/oct27.train validation_file twpos data v0.3/oct27.splits/oct27.dev test_file twpos data v0.3/oct27.splits/oct27.test input_format conll 0 1 word_features characters context_model blstm iterations 1000 output_dir /models/pos_model sequence_activation 2 word_dim 50 char_dim 50 char_state_dim 150 context_state_dim 150 update momentum nd4j_resource_dir nd4j_resources/ This command will train a POS tagger using the training file twpos data v0.3/oct27.splits/oct27.train, validating on twpos data v0.3/oct27.splits/oct27.dev, and testing on twpos data v0.3/oct27.splits/oct27.test. Arguments are described below: batch_size number of sentences (lines) processed in each each mini batch iterations number of iterations the model is to be trained (each iterations processes one mini batch) lr learning rate output_dir directory to save the model, write the statistics (accuracies), and scores for the test data word_features type of word representation used (options described in 3) train_file training file validation_file validation file test_file test file input_format file format (options described in 2.2) context_model model that encodes contextual information (options described in 2.3) word_dim word vector dimension (In a lookup table, this will generate a vocab word_dim table, while in the C2W model, the character LSTM states will be projected into a vector of size word_dim) char_dim character vector dimension (Always uses a lookup table) char_state_dim LSTM state and cell dimensions used to build lstm states lm_state LSTM state and cell dimensions for the language model sequence_activation Activation function applied to the word vector after the composition (0 none, 1 logistic, 2 tanh) nd4j_resource_dir ND4J configuration directories (simply point to nd4j_resources) threads number of threads to be used (sentences in each mini batch will be divided among threads) update sgd method (regular, momentum or adagrad) The following files will be created in the directory specified by output_dir: model.gz The model is stored in this file every time the validation perplexity exceeds the previous highest value. If this file exists when the command is called, this model will be loaded and training will be carried out from this point. This way if something goes wrong during training (e.g. server crashes), training will resume at the last saved point. model.tmp.gz A backup copy of hte model.gz file, this is kept so that if the script fails when model.gz is being written, it is not lost. Thus, if model.gz is incomplete, simply copy model.tmp.gz over it. rep.gz The word representation model, this can be used in order to reuse the word representations trained on this task as initilization for other tasks. validation.output Automatically tagged validation set using the tagger. test.output Automatically tagged test set using the tagger. validation.output Reports statistics during training on the validation set. In this task, tagging accuracies are reported. test.output Reports statistics during training on the test set. In this task, tagging accuracies are reported. validation.correct Lists correctly labelled words in the validation set. test.correct Lists correctly labelled words in the test set. 
validation.incorrect Lists incorrectly labelled words in the validation set. test.incorrect Lists incorrectly labelled words in the test set. 2.2 File Formats We allow 3 different formats. 1 The Conll column format is displayed as follows: 1 In _ IN IN _ 43 ADV _ _ 2 an _ DT DT _ 5 NMOD _ _ 3 Oct. _ NN NNP _ 5 TMP _ _ 4 19 _ CD CD _ 3 NMOD _ _ 5 review _ NN NN _ 1 PMOD _ _ 6 of _ IN IN _ 5 NMOD _ _ 7 _ _ 9 P _ _ 8 The _ DT DT _ 9 NMOD _ _ 9 Misanthrope _ NN NN _ 6 PMOD _ _ 10 '' _ '' '' _ 9 P _ _ 11 at _ IN IN _ 9 NMOD _ _ 12 Chicago _ NN NNP _ 15 NMOD _ _ 13 's _ PO POS _ 12 NMOD _ _ 14 Goodman _ NN NNP _ 15 NMOD _ _ 15 Theatre _ NN NNP _ 11 PMOD _ _ This format can be specified by setting input_format as conll N M, where N is the column of the token (starting from 0) and M is the column of the POS tag. In the example above you probably wish to set input_format as conll 1 4. 2 The parallel data format, where the tokens and tags are separated by : In an Oct. 19 review of The Misanthrope '' at Chicago 's Goodman Theatre IN DT NNP CD NN IN DT NN '' IN NNP POS NNP NNP This format can be specified by setting input_format as parallel. 3 The Stanford original format with chunking information is displayed as follows: If/IN you/PRP 'd/MD really/RB rather/RB have/VB a/DT Buick/NNP ,/, do/VB n't/RB leave/VB home/NN without/IN the/DT American/NNP Express/NNP card/NN ./. This format can be specified by defining the input_format as stanford. 2.3 Context Models Our tagger uses a Bidirectional Long Short Term Memory RNN to encode contextual information before tagging each word, which can be specified by setting context_model to BLSTM. This leads to better results in general as compared to a window based model, which can be specified by setting context_model to window. 3.1 Word Representations Lookup Tables In general, words are converted into K dimensional vectors through a lookup table, where each individual word type is matched with an independent vector. This can be specified by setting the word_features argument as words . This can be generalized to any discrete feature, for instance, if we wish to model words as their 3 letter prefix, we simply build a lookup table of all observed prefixes with size 3 and all words sharing the same 3 letter prefix will be associated with the same vector. In this view, the words feature can be seen as an identity feature, where no word types share the same vector. Finally, multiple features can be used simultaneously by enumerating the desired features. For instance, setting word_features to words,prefix 3 , will simultaneously use the identity feature and the prefix feature with size 3. Below are the features that are available: word : lowercased word prefix N : prefix of size N (e.g. setting N to 3 means a prefix of size 3) suffix N : suffix of size N (e.g. setting N to 3 means a suffix of size 3) capitalization : binary feature that is 1 or 0 depending on whether the first letter is uppercased casing : ternary feature that is 2 if all letters are uppercased, 1 if the first letter is uppercased and 0 otherwise shape : replaces uppercased letters with X, lowercased letters with x and digits with d (e.g. John12 > Xxxxdd) shape no repeat : same as shape, but removes repeated letters (e.g. John12 > Xxd) Finally, it is common for feature activations unseen in the training set to be found in the development and test sets, such as out of vocabulary words or prefixes. If this happens, an unknown feature is activated instead.
During training, we stochastically replace singleton feature activations by this token with probability 0.5. 3.2 C2W Model An alternative to lookup tables is to compose the word representation from its characters. In our work, we use LSTMs to compose the K dimensional vector for each word. We call this model C2W (character to word), and it can be used by setting word_features to one of the following options: characters : LSTM composition for characters characters lowercase : LSTM composition for lowercased characters Once again, it is possible to use the C2W model with lookup tables, for instance, setting the word_features to word,suffix 3,character will use the word lookup table, a suffix lookup table and the C2W model. It is also worth mentioning that if the same word occurs multiple times in the same mini batch, it will only be composed once to save computation. Thus, it is generally advised to use larger mini batches (more than 10 sentences) for faster computation. For more information regarding the C2W model, check out our work at: CONTENTS README This file. COPYING License file jnn.jar This is a JAR file containing all classes in JNN and necessary to run the language models and taggers. src A directory containing the Java 1.8 source code for JNN. libs These are Java libraries needed to run JNN scripts This directory contains example scripts for different tasks nd4j_resources Configuration files for ND4J LICENSE The code provided by this package is free and is distributed under the following license Wang Ling wanglin1122@gmail.com Copyright (c) 2015 All Rights Reserved. Permission is hereby granted, free of charge, to use and distribute these items without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of this work, and to permit persons to whom this work is furnished to do so, subject to the following conditions: 1. The contents of this package must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Any modifications must be clearly marked as such. 3. Original authors' names are not deleted. 4. The authors' names are not used to endorse or promote products derived from this software without specific prior written permission. THE AUTHORS AND THE CONTRIBUTORS TO THIS WORK DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY NOR THE CONTRIBUTORS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. CONTACT For more information, bug reports, fixes, contact: Wang Ling wanglin1122@gmail.com INESC ID & Carnegie Mellon University Lisbon, Portugal",Part-Of-Speech Tagging,Part-Of-Speech Tagging 2855,Natural Language Processing,Natural Language Processing,Natural Language Processing,"! alt text (resources/docs/flair_logo.svg) PyPI version GitHub Issues Contributions welcome (CONTRIBUTING.md) License: MIT Travis A very simple framework for state of the art NLP . Developed by Zalando Research . Flair is: A powerful NLP library. Flair allows you to apply our state of the art natural language processing (NLP) models to your text, such as named entity recognition (NER), part of speech tagging (PoS), sense disambiguation and classification.
Multilingual. Thanks to the Flair community, we support a rapidly growing number of languages. We also now include ' one model, many languages ' taggers, i.e. single models that predict PoS or NER tags for input text in various languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings , BERT embeddings and ELMo embeddings. A Pytorch NLP framework. Our framework builds directly on Pytorch , making it easy to train your own models and experiment with new approaches using Flair embeddings and classes. Now at version 0.4.1 ! Comparison with State of the Art Flair outperforms the previous best methods on a range of NLP tasks: Task Language Dataset Flair Previous best Named Entity Recognition English Conll 03 93.18 (F1) 92.22 (Peters et al., 2018) Named Entity Recognition English Ontonotes 89.3 (F1) 86.28 (Chiu et al., 2016) Emerging Entity Detection English WNUT 17 49.49 (F1) 45.55 (Aguilar et al., 2018) Part of Speech tagging English WSJ 97.85 97.64 (Choi, 2016) Chunking English Conll 2000 96.72 (F1) 96.36 (Peters et al., 2017) Named Entity Recognition German Conll 03 88.27 (F1) 78.76 (Lample et al., 2016) Named Entity Recognition German Germeval 84.65 (F1) 79.08 (Hänig et al, 2014) Named Entity Recognition Dutch Conll 03 90.44 (F1) 81.74 (Lample et al., 2016) Named Entity Recognition Polish PolEval 2018 86.6 (F1) (Borchmann et al., 2018) 85.1 (PolDeepNer) Here's how to reproduce these numbers (/resources/docs/EXPERIMENTS.md) using Flair. You can also find detailed evaluations and discussions in our papers: Contextual String Embeddings for Sequence Labeling . Alan Akbik, Duncan Blythe and Roland Vollgraf. 27th International Conference on Computational Linguistics, COLING 2018 . Pooled Contextualized Embeddings for Named Entity Recognition (to appear). Alan Akbik, Tanja Bergmann and Roland Vollgraf. 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2019 . Quick Start Requirements and Installation The project is based on PyTorch 0.4+ and Python 3.6+, because method signatures and type hints are beautiful. If you do not have Python 3.6, install it first. Here is how for Ubuntu 16.04 . Then, in your favorite virtual environment, simply do: pip install flair Example Usage Let's run named entity recognition (NER) over an example sentence. All you need to do is make a Sentence , load a pre trained model and use it to predict tags for the sentence: python from flair.data import Sentence from flair.models import SequenceTagger make a sentence sentence Sentence('I love Berlin .') load the NER tagger tagger SequenceTagger.load('ner') run NER over sentence tagger.predict(sentence) Done! The Sentence now has entity annotations. Print the sentence to see what the tagger found. python print(sentence) print('The following NER tags are found:') iterate over entities and print for entity in sentence.get_spans('ner'): print(entity) This should print: console Sentence: I love Berlin . 
4 Tokens The following NER tags are found: LOC span 3 : Berlin Tutorials We provide a set of quick tutorials to get you started with the library: Tutorial 1: Basics (/resources/docs/TUTORIAL_1_BASICS.md) Tutorial 2: Tagging your Text (/resources/docs/TUTORIAL_2_TAGGING.md) Tutorial 3: Using Word Embeddings (/resources/docs/TUTORIAL_3_WORD_EMBEDDING.md) Tutorial 4: Using BERT, ELMo, and Flair Embeddings (/resources/docs/TUTORIAL_4_ELMO_BERT_FLAIR_EMBEDDING.md) Tutorial 5: Using Document Embeddings (/resources/docs/TUTORIAL_5_DOCUMENT_EMBEDDINGS.md) Tutorial 6: Loading your own Corpus (/resources/docs/TUTORIAL_6_CORPUS.md) Tutorial 7: Training your own Models (/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md) Tutorial 8: Optimizing your own Models (/resources/docs/TUTORIAL_8_MODEL_OPTIMIZATION.md) Tutorial 9: Training your own Flair Embeddings (/resources/docs/TUTORIAL_9_TRAINING_LM_EMBEDDINGS.md) The tutorials explain how the base NLP classes work, how you can load pre trained models to tag your text, how you can embed your text with different word or document embeddings, and how you can train your own language models, sequence labeling models, and text classification models. Let us know if anything is unclear. There are also good third party articles and posts that illustrate how to use Flair: How to build a text classifier with Flair How to build a microservice with Flair and Flask A docker image for Flair Great overview of Flair functionality and how to use in Colab Citing Flair Please cite the following paper when using Flair: @inproceedings{akbik2018coling, title {Contextual String Embeddings for Sequence Labeling}, author {Akbik, Alan and Blythe, Duncan and Vollgraf, Roland}, booktitle {{COLING} 2018, 27th International Conference on Computational Linguistics}, pages {1638 1649}, year {2018} } If you use the pooled version of the Flair embeddings (PooledFlairEmbeddings), please cite: @inproceedings{akbik2019naacl, title {Pooled Contextualized Embeddings for Named Entity Recognition}, author {Akbik, Alan and Bergmann, Tanja and Vollgraf, Roland}, booktitle {{NAACL} 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, pages {to appear}, year {2019} } Contact Please email your questions or comments to Alan Akbik . Contributing Thanks for your interest in contributing! There are many ways to get involved; start with our contributor guidelines (CONTRIBUTING.md) and then check these open issues for specific tasks. For contributors looking to get deeper into the API we suggest cloning the repository and checking out the unit tests for examples of how to call methods. Nearly all classes and methods are documented, so finding your way around the code should hopefully be easy. Running unit tests locally You need Pipenv for this: bash pipenv install dev && pipenv shell pytest tests/ To run integration tests execute: bash pytest runintegration tests/ The integration tests will train small models. Afterwards, the trained model will be loaded for prediction. To also run slow tests, such as loading and using the embeddings provided by flair, you should execute: bash pytest runslow tests/ Code Style To ensure a standardized code style we use the formatter black . If your code is not formatted properly, travis will fail to build. If you want to automatically format your code on every commit, you can use pre commit . Just install it via pip install pre commit and execute pre commit install in the root folder. 
This will add a hook to the repository, which reformats files on every commit. If you want to set it up manually, install black via pip install black . To reformat files execute black . . License (/LICENSE) The MIT License (MIT) Flair is licensed under the following MIT license: The MIT License (MIT) Copyright © 2018 Zalando SE, Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",Part-Of-Speech Tagging,Part-Of-Speech Tagging 1612,Computer Vision,Computer Vision,Computer Vision,"3D MNIST Classification Using PointNet, 2D CNN, 3D CNN, and some other ML methods Introduction In this work I used Pointnet, 2D CNN, 3D CNN, and some other ML methods to classify 3d mnist point clouds. You can find the dataset here . Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. You can use 3D CNN_classifier.ipynb for applying 3D CNN on 3D voxel grids. This, however, renders data unnecessarily voluminous and causes issues. In pointnet paper, they design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Their network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. In this repository, we used the code they published in their github and did some changes to use it in our work and 3d mnist data for training a PointNet classification network on point clouds. You can use vis_data.py file to visualize point cloud data. I used plotly library for this. Installation Install TensorFlow . I used python 3.5 with TensorFlow 1.5. You may also need to install h5py. To install h5py for Python: bash sudo apt get install libhdf5 dev sudo pip install h5py Usage To train a pointnet model to classify point clouds sampled from 3D shapes use main_pointnet_3dmnist.ipynb file. Put the 3d mnist data to data folder. You can also use other .ipynb's to use 2D CNN, 3D CNN, and some other ML methods for point cloud classification. Selected Projects that Use PointNet PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space by Qi et al. (NIPS 2017) A hierarchical feature learning framework on point clouds. The PointNet++ architecture applies PointNet recursively on a nested partitioning of the input point set. It also proposes novel layers for point clouds with non uniform densities. 
Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds by Engelmann et al. (ICCV 2017 workshop). This work extends PointNet for large scale scene segmentation. PCPNET: Learning Local Shape Properties from Raw Point Clouds by Guerrero et al. (arXiv). The work adapts PointNet for local geometric properties (e.g. normal and curvature) estimation in noisy point clouds. VoxelNet: End to End Learning for Point Cloud Based 3D Object Detection by Zhou et al. from Apple (arXiv) This work studies 3D object detection using LiDAR point clouds. It splits space into voxels, use PointNet to learn local voxel features and then use 3D CNN for region proposal, object classification and 3D bounding box estimation. Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (arXiv) A novel framework for 3D object detection with RGB D data. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017).",Object Localization,Object Localization 1613,Computer Vision,Computer Vision,Computer Vision,"3D Semantic Segmentation of virtual kitti dataset using PointNet The main code is from PointNet GitHub Repo Dataset You can download the dataset from here . All files are provided as numpy .npy files. Each file contains a N x F matrix, where N is the number of points in a scene and F is the number of features per point, in this case F 7. The features are XYZRGBL, the 3D XYZ position, the RGB color and the ground truth semantic label L. Each file is for a scene. Training Once you have downloaded and prepared data, to start training use main.ipynb. Visualise data For data visualization you can use vis_data_vispy.py file. Selected Projects that Use PointNet PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space by Qi et al. (NIPS 2017) A hierarchical feature learning framework on point clouds. The PointNet++ architecture applies PointNet recursively on a nested partitioning of the input point set. It also proposes novel layers for point clouds with non uniform densities. Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds by Engelmann et al. (ICCV 2017 workshop). This work extends PointNet for large scale scene segmentation. PCPNET: Learning Local Shape Properties from Raw Point Clouds by Guerrero et al. (arXiv). The work adapts PointNet for local geometric properties (e.g. normal and curvature) estimation in noisy point clouds. VoxelNet: End to End Learning for Point Cloud Based 3D Object Detection by Zhou et al. from Apple (arXiv) This work studies 3D object detection using LiDAR point clouds. It splits space into voxels, use PointNet to learn local voxel features and then use 3D CNN for region proposal, object classification and 3D bounding box estimation. Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (arXiv) A novel framework for 3D object detection with RGB D data. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017).",Object Localization,Object Localization 1765,Computer Vision,Computer Vision,Computer Vision,"PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Created by Charles R. Qi , Hao Su , Kaichun Mo , Leonidas J. Guibas from Stanford University. ! prediction example Introduction This work is based on our arXiv tech report , which is going to appear in CVPR 2017. We proposed a novel deep net architecture for point clouds (as unordered point sets). 
You can also check our project webpage for a deeper introduction. Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. In this repository, we release code and data for training a PointNet classification network on point clouds sampled from 3D shapes, as well as for training a part segmentation network on ShapeNet Part dataset. Citation If you find our work useful in your research, please consider citing: @article{qi2016pointnet, title {PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation}, author {Qi, Charles R and Su, Hao and Mo, Kaichun and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1612.00593}, year {2016} } Installation Install TensorFlow . You may also need to install h5py. The code has been tested with Python 2.7, TensorFlow 1.0.1, CUDA 8.0 and cuDNN 5.1 on Ubuntu 14.04. If you are using PyTorch, you can find a third party pytorch implementation here . To install h5py for Python: bash sudo apt get install libhdf5 dev sudo pip install h5py Usage To train a model to classify point clouds sampled from 3D shapes: python train.py Log files and network parameters will be saved to log folder in default. Point clouds of ModelNet40 models in HDF5 files will be automatically downloaded (416MB) to the data folder. Each point cloud contains 2048 points uniformly sampled from a shape surface. Each cloud is zero mean and normalized into an unit sphere. There are also text files in data/modelnet40_ply_hdf5_2048 specifying the ids of shapes in h5 files. To see HELP for the training script: python train.py h We can use TensorBoard to view the network architecture and monitor the training progress. tensorboard logdir log After the above training, we can evaluate the model and output some visualizations of the error cases. python evaluate.py visu Point clouds that are wrongly classified will be saved to dump folder in default. We visualize the point cloud by rendering it into three view images. If you'd like to prepare your own data, you can refer to some helper functions in utils/data_prep_util.py for saving and loading HDF5 files. Part Segmentation To train a model for object part segmentation, firstly download the data: cd part_seg sh download_data.sh The downloading script will download ShapeNetPart dataset (around 1.08GB) and our prepared HDF5 files (around 346MB). Then you can run train.py and test.py in the part_seg folder for training and testing (computing mIoU for evaluation). License Our code is released under MIT License (see LICENSE file for details). Selected Projects that Use PointNet PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space by Qi et al. (NIPS 2017) A hierarchical feature learning framework on point clouds. The PointNet++ architecture applies PointNet recursively on a nested partitioning of the input point set. It also proposes novel layers for point clouds with non uniform densities. 
Exploring Spatial Context for 3D Semantic Segmentation of Point Clouds by Engelmann et al. (ICCV 2017 workshop). This work extends PointNet for large scale scene segmentation. PCPNET: Learning Local Shape Properties from Raw Point Clouds by Guerrero et al. (arXiv). The work adapts PointNet for local geometric properties (e.g. normal and curvature) estimation in noisy point clouds. VoxelNet: End to End Learning for Point Cloud Based 3D Object Detection by Zhou et al. from Apple (arXiv) This work studies 3D object detection using LiDAR point clouds. It splits space into voxels, use PointNet to learn local voxel features and then use 3D CNN for region proposal, object classification and 3D bounding box estimation. Frustum PointNets for 3D Object Detection from RGB D Data by Qi et al. (arXiv) A novel framework for 3D object detection with RGB D data. The method proposed has achieved first place on KITTI 3D object detection benchmark on all categories (last checked on 11/30/2017).",Object Localization,Object Localization 2500,Computer Vision,Computer Vision,Computer Vision,"Frustum PointNets for 3D Object Detection from RGB D Data Created by Charles R. Qi , Wei Liu , Chenxia Wu , Hao Su and Leonidas J. Guibas from Stanford University and Nuro Inc. ! teaser Introduction This repository is code release for our CVPR 2018 paper (arXiv report here ). In this work, we study 3D object detection from RGB D data. We propose a novel detection pipeline that combines both mature 2D object detectors and the state of the art 3D deep learning techniques. In our pipeline, we firstly build object proposals with a 2D detector running on RGB images, where each 2D bounding box defines a 3D frustum region. Then based on 3D point clouds in those frustum regions, we achieve 3D instance segmentation and amodal 3D bounding box estimation, using PointNet/PointNet++ networks (see references at bottom). By leveraging 2D object detectors, we greatly reduce 3D search space for object localization. The high resolution and rich texture information in images also enable high recalls for smaller objects like pedestrians or cyclists that are harder to localize by point clouds only. By adopting PointNet architectures, we are able to directly work on 3D point clouds, without the necessity to voxelize them to grids or to project them to image planes. Since we directly work on point clouds, we are able to fully respect and exploit the 3D geometry one example is the series of coordinate normalizations we apply, which help canocalizes the learning problem. Evaluated on KITTI and SUNRGBD benchmarks, our system significantly outperforms previous state of the art and is still in leading positions on current KITTI leaderboard . For more details of our architecture, please refer to our paper or project website . Citation If you find our work useful in your research, please consider citing: @article{qi2017frustum, title {Frustum PointNets for 3D Object Detection from RGB D Data}, author {Qi, Charles R and Liu, Wei and Wu, Chenxia and Su, Hao and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1711.08488}, year {2017} } Installation Install TensorFlow .There are also some dependencies for a few Python libraries for data processing and visualizations like cv2 , mayavi etc. It's highly recommended that you have access to GPUs. To use the Frustum PointNets v2 model, we need access to a few custom Tensorflow operators from PointNet++. 
The TF operators are included under models/tf_ops , you need to compile them (check tf_xxx_compile.sh under each ops subfolder) first. Update nvcc and python path if necessary. The compile script is written for TF1.4. There is also an option for TF1.2 in the script. If you are using earlier version it's possible that you need to remove the D_GLIBCXX_USE_CXX11_ABI 0 flag in g++ command in order to compile correctly. If we want to evaluate 3D object detection AP (average precision), we need also to compile the evaluation code (by running compile.sh under train/kitti_eval ). Check train/kitti_eval/README.md for details. Some of the demos require mayavi library. We have provided a convenient script to install mayavi package in Python, a handy package for 3D point cloud visualization. You can check it at mayavi/mayavi_install.sh . If the installation succeeds, you should be able to run mayavi/test_drawline.py as a simple demo. Note: the library works for local machines and seems do not support remote access with ssh or ssh X . The code is tested under TF1.2 and TF1.4 (GPU version) and Python 2.7 (version 3 should also work) on Ubuntu 14.04 and Ubuntu 16.04 with NVIDIA GTX 1080 GPU. It is highly recommended to have GPUs on your machine and it is required to have at least 8GB available CPU memory. Usage Currently, we support training and testing of the Frustum PointNets models as well as evaluating 3D object detection results based on precomputed 2D detector outputs (under kitti/rgb_detections ). You are welcomed to extend the code base to support your own 2D detectors or feed your own data for network training. Prepare Training Data In this step we convert original KITTI data to organized formats for training our Frustum PointNets. NEW: You can also directly download the prepared data files HERE (960MB) to support training and evaluation, just unzip the file and move the .pickle files to the kitti folder. Firstly, you need to download the KITTI 3D object detection dataset , including left color images, Velodyne point clouds, camera calibration matrices, and training labels. Make sure the KITTI data is organized as required in dataset/README.md . You can run python kitti/kitti_object.py to see whether data is downloaded and stored properly. If everything is fine, you should see image and 3D point cloud visualizations of the data. Then to prepare the data, simply run: (warning: this step will generate around 4.7GB data as pickle files) sh scripts/command_prep_data.sh Basically, during this process, we are extracting frustum point clouds along with ground truth labels from the original KITTI data, based on both ground truth 2D bounding boxes and boxes from a 2D object detector. We will do the extraction for the train ( kitti/image_sets/train.txt ) and validation set ( kitti/image_sets/val.txt ) using ground truth 2D boxes, and also extract data from validation set with predicted 2D boxes ( kitti/rgb_detections/rgb_detection_val.txt ). You can check kitti/prepare_data.py for more details, and run python kitti/prepare_data.py demo to visualize the steps in data preparation. After the command executes, you should see three newly generated data files under the kitti folder. You can run python train/provider.py to visualize the training data (frustum point clouds and 3D bounding box labels, in rect camera coordinate). 
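For intuition, the frustum extraction that this preparation step performs boils down to keeping the points whose image projection falls inside a 2D detection box. The NumPy sketch below only illustrates that idea with a synthetic cloud and a made up intrinsic matrix; it is not the code in kitti/prepare_data.py:

```python
import numpy as np

def frustum_points(points_cam, K, box2d):
    # points_cam: (N, 3) points in the rectified camera frame.
    # K: (3, 3) camera intrinsic matrix; box2d: (xmin, ymin, xmax, ymax) in pixels.
    xmin, ymin, xmax, ymax = box2d
    z = points_cam[:, 2]
    uvw = points_cam @ K.T                      # pinhole projection, still scaled by z
    u, v = uvw[:, 0] / z, uvw[:, 1] / z
    keep = (u >= xmin) & (u < xmax) & (v >= ymin) & (v < ymax) & (z > 0)
    return points_cam[keep]

K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])  # made-up intrinsics
cloud = np.random.uniform([-20.0, -2.0, 0.5], [20.0, 2.0, 60.0], size=(5000, 3))
print(frustum_points(cloud, K, (500, 100, 700, 250)).shape)
```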
Training Frustum PointNets To start training (on GPU 0) the Frustum PointNets model, just run the following script: CUDA_VISIBLE_DEVICES 0 sh scripts/command_train_v1.sh You can run scripts/command_train_v2.sh to trian the v2 model as well. The training statiscs and checkpoints will be stored at train/log_v1 (or train/log_v2 if it is a v2 model). Run python train/train.py h to see more options of training. NEW: We have also prepared some pretrained snapshots for both the v1 and v2 models. You can find them HERE (40MB) to support evaluation script, you just need to unzip the file and move the log_ folders to the train folder. Evaluation To evaluate a trained model (assuming you already finished the previous training step) on the validation set, just run: CUDA_VISIBLE_DEVICES 0 sh scripts/command_test_v1.sh Similarly, you can run scripts/command_test_v2.sh to evaluate a trained v2 model. The script will automatically evaluate the Frustum PointNets on the validation set based on precomputed 2D bounding boxes from a 2D detector (not released here), and then run the KITTI offline evaluation scripts to compute precision recall and calcuate average precisions for 2D detection, bird's eye view detection and 3D detection. Currently there is no script for evaluation on test set, yet it is possible to do it by yourself. To evaluate on the test set, you need to get outputs from a 2D detector on KITTI test set, store it as something in kitti/rgb_detections . Then, you need to prepare test set frustum point clouds for the test set, by modifying the code in kitti/prepare_data.py . Then you can modify test scripts in scripts by changing the data path, idx path and output file name. For our test set results reported, we used the entire trainval set for training. License Our code is released under the Apache 2.0 license (see LICENSE file for details). References PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation by Qi et al. (CVPR 2017 Oral Presentation). Code and data: here . PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space by Qi et al. (NIPS 2017). Code and data: here . Todo Add a demo script to run inference of Frustum PointNets based on raw input data. Add related scripts for SUNRGBD dataset",Object Localization,Object Localization 2688,Computer Vision,Computer Vision,Computer Vision,"Frustum PointNets for 3D Object Detection from RGB D Data Created by Charles R. Qi , Wei Liu , Chenxia Wu , Hao Su and Leonidas J. Guibas from Stanford University and Nuro Inc. ! teaser Introduction This repository is code release for our CVPR 2018 paper (arXiv report here ). In this work, we study 3D object detection from RGB D data. We propose a novel detection pipeline that combines both mature 2D object detectors and the state of the art 3D deep learning techniques. In our pipeline, we firstly build object proposals with a 2D detector running on RGB images, where each 2D bounding box defines a 3D frustum region. Then based on 3D point clouds in those frustum regions, we achieve 3D instance segmentation and amodal 3D bounding box estimation, using PointNet/PointNet++ networks (see references at bottom). By leveraging 2D object detectors, we greatly reduce 3D search space for object localization. The high resolution and rich texture information in images also enable high recalls for smaller objects like pedestrians or cyclists that are harder to localize by point clouds only. 
By adopting PointNet architectures, we are able to directly work on 3D point clouds, without the necessity to voxelize them to grids or to project them to image planes. Since we directly work on point clouds, we are able to fully respect and exploit the 3D geometry one example is the series of coordinate normalizations we apply, which help canocalizes the learning problem. Evaluated on KITTI and SUNRGBD benchmarks, our system significantly outperforms previous state of the art and is still in leading positions on current KITTI leaderboard . For more details of our architecture, please refer to our paper or project website . Citation If you find our work useful in your research, please consider citing: @article{qi2017frustum, title {Frustum PointNets for 3D Object Detection from RGB D Data}, author {Qi, Charles R and Liu, Wei and Wu, Chenxia and Su, Hao and Guibas, Leonidas J}, journal {arXiv preprint arXiv:1711.08488}, year {2017} } Installation Install TensorFlow .There are also some dependencies for a few Python libraries for data processing and visualizations like cv2 , mayavi etc. It's highly recommended that you have access to GPUs. To use the Frustum PointNets v2 model, we need access to a few custom Tensorflow operators from PointNet++. The TF operators are included under models/tf_ops , you need to compile them (check tf_xxx_compile.sh under each ops subfolder) first. Update nvcc and python path if necessary. The compile script is written for TF1.4. There is also an option for TF1.2 in the script. If you are using earlier version it's possible that you need to remove the D_GLIBCXX_USE_CXX11_ABI 0 flag in g++ command in order to compile correctly. If we want to evaluate 3D object detection AP (average precision), we need also to compile the evaluation code (by running compile.sh under train/kitti_eval ). Check train/kitti_eval/README.md for details. Some of the demos require mayavi library. We have provided a convenient script to install mayavi package in Python, a handy package for 3D point cloud visualization. You can check it at mayavi/mayavi_install.sh . If the installation succeeds, you should be able to run mayavi/test_drawline.py as a simple demo. Note: the library works for local machines and seems do not support remote access with ssh or ssh X . The code is tested under TF1.2 and TF1.4 (GPU version) and Python 2.7 (version 3 should also work) on Ubuntu 14.04 and Ubuntu 16.04 with NVIDIA GTX 1080 GPU. It is highly recommended to have GPUs on your machine and it is required to have at least 8GB available CPU memory. Usage Currently, we support training and testing of the Frustum PointNets models as well as evaluating 3D object detection results based on precomputed 2D detector outputs (under kitti/rgb_detections ). You are welcomed to extend the code base to support your own 2D detectors or feed your own data for network training. Prepare Training Data In this step we convert original KITTI data to organized formats for training our Frustum PointNets. Firstly, you need to download the KITTI 3D object detection dataset , including left color images, Velodyne point clouds, camera calibration matrices, and training labels. Make sure the KITTI data is organized as required in dataset/README.md . You can run python kitti/kitti_object.py to see whether data is downloaded and stored properly. If everything is fine, you should see image and 3D point cloud visualizations of the data. 
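A quick pre flight check along these lines can save a failed run; the folder names below follow the standard KITTI object detection layout and are only an assumption here, so defer to dataset/README.md for the layout this repository actually expects:

```python
import os

KITTI_ROOT = 'dataset/KITTI/object'   # hypothetical root; adjust to your setup
EXPECTED = ['training/image_2', 'training/velodyne',
            'training/calib', 'training/label_2']

missing = [d for d in EXPECTED if not os.path.isdir(os.path.join(KITTI_ROOT, d))]
print('missing folders:', missing if missing else 'none')
```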
Then to prepare the data, simply run: (warning: this step will generate around 4.7GB data as pickle files) sh scripts/command_prep_data.sh Basically, during this process, we are extracting frustum point clouds along with ground truth labels from the original KITTI data, based on both ground truth 2D bounding boxes and boxes from a 2D object detector. We will do the extraction for the train ( kitti/image_sets/train.txt ) and validation set ( kitti/image_sets/val.txt ) using ground truth 2D boxes, and also extract data from validation set with predicted 2D boxes ( kitti/rgb_detections/rgb_detection_val.txt ). You can check kitti/prepare_data.py for more details, and run python kitti/prepare_data.py demo to visualize the steps in data preparation. After the command executes, you should see three newly generated data files under the kitti folder. You can run python train/provider.py to visualize the training data (frustum point clouds and 3D bounding box labels, in rect camera coordinate). Training Frustum PointNets To start training (on GPU 0) the Frustum PointNets model, just run the following script: CUDA_VISIBLE_DEVICES 0 sh scripts/command_train_v1.sh You can run scripts/command_train_v2.sh to train the v2 model as well. The training statistics and checkpoints will be stored at train/log_v1 (or train/log_v2 if it is a v2 model). Run python train/train.py h to see more options of training. Evaluation To evaluate a trained model (assuming you already finished the previous training step) on the validation set, just run: CUDA_VISIBLE_DEVICES 0 sh scripts/command_test_v1.sh Similarly, you can run scripts/command_test_v2.sh to evaluate a trained v2 model. The script will automatically evaluate the Frustum PointNets on the validation set based on precomputed 2D bounding boxes from a 2D detector (not released here), and then run the KITTI offline evaluation scripts to compute precision recall and calculate average precisions for 2D detection, bird's eye view detection and 3D detection. Currently there is no script for evaluation on the test set, but it is possible to do it yourself. To evaluate on the test set, you need to get outputs from a 2D detector on the KITTI test set and store them under kitti/rgb_detections . Then, you need to prepare frustum point clouds for the test set by modifying the code in kitti/prepare_data.py . Then you can modify the test scripts in scripts by changing the data path, idx path and output file name. For our reported test set results, we used the entire trainval set for training. License Our code is released under the Apache 2.0 license (see LICENSE file for details). References PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation by Qi et al. (CVPR 2017 Oral Presentation). Code and data: here . PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space by Qi et al. (NIPS 2017). Code and data: here . Todo Add a demo script to run inference of Frustum PointNets based on raw input data. Add related scripts for SUNRGBD dataset",Object Localization,Object Localization 2830,Computer Vision,Computer Vision,Computer Vision,"To run the model, launch train.py to start training. To evaluate the model, run evaluate.py. In both train.py and evaluate.py, the model can be toggled between (RadialNetBasicFC, RadialNetBasic, RadialNet, RadialNetInceptionAndTransform) at line 19 and line 20 respectively.
RadialNetBasic RBF Layer + Inception MLP RadialNetBasicFC RBF Layer + MLP RadialNetInceptionAndTransform RBF Layer + Affine Transform + Inception MLP RadialNet RBF Layer + Affine Transform + Inception MLP + Feature Transform Special thanks to Charles Qi for sharing the code for PointNet.",Object Localization,Object Localization 1911,Natural Language Processing,Natural Language Processing,Natural Language Processing,"VSL A PyTorch implementation of Variational Sequential Labelers for Semi Supervised Learning (EMNLP 2018) Prerequisites Python 3.5 PyTorch 0.3.0 Scikit Learn NumPy Data and Pretrained Embeddings Download: Twitter , Universal Dependencies , Embeddings (for Twitter and UD) Run process_{ner,twitter,ud}_data.py first to generate .pkl files and then use it as input for vsl_{g,gg}.py . Citation @inproceedings{mchen variational 18, author {Mingda Chen and Qingming Tang and Karen Livescu and Kevin Gimpel}, title {Variational Sequential Labelers for Semi Supervised Learning}, booktitle {Proc. of {EMNLP}}, year {2018} }",Named Entity Recognition (NER),Named Entity Recognition (NER) 1913,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Neural CRF AE Requirments PyTorch 0.3.0 spaCy 2.0.0 Python 3.6 External Resources Features are available at Google Drive Gazetteers are available at Google Drive Instructions 1. Clone this repo. 2. Create three new folders models , features and checkpoints . 3. Download pre trained word embeddings to models and feature files to features . 4. Run python main.py and the model will be save at checkpoints Acknowledgement Some programs are adapted from: Official PyTorch Tutorial A Theano implementation for Lample's work A PyTorch implementation for Ma's work A Keras implementation for Ma's work Thank you for your contributions.",Named Entity Recognition (NER),Named Entity Recognition (NER) 1991,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Entity Role Detection Paper Disclaimer: If at any step, this error occurs UnicodeDecodeError: 'utf 8' codec can't decode byte 0x80 in position 3131: invalid start byte This is becuase of .DS_Store file created by Mac OS. Remove this from that directory. Prerequisite 1. Softwares and Libraries python 3.x Install pip: $easy_install pip python packages: numpy, nltk, sklearn, pycrfsuite, gensim, runipy, scipy Jupyter Notebbok and Deep Learning Tensorflow sh python m pip install numpy nltk sklearn python crfsuite jupyter gensim runipy scipy python m pip install upgrade tensorflow Check Tensorflow Installation sh $python $>> import tensorflow as tf $>> Download punkt sh $python $>> import nltk $>> nltk.download('punkt') 2. Corpus This paper uses news articles which are tagged with following roles. LOC_Victim, LOC_Event, LOC_Accused, LOC_Others PER_Victim, PER_Accused, PER_Others ORG_Victim, ORG_Accused, ORG_Others Rest others tagged with O Corpus location > News article content: Data/input/Corpus/Original/content Corresponding Tags: Data/input/Corpus/Original/tags Corpus after Performing Lemmatization and Removal of Stopword > News article content: Data/input/Corpus/Lemmatized_without_Stopword/content Corresponding Tags: Data/input/Corpus/Lemmatized_without_Stopword/tags 3. Opening Jupyter Notebok Open command prompt or Terminal Go inside directory Entity Role Detection Paper sh $ cd /Entity Role Detection Paper $ jupyter notebook 4. 
Divide Cropus in Train and Test Open Jupyter Notebok Goto Code >Common > Build_Train_and_Test.ipynb Goto Cell >Run All NOTE: Use runipy command if want to run the notebook from command line directly. $runipy Build_Train_and_Test.ipynb Train Directory: Data/output/Corpus/train Test Directory: Data/output/Corpus/test 5. Download Glove Pretrained Vector Click on or go to and download glove.6B, if previous link doesn't work Copy glove.6B.zip to Entity Role Detection Paper/Data/input/Word_Embedding Extract the zip file, a folder glove.6B wuill be created. 6. In Entity Ranking models, We are using gensim library for building embeddings which needs word2vec. Since we have used Glove in NER models, we will convert the same Glove embeddings to word2vec format. Open Jupyter Notebok Goto Code >Common > Covert_Glove_2_Word2Vec.ipynb Goto Cell >Run All This will create a word2vec embedding for glove vectors 'glove_2_word2vec.6B.300d.txt' in folder Entity Role Detection Paper/Data/input/Word_Embedding/Glove2Word2Vec/ 7. Building Phrase and Corpus We are using Uigram and Bigram (Collocations and Relation) phrases. To build word embeddinds from corpus We will replace bigram phrases with unigram by concatenation by underscore 'was there' becomes 'was_there'. 7.1 Build Collocation Phrases Open Jupyter Notebok Goto Code >EntityRanking >Build_Phrases_and_Corpus >Generate_Collocation_Phrases.ipynb Goto Cell >Run All This will create a file 'Collocation_Phrases.txt' in Entity Role Detection Paper/Data/output/Phrases/ 7.2 Build corpus for Collocation by Replacing Collocation bigrams with unigrams Open Jupyter Notebok Goto Code >EntityRanking >Build_Phrases_and_Corpus >build_corpus_for_collocation_phrases.ipynb Goto Cell >Run All This will create corpus in Entity Role Detection Paper/Data/output/Phrases/Collocation_Phrase_Corpus BEGIN OPTIONAL STEP 7.3 Build Relation Phrases This step is OPTIONAL as it uses the Java and NetBeans, for StanfordCoreNLP 7.3.1 Annotate corpus Open Jupyter Notebok Goto Code >EntityRanking >Build_Phrases_and_Corpus >OpenIE >Annotate_for_Stannford_OpenIE.ipynb Goto Cell >Run All Annotated content files will be generated in Entity Role Detection Paper/Data/output/Phrases/OpenIE_Annotated_Corpus 7.3.2 Build All Relation Phrases Download Jars stanford corenlp 3.7.0 models.jar from stanford corenlp 3.7.0.jar from Place both jars Entity Role Detection Paper/Code/EntityRanking/Build_Phrases_and_Corpus/OpenIE/RelationExtractor Download Netbeans IDE (Java SE Bundle) from and install it. After installation open NetBeansIDE > File > Open Project Browse till Entity Role Detection Paper/Code/EntityRanking/Build_Phrases_and_Corpus/OpenIE/. You will see RelationExtractor project. Select it and click open project. OPTIONAL It will look for these jar. Click on Resolve Problems > Select jars one by one and select Resolve. Locate the jar file copied in above location and select. Right click on RelationExtractor and click 'Clean and Build' Look for BUILD SUCCESSFUL (total time: 3 seconds) in output window at below Goto RelationExtractor > Source Packages > relationextractor Right click on RelationExtractor.java and choose Run file. It will take 5 minutes to process file. 
This will create a file 'relation_phrases.csv' in Entity Role Detection Paper/Data/output/Phrases/ 7.3.3 Generate Bigram Relation Phrases Open Jupyter Notebok Goto Code >EntityRanking >Build_Phrases_and_Corpus >OpenIE >Create_Bigram_Relation_Phrases.ipynb Goto Cell >Run All Bigram relation phrases file 'bigram_relation_phrases.csv' will be generated in Entity Role Detection Paper/Data/output/Phrases/ END OPTIONAL 7.4 Build corpus for Relation Phrases by Replacing Relation bigrams with unigrams Open Jupyter Notebok Goto Code >EntityRanking >Build_Phrases_and_Corpus >OpenIE >build_corpus_for_relation_phrases.ipynb Goto Cell >Run All This will create corpus in Entity Role Detection Paper/Data/output/Phrases/Relation_Phrase_corpus Models A) Named Entity Recognition Models 1. Hidden Markov Model (HMM) Open Jupyter Notebok Goto Code >NER >HMM >HMM.ipynb Goto Cell >Run All Output will be shown in last cell 2. Conditional Random Field (CRF) Open Jupyter Notebok Goto Code >NER >CRF >CRF.ipynb Goto Cell >Run All Output will be shown in last cell 3. Bi LSTM Model We are using model proposed by Blog: GitHub: We are directly going to use this code. So need to transform our corpus as per the format of input required by this code. 3.1. Transform Corpus to Input Open Jupyter Notebok Goto Code >NER >LSTM >Prepare_Data.ipynb Goto Cell >Run All Verify that train.txt, test.txt and valid.txt files have been created in Code/NER/LSTM/sequence_tagging master/data directory. 3.2. Configure parameters, hyperparameters and others Go to sequence_tagging_master/model/config.py and check parameters. 3.3 Running Model Open Terminal > sh $ cd Entity Role Detection Paper/Code/NER/LSTM/sequence_tagging master 3.3.1 Build data for training sh $ python build_data.py It will create tags.txt, words.txt, chars.txt and trimmed glove vector in sequence_tagging master/data folder 3.3.2 Run Training sh $ python train.py. Current setting is to run 15 epochs.. each epoch usually takes 3 minutes so wait for 30 minutes to complete training. Model will be created under sequence_tagging master/results 3.3.3 Evaluate Open Jupyter Notebok Goto Code >NER >LSTM >sequence_tagging master >Evaluate.ipynb Goto Cell >Run All B) Entity Ranking/Retrieval Models 1. Learn Word Embeddings 1.1 Learn by Training on Corpus We are not using this right now.. Can skip this step REPEAT Open Jupyter Notebok Goto Code >EntityRanking >Learn_Embeddings >Learn_WordEmbedding_From_Corpus.ipynb Do the changes as below to Train for Word Based Approach word_based_approach True collocations_based_approach False relation_phrase_based_approach False Goto Cell >Run All END REPEAT Repeat the steps in REPEAT and END REPEAT for making collocations_based_approach True and other False. Repeat the steps in REPEAT and END REPEAT for making relation_phrase_based_approach True and other False. In Case of Memory Issue, shutdown the notebook and restart again Embeddings will be created: word_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/word_based/word_trained_from_corpus/w2v_corpus_trained_gensim_300.txt collocations_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/collocation_based/word_trained_from_corpus/w2v_corpus_trained_gensim_300.txt relation_phrase_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/relation_based/word_trained_from_corpus/w2v_corpus_trained_gensim_300.txt' 1.2 Learn by Training on Corpus, Initialized with Pretrained Glove. 
REPEAT Open Jupyter Notebok Goto Code >EntityRanking >Learn_Embeddings >Learn_WordEmbedding_Pretrained_Trained_on_Corpus.ipynb Do the changes as below to Train for Word Based Approach word_based_approach True collocations_based_approach False relation_phrase_based_approach False Goto Cell >Run All END REPEAT Repeat the steps in REPEAT and END REPEAT for making collocations_based_approach True and other False. Repeat the steps in REPEAT and END REPEAT for making relation_phrase_based_approach True and other False. In Case of Memory Issue, shutdown the notebook and restart again Embeddings will be created: word_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/word_based/word_pretrain_trained_on_corpus/w2v_pretain_corpus_trained_gensim_300.txt collocations_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/collocation_based/word_pretrain_trained_on_corpus/w2v_pretain_corpus_trained_gensim_300.txt relation_phrase_based_approach: Entity Role Detection Paper/Data/output/trained_word_embeddingds/relation_based/word_pretrain_trained_on_corpus/w2v_pretain_corpus_trained_gensim_300.txt' 2. Building Entity Representations Please change following settings as needed to run different experiments: Settings: We are using senetence and document level For sentence level > doc_level False, sent_level True For document level > doc_level True, sent_level False Unigram and Bigram setting For word approach > word_based_approach True, collocations_based_approach False, relation_phrase_based_approach False For collocations approach > word_based_approach False, collocations_based_approach True, relation_phrase_based_approach False For relation phrase approach > word_based_approach False, collocations_based_approach False, relation_phrase_based_approach True 2.1 Extarct Context Word Sentence Level NOTE Three runs with 3 n gram approaches (word/collocation/relation phrase) END NOTE Open Jupyter Notebok Goto Code/EntityRanking/Entity_Representations/Extract_Entity_and_Context_Words_Sentence_Level.ipynb Goto Cell >Run All This will create files Entity Role Detection Paper/Data/output/entity_represenatation/word_based/train/entity_sent_level_context_word.p Entity Role Detection Paper/Data/output/entity_represenatation/word_based/test/entity_sent_level_context_word.p 2.2 Extarct Context Word Document Level NOTE Three runs with 3 n gram approaches (word/collocation/relation phrase) END NOTE Open Jupyter Notebok Goto Code/EntityRanking/Entity_Representations/Extract_Entity_and_Context_Words_Document_Level.ipynb Goto Cell >Run All This will create files Entity Role Detection Paper/Data/output/entity_represenatation/word_based/train/entity_doc_level_context_word.p Entity Role Detection Paper/Data/output/entity_represenatation/word_based/test/entity_doc_level_context_word.p 2.3 Entity Representation Centroid Method NOTE Six runs with 2 levels (doc/sentence) and 3 n gram approaches (word/collocation/relation phrase) word based, sent level word based, doc level collocation based, sent level collocation based, doc level relation phrase based, sent level relation phrase based, doc level END NOTE Open Jupyter Notebok Goto Code/EntityRanking/Entity_Representations/Entity_Representation_Centroid_Method.ipynb Goto Cell >Run All 2.4 Entity Representation Doc2Vec Method NOTE Six runs with 2 levels (doc/sentence) and 3 n gram approaches (word/collocation/relation phrase) word based, sent level word based, doc level collocation based, sent level collocation based, doc level 
relation phrase based, sent level relation phrase based, doc level END NOTE Open Jupyter Notebok Goto Code/EntityRanking/Entity_Representations/Entity_Representation_Doc2Vec_Method.ipynb Goto Cell >Run All 3. Building Type Representation 3.1 Build Corpus for Building Type Representaion NOTE Three runs with 3 n gram approaches (word/collocation/relation phrase) END NOTE Open Jupyter Notebok Goto Code/EntityRanking/Type_Representations/Build_Corpus_For_Learning_Type_Rep.ipynb Goto Cell >Run All This will create files Entity Role Detection Paper/Data/output/type_representation/word_based/train/corpus_replaced_entity_with_tag Entity Role Detection Paper/Data/output/type_representation/collocation_based/train/corpus_replaced_entity_with_tag Entity Role Detection Paper/Data/output/type_representation/relation_phrase_based/train/corpus_replaced_entity_with_tag 3.2 Build Type Representaions NOTE Three runs with 3 n gram approaches (word/collocation/relation phrase) END NOTE Open Jupyter Notebok Code/EntityRanking/Type_Representations/Build_Type_Representations.ipynb Goto Cell >Run All This will take around 30 minutes one run. 3 minutes for each tag Depends upon the System Configuration also. This will create files Entity Role Detection Paper/Data/output/type_representation/word_based/train/tag_vec_dict.p Entity Role Detection Paper/Data/output/type_representation/word_based/train/tag_vec_dict.p Entity Role Detection Paper/Data/output/type_representation/collocation_based/train/tag_vec_dict.p Entity Role Detection Paper/Data/output/type_representation/collocation_based/train/tag_vec_dict.p Entity Role Detection Paper/Data/output/type_representation/relation_phrase_based/train/tag_vec_dict.p Entity Role Detection Paper/Data/output/type_representation/relation_phrase_based/train/tag_vec_dict.p 4. Ranking Entity Against Role NOTE All the test scripts have results of All Configurations > Context Levels > document and sentence ngrams > unigram, collocation and relation phrase 4.1 Entity Representation Centroid of Context Words and Type Representation Embeddings Vector Learned By Relacing Entity with Role Open Jupyter Notebok Code/EntityRanking/Ranking_Mechnism/Entity_Centroid_Type_Vector_Rep_GA.ipynb Goto Cell >Run All Output will be printed at the bottom. 4.2 Entity Representation Doc2Vec of Context Words and Type Representation Embeddings Vector Learned By Relacing Entity with Role Open Jupyter Notebok Code/EntityRanking/Ranking_Mechnism/Entity_Doc2Vec_Type_Vector_Rep_GA.ipynb Goto Cell >Run All Output will be printed at the bottom. 4.3 Entity Representation Context Words and Type Representation Embeddings Vector Learned By Relacing Entity with Role Open Jupyter Notebok Code/EntityRanking/Ranking_Mechnism/Entity_Word_Type_Vector_Rep GA.ipynb Goto Cell >Run All Output will be printed at the bottom. 4.4 Entity Representation Context Words and Type Representation Top 20 similar words to Type Embeddings Learned By Relacing Entity with Role Open Jupyter Notebok Code/EntityRanking/Ranking_Mechnism/Entity_Word_Type_Word_Rep GA.ipynb Goto Cell >Run All Output will be printed at the bottom.",Named Entity Recognition (NER),Named Entity Recognition (NER) 2033,Natural Language Processing,Natural Language Processing,Natural Language Processing,"NER Tagger NER Tagger is an implementation of a Named Entity Recognizer that obtains state of the art performance in NER on the 4 CoNLL datasets (English, Spanish, German and Dutch) without resorting to any language specific knowledge or resources such as gazetteers. 
Details about the model can be found at: Initial setup To use the tagger, you need Python 2.7, with Numpy and Theano installed. Tag sentences The fastest way to use the tagger is to use one of the pretrained models: ./tagger.py model models/english/ input input.txt output output.txt The input file should contain one sentence by line, and they have to be tokenized. Otherwise, the tagger will perform poorly. Train a model To train your own model, you need to use the train.py script and provide the location of the training, development and testing set: ./train.py train train.txt dev dev.txt test test.txt The training script will automatically give a name to the model and store it in ./models/ There are many parameters you can tune (CRF, dropout rate, embedding dimension, LSTM hidden layer size, etc). To see all parameters, simply run: ./train.py help Input files for the training script have to follow the same format than the CoNLL2003 sharing task: each word has to be on a separate line, and there must be an empty line after each sentence. A line must contain at least 2 columns, the first one being the word itself, the last one being the named entity. It does not matter if there are extra columns that contain tags or chunks in between. Tags have to be given in the IOB format (it can be IOB1 or IOB2).",Named Entity Recognition (NER),Named Entity Recognition (NER) 2730,Natural Language Processing,Natural Language Processing,Natural Language Processing,"CollaboNet: collaboration of deep neural networks for biomedical named entity recognition This project provides a neural network(bi LSTM + CRF) approach for biomedical Named Entity Recognition. Our implementation is based on the Tensorflow library on python. __TITLE__ : CollaboNet: collaboration of deep neural networks for biomedical named entity recognition \ Accepted for CIKM 2018 workshop ACM 12th International Workshop on Data and Text Mining in Biomedical Informatics (DTMBIO2018). __AUTHOR__ : Wonjin Yoon 1! , Chan Ho So 2! , Jinhyuk Lee 1 and Jaewoo Kang 1\ __Author details__ 1 Department of Computer Science and Engineering, Korea University 2 Interdisciplinary Graduate Program in Bioinformatics, Korea University ! Equal contributor Quick Links Requirements ( requirements) Model ( model) Data ( data) Usage ( usage) Performance ( performance) Requirements At least one CUDA compatible GPU device is strongly recommanded for execution of this project codes. python 2.7 numpy 1.14.2 tensorflow gpu 1.7.0 License The code is distributed under MIT license (./LICENSE.md). __Citeable paper can be found at pre print server here This software includes third party software. See License thirdparty.txt for details. Model LEFT Character level word embedding using CNN and overview of Bidirectional LSTM with Conditional Random Field (BiLSTM CRF). RIGHT Structure of CollaboNet when Gene model act as a role of target model. Rhombus represents the CRF layer. Arrows show the flow of information when target model is training. Dashed arrows mean that information is not flowing when target model is under training. ! Model Data Train, Test Data We used datasets collected by Crichton et al. These datasets by Crichton et al. are available here . We found that the JNLPBA dataset from Crichton et al. contains sentences which were incorrectly split. So we re generated the dataset from the original corpus by Kim et al. . The details of each dataset is showed below: Corpora Entity type No. sentence No. 
annotations Data Size : : : : : : : : : : NCBI Disease (Dogan et al., 2014) Disease 7,639 6,881 793 abstracts JNLPBA (Kim et al., 2004) Gene/Proteins 22,562 35,336 2,404 abstracts BC5CDR (Li et al., 2016) Chemicals 14,228 15,935 1,500 articles BC5CDR (Li et al., 2016) Diseases 14,228 12,852 1,500 articles BC4CHEMD (Krallinger et al., 2015a) Chemicals 86,679 84,310 10,000 abstracts BC2GM (Akhondi et al., 2014) Gene/Proteins 20,510 24,583 20,000 sentences The datasets are publicly available by executing download.sh (./download.sh). Pre trained Embeddings We used pre trained word embeddings from Pyysalo et al. which is trained on PubMed, PubMed Central(PMC) and Wikipedia text. It will be automatically downloaded by executing download.sh (./download.sh). Usage Download Data bash download.sh Single Task Model STM (6 datasets) __Preperation phase (Phase 0) of CollaboNet__ python run.py ncbi jnlpba bc5_chem bc5_disease bc4 bc2 lr_pump lr_decay 0.05 You can also refer to stm.sh (./stm.sh) for detailed usage. CollaboNet (6 datasets) __You should produce pre trained STM model by executing Preperation phase before running CollaboNet.__ python run.py ncbi jnlpba bc5_chem bc5_disease bc4 bc2 lr_pump lr_decay 0.05 pretrained STM_MODEL_DIRECTORY_NAME(ex 201806210605) You can find STM_MODEL_DIRECTORY_NAME from ./modelSave folder. You can also refer to collabo.sh (./collabo.sh) for detailed usage. Performance STM Model NCBI disease JNLPBA BC5CDR chem BC5CDR disease BC4CHEMD BC2GM Average : : : : : : : : : : : : : : : : : : Habibi et al. (2017) STM F1 Score 84.44 77.25 90.63 83.49 86.62 77.82 83.38 Wang et al. (2018) STM F1 Score 83.92 72.17 89.85 82.68 88.75 80.00 82.90 Our STM F1 Score 84.69 77.39 92.74 82.61 88.40 79.27 84.03 Scores in the asterisked (\ ) cells are obtained in the experiments that we conducted; these scores are not reported in the original papers. The best scores from these experiments are in bold. CollaboNet NCBI disease JNLPBA BC5CDR chem BC5CDR disease BC4CHEMD BC2GM Average : : : : : : : : : : : : : : : : : : Wang et al. (2018) MTM F1 Score 86.14 73.52 91.29 83.33 89.37 80.74 84.07 Our CollaboNet F1 Score 86.36 78.58 93.31 84.08 88.85 79.73 85.15 Scores in the asterisked (\ ) cells are obtained in the experiments that we conducted; these scores are not reported in the original papers. The best scores from these experiments are in bold.",Named Entity Recognition (NER),Named Entity Recognition (NER) 2895,Natural Language Processing,Natural Language Processing,Natural Language Processing,引用 IDCNN+CRF参考论文: 语料 采用人民日报语料(下载地址: ,密码:kbp2). 将语料中以/nr和/nt结尾的词随机替换为现有库中的名字和组织结构名. 
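For orientation, the iterated dilated CNN (IDCNN) encoder from the referenced paper can be sketched in Keras as a stack of 1 D convolutions whose dilation rate grows with depth, so the receptive field expands quickly without pooling. This is only an illustration with made up vocabulary and tag sizes, not the layer configuration in train_model.py, and the CRF layer from keras_contrib that the repository uses is replaced by a plain softmax output here:

```python
from keras.models import Model
from keras.layers import Input, Embedding, Conv1D, Dropout, TimeDistributed, Dense

VOCAB_SIZE, NUM_TAGS, MAX_LEN = 5000, 13, 100     # made-up sizes for illustration

tokens = Input(shape=(MAX_LEN,))
x = Embedding(VOCAB_SIZE, 100)(tokens)
for rate in (1, 1, 2):                            # one iterated dilated block
    x = Conv1D(120, 3, dilation_rate=rate, padding='same', activation='relu')(x)
x = Dropout(0.5)(x)
tags = TimeDistributed(Dense(NUM_TAGS, activation='softmax'))(x)  # a CRF would replace this
model = Model(tokens, tags)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()
```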
Required environment tensorflow> 1.3.1 keras 2.1.3 keras_contrib 0.0.2 Training Train word segmentation python train_model.py model cws which_model idcnn data_path ../data/small_cws_train_data.txt epoch 5 ner_model ../model_dir/cws_model.h5 Train named entity recognition python train_model.py model ner which_model idcnn data_path ../data/small_ner_train_data.txt epoch 5 ner_model ../model_dir/ner_model.h5 Usage python demo.py,Named Entity Recognition (NER),Named Entity Recognition (NER)
We want to fix this serious problem that humanity is facing. But, since writing our own songs to combat modern lyrical uniformity seems futile with the huge amounts of monotone music published daily and our writing skills are admittedly lacking we decided to tackle the problem programmatically. A simple template approach like the online lyrics generator mentioned above will not suffice, as it will only deem us to repeat the mistakes of modern musicians. What we need is... Machine Learning ! The Approach We will train a set of Generative Adversarial Networks to generate music in a given style. One network will generate new songs while the second network will try to distinguish between original and generated text. Although considered infeasible due to the discrete nature of text, recent approaches to tackle the problem have shown promising results (Guo et al., 2017; Press, Bar, Bogin, Berant, & Wolf, 2017; Wang, Qin, & Wan, 2017). The GAN code was adapted from (last reference) The Dataset Everybody knows, that to train the collection of if statements commonly denoted as 'Neural Network' you need a lot of data. Luckily, the friendly people over at Kaggle have composed a dataset of over 380 thousand english lyrics . The Group This revolutionary project proposal was brought to you by: Mathis Sackers (@MathisSackers) & Valentin Koch (@valko073) aka. ai.write_lyrics() ™ References Song Lyrics Generator Lyrics for 380,000+ songs in English from MetroLyrics Sutskever, Ilya, James Martens, and Geoffrey E. Hinton. Generating text with recurrent neural networks. Proceedings of the 28th International Conference on Machine Learning (ICML 11). 2011. Taneja, Pratiksha, and Karun Guide Verma. Text Generation Using Different Recurrent Neural Networks. Diss. 2017. H. Wang, Z. Qin, and T. Wan, ‘Text Generation Based on Generative Adversarial Nets with Latent Variable’, arXiv:1712.00170 cs , Nov. 2017. J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, and J. Wang, ‘Long Text Generation via Adversarial Training with Leaked Information’, arXiv:1709.08624 cs , Sep. 2017. O. Press, A. Bar, B. Bogin, J. Berant, and L. Wolf, ‘Language Generation with Recurrent Generative Adversarial Networks without Pre training’, arXiv:1706.01399 cs , Jun. 2017.",Text Generation,Text Generation 2661,Natural Language Processing,Natural Language Processing,Natural Language Processing,"SeqGAN Tensorflow implementation for the paper SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient by Lantao Yu et al. This repository is created to implement the architecture of the paper on synthetic data(like in paper) and any text dataset available publicly. data_loader.py is responsible for loading data in batches for both, Generator and Discriminator. discriminator.py has the architecture for Discriminator model, which is a Convolutional Neural Network for text classification. It also uses a highway network. generator.py has the architecture for Generator model, which according to the paper is a Recurrent Neural Network with LSTM units. target_lstm.py is similar to generator and responsible for creating synthetic data. It can be omitted if tested on a dataset. 
I have used InstaPic dataset captions to generate a new set of captions from SeqGAN, along with the synthetic dataset procedure the authors presented in the paper.",Text Generation,Text Generation 1777,Computer Vision,Computer Vision,Computer Vision,EAST pytorch version paper: status:DONE!,Scene Text Detection,Scene Text Detection 2093,Computer Vision,Computer Vision,Computer Vision,"EAST Pytorch Pytorch version of EAST, paper: Thanks argman/EAST songdejia/EAST Requirements Python2 or Python3 Pytorch > 0.4.0 numpy shapely Imagenet pretrain model resnet50 Dataset TODO Train ➜ python train.py h usage: EAST h b BATCH_SIZE l LR wd WD epochs EPOCHS j NUM_WORKERS s INPUT_SIZE text scale TEXT_SCALE min text size MIN_TEXT_SIZE gpus GPUS checkpoint CHECKPOINT DIR PTH positional arguments: DIR train dataset dir PTH pretrain model optional arguments: h, help show this help message and exit b BATCH_SIZE, batch size BATCH_SIZE batch size per GPU(default 14) l LR, lr LR lr(default 0.0001) wd WD weight decay(default 1e 5) epochs EPOCHS epochs(default 100) j NUM_WORKERS, num workers NUM_WORKERS dataloader workers(default 16) s INPUT_SIZE, input size INPUT_SIZE input image size(default 512). INPUT_SIZE is the image size used for training; it should be compatible with TEXT_SCALE text scale TEXT_SCALE text_scale is the max text length EAST can detect; it is restricted by the receptive field of the CNN (default 512) min text size MIN_TEXT_SIZE min text size(default 10) gpus GPUS gpu ids(default 0) checkpoint CHECKPOINT checkpoint dir(default ./checkpoint) Test TODO",Scene Text Detection,Scene Text Detection 2710,Computer Vision,Computer Vision,Computer Vision,"Shape Robust Text Detection with Progressive Scale Expansion Network Requirements Python 2.7 PyTorch v0.4.1+ pyclipper Polygon2 OpenCV 3.4 (for c++ version pse) opencv python 3.4 Introduction Progressive Scale Expansion Network (PSENet) is a text detector that can detect arbitrary-shape text in natural scenes well.
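The progressive scale expansion step itself is easy to sketch: connected components of the smallest predicted kernel are grown breadth first into each successively larger kernel, and a contested pixel keeps whichever label reaches it first. The NumPy/SciPy snippet below is only an illustration of that idea, not the repository's pse implementation:

```python
import numpy as np
from collections import deque
from scipy.ndimage import label as connected_components

def progressive_scale_expansion(kernels):
    # kernels: binary maps ordered from the smallest kernel to the largest.
    labels, _ = connected_components(kernels[0])
    for kernel in kernels[1:]:
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < labels.shape[0] and 0 <= nx < labels.shape[1]
                        and kernel[ny, nx] and labels[ny, nx] == 0):
                    labels[ny, nx] = labels[y, x]   # first label to arrive wins
                    queue.append((ny, nx))
    return labels

small = np.zeros((8, 8), dtype=np.uint8)
small[2:4, 2:4] = 1    # two separate text kernels
small[5:7, 5:7] = 1
large = np.ones((8, 8), dtype=np.uint8)
print(progressive_scale_expansion([small, large]))
```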
Training CUDA_VISIBLE_DEVICES 0,1,2,3 python train_ic15.py Testing CUDA_VISIBLE_DEVICES 0 python test_ic15.py scale 1 resume path of model Eval script for ICDAR 2015 and SCUT CTW1500 cd eval sh eval_ic15.sh sh eval_ctw1500.sh Performance (new version paper) ICDAR 2015 Method Extra Data Precision (%) Recall (%) F measure (%) FPS (1080Ti) Model PSENet 1s (ResNet50) 81.49 79.68 80.57 1.6 baiduyun (extract code: rxti); OneDrive PSENet 1s (ResNet50) pretrain on IC17 MLT 86.92 84.5 85.69 3.8 baiduyun (extract code: aieo); OneDrive PSENet 4s (ResNet50) pretrain on IC17 MLT 86.1 83.77 84.92 3.8 baiduyun (extract code: aieo); OneDrive SCUT CTW1500 Method Extra Data Precision (%) Recall (%) F measure (%) FPS (1080Ti) Model PSENet 1s (ResNet50) 80.57 75.55 78.0 3.9 baiduyun (extract code: ksv7); OneDrive PSENet 1s (ResNet50) pretrain on IC17 MLT 84.84 79.73 82.2 3.9 baiduyun (extract code: z7ac); OneDrive PSENet 4s (ResNet50) pretrain on IC17 MLT 82.09 77.84 79.9 8.4 baiduyun (extract code: z7ac); OneDrive Performance (old version paper) ICDAR 2015 (training with ICDAR 2017 MLT) Method Precision (%) Recall (%) F measure (%) PSENet 4s (ResNet152) 87.98 83.87 85.88 PSENet 2s (ResNet152) 89.30 85.22 87.21 PSENet 1s (ResNet152) 88.71 85.51 87.08 ICDAR 2017 MLT Method Precision (%) Recall (%) F measure (%) PSENet 4s (ResNet152) 75.98 67.56 71.52 PSENet 2s (ResNet152) 76.97 68.35 72.40 PSENet 1s (ResNet152) 77.01 68.40 72.45 SCUT CTW1500 Method Precision (%) Recall (%) F measure (%) PSENet 4s (ResNet152) 80.49 78.13 79.29 PSENet 2s (ResNet152) 81.95 79.30 80.60 PSENet 1s (ResNet152) 82.50 79.89 81.17 ICPR MTWI 2018 Challenge 2 Method Precision (%) Recall (%) F measure (%) PSENet 1s (ResNet152) 78.5 72.1 75.2 Results Figure 3: The results on ICDAR 2015, ICDAR 2017 MLT and SCUT CTW1500 Paper Link new version paper old version paper Other Implements tensorflow version (thanks @ liuheng92 )",Scene Text Detection,Scene Text Detection 2761,Computer Vision,Computer Vision,Computer Vision,"FOTS: Fast Oriented Text Spotting with a Unified Network Introduction This is a pytorch re implementation of FOTS: Fast Oriented Text Spotting with a Unified Network . The features are summarized blow: + Only detection part is implemented. Contents 1. Installation ( installation) 2. Download ( download) 3. Train ( train) 4. Test ( test) Installation 1. Any version of torch version > 0.3.1 should be ok. Download 1. Models trained on ICDAR 2015 (training set) + ICDAR 2017 (training set) Train If you want to train the model, you should provide the dataset path, in the dataset path, a separate gt text file should be provided for each image and run python main_train.py Test run python eval.py a text file will be then written to the output path.",Scene Text Detection,Scene Text Detection 2910,Computer Vision,Computer Vision,Computer Vision,"EAST : An Efficient and Accurate Scene Text Detector Introduction This is a tensorflow implemention of EAST. I only reimplement the RBOX part of the paper, which achieves an F1 score of 80.8 on the ICDAR 2015 dataset (which is about two points better than the result of pvanet in the paper, see The running speed is about 150ms (network) + 300ms (NMS) per image on a K40 card. The nms part is too slow because of the use of shapely in python, and can be further improved. Thanks for the author's ( @zxytim ) help! Please site his paper if you find this useful. Contents 1. Installation ( installation) 2. Download ( download) 3. Test ( train) 4. Train ( test) 5. Examples ( examples) Installation 1. 
I think any version of tensorflow version > 1.0 should be ok. Download 1. Models trained on ICDAR 2013 (training set) + ICDAR 2015 (training set): BaiduYun link 2. Resnet V1 50 provided by tensorflow slim: slim resnet v1 50 Train If you want to train the model, you should provide the dataset path, in the dataset path, a separate gt text file should be provided for each image and run python multigpu_train.py gpu_list 0 input_size 512 batch_size 14 checkpoint_path /tmp/east_icdar2015_resnet_v1_50_rbox/ \ text_scale 512 training_data_path /data/ocr/icdar2015/ geometry RBOX learning_rate 0.0001 num_readers 24 \ pretrained_model_path /tmp/resnet_v1_50.ckpt If you have more than one gpu, you can pass gpu ids to gpu_list Note: you should change the gt text file of icdar2015's filename to img_\ .txt instead of gt_img_\ .txt(or you can change the code in icdar.py), and some extra characters should be removed from the file. Test run python eval.py test_data_path /tmp/images/ gpu_list 0 checkpoint_path /tmp/east_icdar2015_resnet_v1_50_rbox/ \ output_path /tmp/ a text file will be then written to the output path. Examples Here is some test examples on icdar2015, enjoy the beautiful text boxes! ! image_1 (Examples/img_2.jpg) ! image_2 (Examples/img_10.jpg) ! image_3 (Examples/img_14.jpg) ! image_4 (Examples/img_26.jpg) ! image_5 (Examples/img_75.jpg) Please let me know if you encounter any issues(my email boostczc@gmail dot com).",Scene Text Detection,Scene Text Detection 2560,Natural Language Processing,Natural Language Processing,Natural Language Processing,Sequence to convolution Neural Networks arxiv link : We propose a new deep neural network model and its training scheme for text classification. Our model Sequence to convolution Neural Networks(Seq2CNN) consists of two blocks: Sequential Block that summarizes input texts and Convolution Block that receives summary of input and classifies it to certain label. Seq2CNN is trained end to end to classify various length texts without preprocessing inputs into fixed length. We also present Gradual Weight Shift(GWS) method that stabilizes training. GWS is applied to our model’s loss function. We compared our model with word based TextCNN trained with different data preprocessing methods. We obtained significant improvement of in classification accuracy over word based TextCNN without any ensemble or data augmentation. Here's the overview of Seq2CNN model. ! alt text,Text Classification,Text Classification 2573,Natural Language Processing,Natural Language Processing,Natural Language Processing,"VDCNNs for Text Classification in PyTorch A PyTorch implementation of Very Deep Convolutional Networks (VDCNNs) for text classification. Usage Training data should be formatted as below: sentence \t label sentence \t label ... To prepare data: python prepare.py training_data To train: python train.py model word_to_idx tag_to_idx training_data.csv num_epoch To predict: python predict.py model.epochN word_to_idx tag_to_idx test_data To evaluate: python evaluate.py model.epochN word_to_idx tag_to_idx test_data References Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann Lecun. 2017. Very Deep Convolutional Networks for Text Classification. 
arXiv:1606.01781.",Text Classification,Text Classification 2648,Natural Language Processing,Natural Language Processing,Natural Language Processing,Very Deep Convolutional Neural Network for Sentiment Analysis This repo contains the ipython notebook implementing Very Deep Convolutional Neural Network for Text Analysis based on the research paper Why I implemented this research paper? In this paper they have tried the approach of VGG16 which is a deep learning model very efficient for image classification which have pushed the state of the art in computer vision. Instead of doing the traditional approach like bag of words or n grams and their TF IDF this paper has considered each character as a unique entity in itself and instead of doing a word2vec they have done char2vec similar to taking into consideration a pixel the most basic unit in a image for calculation. So basically they have tried to model their approach on the deep convolutional networks approach for computer vision. Architecture ! (arc1.png) ! (arc2.png) I am further looking at increasing the depth of the model for better results.,Text Classification,Text Classification 2804,Natural Language Processing,Natural Language Processing,Natural Language Processing,"word2vec_medical_record This project aim to utilize neural network to analyze sequence input from structured free text medical records. At this stage, models are trained to give categorical output and optimized with categorical loss. In addition, a word embedding can be generated during training. ! word2vec model Antoine Bordes, et al. Joint Learning of Words and Meaning Representations for Open Text Semantic Parsing. AISTATS(2012) Alexis Conneau, et al. Very Deep Convolutional Networks for Natural Language Processing. (2016) Preprocessing the original data The original data was stored in .xls document, extracted from a medical record database. We prepare our training data with the following procedures: Remove any training data with missing item Remove non english characters Remove all next line character '\r\n' Assign an UUID to each set of training data.",Text Classification,Text Classification 2900,Natural Language Processing,Natural Language Processing,Natural Language Processing,"VDCNN Tensorflow Implementation of Very Deep Convolutional Neural Network for Text Classification. Note This repository is a simple Keras implementation of VDCNN model proposed by Conneau et al. Paper for VDCNN. Note: Temporal batch norm not implemented. Temp batch norm applies same kind of regularization as batch norm, except that the activations in a mini batch are jointly normalized over temporal instead of spatial locations. Right now this project is using regular Tensorflow batch normalization only. See another VDCNN implementation in Pytorch if you feel more comfortable with Pytorch, in which the author is having detailed reproduced results as well. See the original Tensorflow implementation as well. It should be noted that the VDCNN paper states that the implementation is done originally in Touch 7. Prerequisites Python3 Tensorflow 1.0 or higher keras 2.1.5 or higher Numpy Datasets The original paper tests several NLP datasets, including DBPedia, AG's News, Sogou News and etc. data_helper.py operates with CSV format train and test files. 
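As a rough illustration of the kind of CSV input such a loader consumes, the sketch below reads an AG's News-style file (label, title, description) and quantizes characters to integer ids for a character-level model. The column layout, alphabet, and sequence length here are assumptions for illustration only and may differ from the repository's actual data_helper.py.

```python
# Illustrative only: minimal character quantization over an AG's News-style CSV
# (label, title, description). Alphabet, max length and column order are assumed.
import csv
import numpy as np

ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789 ,;.!?:"()-'
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 is reserved for padding
MAX_LEN = 1014  # sequence length commonly used for character-level CNNs

def quantize(text, max_len=MAX_LEN):
    ids = [CHAR_TO_ID.get(c, 0) for c in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))  # right-pad with zeros

def load_csv(path):
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f):
            labels.append(int(row[0]) - 1)      # labels assumed 1-based in the CSV
            texts.append(" ".join(row[1:]))     # join title + description
    return np.array([quantize(t) for t in texts], dtype=np.int32), np.array(labels)

# x_train, y_train = load_csv("ag_news_csv/train.csv")
```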
Downloads of those NLP text classification datasets can be found here (Many thanks to ArdalanM): Dataset Classes Train samples Test samples source : : : : : : : : AG’s News 4 120 000 7 600 link Sogou News 5 450 000 60 000 link DBPedia 14 560 000 70 000 link Yelp Review Polarity 2 560 000 38 000 link Yelp Review Full 5 650 000 50 000 link Yahoo! Answers 10 1 400 000 60 000 link Amazon Review Full 5 3 000 000 650 000 link Amazon Review Polarity 2 3 600 000 400 000 link Parameters Setting For all versions of VDCNN, training and testing is done on a Ubuntu 16.04 Server with Tesla K80, with Momentum Optimizer of decay 0.9, exponential learning rate decay, a evaluation interval of 25, a batch size of 128. Weights are initialized by He initialization proposed in He et al . Batch normalizations are using a decay of 0.999. (There are tons of factors that can influence the testing accuracy of the model, but overall this project should be good to go. Training of a deep CNN model is not a easy task, patience is everything. _ ) Experiments TODO: Testing of more NLP benchmark datasets and presenting detailed results. Results are reported as follows: (i) / (ii) (i): Test set accuracy reported by the paper (acc 100% error_rate) (ii): Test set accuracy reproduced by this Keras implementation Results for Max Pooling: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.83 / xx.xxxx 98.65 / xx.xxxx 96.30 / xx.xxxx 17 layers 91.12 / xx.xxxx 98.60 / xx.xxxx 96.46 / xx.xxxx 29 layers 91.27 / xx.xxxx 98.71 / xx.xxxx 96.64 / xx.xxxx Results for K max Pooling: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.17 / xx.xxxx 98.44 / xx.xxxx 96.42 / xx.xxxx 17 layers 90.61 / xx.xxxx 98.39 / xx.xxxx 96.49 / xx.xxxx 29 layers 91.33 / xx.xxxx 98.59 / xx.xxxx 96.82 / xx.xxxx Results for Conv downsampling: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.17 / xx.xxxx 98.44 / xx.xxxx 96.42 / xx.xxxx 17 layers 90.61 / xx.xxxx 98.39 / xx.xxxx 96.49 / xx.xxxx 29 layers 91.33 / xx.xxxx 98.59 / xx.xxxx 96.82 / xx.xxxx Results for Max Pooling with Shortcut: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.83 / xx.xxxx 98.65 / xx.xxxx 96.30 / xx.xxxx 17 layers 91.12 / xx.xxxx 98.60 / xx.xxxx 96.46 / xx.xxxx 29 layers 91.27 / xx.xxxx 98.71 / xx.xxxx 96.64 / xx.xxxx Results for K max Pooling with Shortcut: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.17 / xx.xxxx 98.44 / xx.xxxx 96.42 / xx.xxxx 17 layers 90.61 / xx.xxxx 98.39 / xx.xxxx 96.49 / xx.xxxx 29 layers 91.33 / xx.xxxx 98.59 / xx.xxxx 96.82 / xx.xxxx Results for Conv downsampling with Shortcut: Depth ag_news DBPedia Sogou News : : : : : : : : 9 layers 90.17 / xx.xxxx 98.44 / xx.xxxx 96.42 / xx.xxxx 17 layers 90.61 / xx.xxxx 98.39 / xx.xxxx 96.49 / xx.xxxx 29 layers 91.33 / xx.xxxx 98.59 / xx.xxxx 96.82 / xx.xxxx Reference Original preprocessing codes and VDCNN Implementation By geduo15 Train Script and data iterator from Convolutional Neural Network for Text Classification NLP Datasets Gathered by ArdalanM and Others",Text Classification,Text Classification 2591,Computer Vision,Computer Vision,Computer Vision,"Image Super Resolution using in Keras 2+ Implementation of Image Super Resolution CNN in Keras from the paper Image Super Resolution Using Deep Convolutional Networks . 
Also contains models that outperforms the above mentioned model, termed Expanded Super Resolution, Denoiseing Auto Encoder SRCNN which outperforms both of the above models and Deep Denoise SR, which with certain limitations, outperforms all of the above. Setup Supports Keras with Theano and Tensorflow backend. Due to recent report that Theano will no longer be updated, Tensorflow is the default backend for this project now. Usage Note : The project is going to be reworked. Therefore please refer to Framework Updates.md to see the changes which will affect performance. The model weights are already provided in the weights folder, therefore simply running : python main.py imgpath , where imgpath is a full path to the image. The default model is DDSRCNN (dsr), which outperforms the other three models. To switch models, python main.py imgpath model type , where type sr , esr , dsr , ddsr If the scaling factor needs to be altered then : python main.py imgpath scale s , where s can be any number. Default s 2 If the intermediate step (bilinear scaled image) is needed, then: python main.py imgpath scale s save_intermediate True Window Helper The windows_helper script contains a C program for Windows to easily use the Super Resolution script using any of the available models. Parameters model : Can be one of sr (Image Super Resolution), esr (Expanded SR), dsr (Denoiseing Auto Encoder SR), ddsr (Deep Denoise SR), rnsr (ResNet SR) or distilled_rnsr (Distilled ResNet SR) scale : Scaling factor can be any integer number. Default is 2x scaling. save_intermediate : Save the intermediate results before applying the Super Resolution algorithm. mode : fast or patch . Patch mode can be useful for memory constrained GPU upscaling, whereas fast mode submits whole image for upscaling in one pass. suffix : Suffix of the scaled image filename patch_size : Used only when patch mode is used. Sets the size of each patch Model Architecture Super Resolution CNN (SRCNN) The model above is the simplest model of the ones described in the paper above, consisting of the 9 1 5 model. Larger architectures can be easily made, but come at the cost of execution time, especially on CPU. However there are some differences from the original paper: 1 Used the Adam optimizer instead of RMSProp. 2 This model contains some 21,000 parameters, more than the 8,400 of the original paper. It is to be noted that the original models underperform compared to the results posted in the paper. This may be due to the only 91 images being the training set compared to the entire ILSVR 2013 image set. It still performs well, however images are slightly noisy. Expanded Super Resolution CNN (ESRCNN) The above is called Expanded SRCNN , which performs slightly worse than the default SRCNN model on Set5 (PSNR 31.78 dB vs 32.4 dB). The Expansion occurs in the intermediate hidden layer, in which instead of just 1x1 kernels, we also use 3x3 and 5x5 kernels in order to maximize information learned from the layer. The outputs of this layer are then averaged, in order to construct more robust upscaled images. Denoiseing (Auto Encoder) Super Resolution CNN (DSRCNN) The above is the Denoiseing Auto Encoder SRCNN , which performs even better than SRCNN on Set5 (PSNR 32.57 dB vs 32.4 dB). This model uses bridge connections between the convolutional layers of the same level in order to speed up convergence and improve output results. The bridge connections are averaged to be more robust. 
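As a minimal sketch of the averaged bridge-connection idea described above (not the repository's exact model definition; filter counts and depths are illustrative):

```python
# Illustrative sketch of "averaged bridge connections": decoder features are
# averaged with same-level encoder features before the next convolution.
from keras.layers import Input, Conv2D, Average
from keras.models import Model

inp = Input(shape=(None, None, 3))

e1 = Conv2D(64, 3, padding="same", activation="relu")(inp)
e2 = Conv2D(64, 3, padding="same", activation="relu")(e1)

d2 = Conv2D(64, 3, padding="same", activation="relu")(e2)
d2 = Average()([d2, e2])   # bridge: average with the matching encoder level
d1 = Conv2D(64, 3, padding="same", activation="relu")(d2)
d1 = Average()([d1, e1])   # bridge: average with the first encoder level

out = Conv2D(3, 5, padding="same")(d1)
model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```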
Since the training images are passed through a gausian filter (sigma 0.5), then downscaled to 1/3rd the size, then upscaled to the original 33x33 size images, the images can be considered noisy . Thus, this auto encoder quickly improves on the earlier results, and reduces the noisy output image problem faced by the simpler SRCNN model. Deep Denoiseing Super Resolution (DDSRCNN) The above is the Deep Denoiseing SRCNN , which is a modified form of the architecture described in the paper Image Restoration Using Convolutional Auto encoders with Symmetric Skip Connections applied to image super resolution. It can perform far better than even the Denoiseing SRCNN, but is currently not working properly. Similar to the paper Image Restoration Using Convolutional Auto encoders with Symmetric Skip Connections , this can be considered a highly simplified and shallow model compared to the 30 layer architecture used in the above paper. ResNet Super Resolution (ResNet SR) The above is the ResNet SR model, derived from the SRResNet model of the paper Photo Realistic Single Image Super Resolution Using a Generative Adversarial Network Currently uses only 6 residual blocks and 2x upscaling rather than the 15 residual blocks and the 4x upscaling from the paper. Efficient SubPixel Convolutional Neural Network (ESPCNN) The above model is the Efficient Subpixel Convolution Neural Network which uses the Subpixel Convolution layers to upscale rather than UpSampling or Deconvolution. Currently has not been trained properly. GAN Image Super Resolution (GANSR) The above model is the GAN trained Image Super Resolution network based on the ResNet SR and the SRGAN from the paper above. Note : Does not work properly right now. Distilled ResNet Super Resolution (Distilled ResNetSR) The above model is a smaller ResNet SR that was trained using model distilation techniques from the teacher model the original larger ResNet SR (with 6 residual blocks). The model was trained via the distill_network.py script which can be used to perform distilation training from any teacher network onto a smaller 'student' network. Non Local ResNet Super Resolution (Non Local ResNetSR) The above model is a trial to see if Non Local blocks can obtain better super resolution. Various issues : 1) They break the fully convolutional behaviour of the network. Due to the flatten and reshape parts of this module, you need to have a set size for the image when building it. Therefore you cannot construct one model and then pass random size input images to evaluate. 2) The non local blocks require vast amount of memory as their intermediate products. I think this is the reason they suggested to use this at the end of the network where the spatial dimension is just 14x14 or 7x7. I had consistent ooms when trying it on multiple positions of a super resolution network, and could only successfully place it at the last ResNet block without oom (on just 4 GB 980M). Finally, I was able to train a model anyway and it got pretty high psnr scores. I wasn't able to evaluate that, and was able to distill the model into ordinary ResNet. It got exactly same psnr score as the original non local model. Evaluating that, all the images were a little smoothed out. This is worse than a distilled ResNet which obtains a lower psnr score but sharper images. 
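For reference, the distillation procedure mentioned above can be sketched roughly as fitting the small student network to a blend of the teacher's predictions and the ground-truth targets; the function and parameter names below are illustrative and are not taken from distill_network.py.

```python
# Schematic of knowledge distillation for super resolution: the student is
# trained on a mixture of teacher outputs and ground-truth HR images.
def distill_targets(teacher_model, x_lr, y_hr, alpha=0.5):
    """Blend teacher predictions with ground truth to form student targets."""
    y_teacher = teacher_model.predict(x_lr)
    return alpha * y_teacher + (1.0 - alpha) * y_hr

# Example usage (assuming Keras-style teacher/student models and numpy batches):
# student_model.compile(optimizer="adam", loss="mse")
# y_blend = distill_targets(teacher_model, x_lr_batch, y_hr_batch, alpha=0.5)
# student_model.fit(x_lr_batch, y_blend, epochs=1)
```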
Training If you wish to train the network on your own data set, follow these steps (Performance may vary) : 1 Save all of your input images of any size in the input_images folder 2 Run img_utils.py function, transform_images(input_path, scale_factor) . By default, input_path is input_images path. Note: Unless you are training ESPCNN, set the variable true_upsampling to False and then run the img_utils.py script to generate the dataset. Only for ESPCNN training do you need to set true_upsampling to True. 3 Open tests.py and un comment the lines at model.fit(...), where model can be sr, esr or dsr, ddsr. Note: It may be useful to save the original weights in some other location. 4 Execute tests.py to begin training. GPU is recommended, although if small number of images are provided then GPU may not be required. Caveats Very large images may not work with the GPU. Therefore, 1 If using Theano, set device cpu and cnmem 0.0 in theanorc.txt 2 If using Tensorflow, set it to cpu mode On the CPU, extremely high resolution images of the size upto 6000 x 6000 pixels can be handled if 16 GB RAM is provided. Examples There are 14 extra images provided in results, 2 of which (Monarch Butterfly and Zebra) have been scaled using both bilinear, SRCNN, ESRCNN and DSRCNN. Monarch Butterfly Bilinear SRCNN ESRCNN DDSRCNN Zebra Bilinear SRCNN ESRCNN DDSRCNN",Image Denoising,Image Denoising 2616,Computer Vision,Computer Vision,Computer Vision,"Learning Proximal Operators This repository provides the implementation of our paper Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems (Tim Meinhardt, Michael Möller, Caner Hazirbas, Daniel Cremers, ICCV 2017) All results presented in our work were produced with this code. Additionally we provide a TensorFlow implementation of the denoising convolutional neural network (_DNCNN_) introduced in Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising Installation 1. Install git lfs for pulling the model and data files with the repository 2. git clone git@github.com:tum vision/learn_prox_ops.git 3. Install the following packages for Python 3.6: 1. pip3 install r requirements.txt 2. ProxImaL: pip3 install git+ 3. PyBM3D: 1. with CUDA: pip3 install git+ 2. without CUDA: pip3 install git+ 3. TensorFlow: 1. with CUDA: pip3 install tensorflow gpu 1.3.0 2. without CUDA: pip3 install tensorflow 1.3.0 4. OpenCV: 1. pip3 install opencv python 3.3.0.10 2. or for faster _NLM_ denoising compile OpenCV 3.3.0 manually with CUDA support and Python 3.6 bindings 4. Download the demosaicking (_McMaster_ and _Kodak_) and the greyscale deblurring datasets with data/download_datasets.sh . 5. ( Optional , for faster computation and training _DNCNN_ models) Install CUDA and set the CUDA_HOME environment variable. 6. ( Optional , for reproducing paper results and faster computation) Install Halide (Version: 2016/04/27 ) and set the HALIDE_PATH environment variable. Run an Experiment The evaluation of our method included two exemplary linear inverse problems, namely Bayer color demosaicking and grayscale deblurring. In order to configure, organize, log and reproduce our computational experiments we structured the problems with the Sacred framework. For a detailed explanation on a typical Sacred interface please read its documentation. We implemented two Sacred _ingredients_ ( elemental_ingredient, grid_ingredient ) which are both injected into our experiments. 
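For readers unfamiliar with Sacred, a generic skeleton of an experiment with an injected ingredient looks roughly like the following; the configuration fields shown are hypothetical placeholders, not the repository's actual source.

```python
# Generic Sacred skeleton: an ingredient holds shared config and is injected
# into the experiment; commands become runnable from the command line.
from sacred import Experiment, Ingredient

elemental_ingredient = Ingredient("elemental")

@elemental_ingredient.config
def elemental_config():
    denoising_prior = "DNCNN"   # hypothetical default; could also be "BM3D" or "NLM"
    image_name = "barbara"

ex = Experiment("experiment_deblurring", ingredients=[elemental_ingredient])

@ex.config
def config():
    experiment_name = "experiment_a"

@ex.command
def print_settings(experiment_name, elemental):
    print(experiment_name, elemental["denoising_prior"], elemental["image_name"])

@ex.automain
def main(experiment_name, elemental):
    # the actual deblurring / demosaicking run would go here
    print("running", experiment_name, "with prior", elemental["denoising_prior"])
```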
Among other things each of the experiments consists of multiple command line executable Sacred _commands_. If everything is setup correctly the print_config command for example prints the current configuration scope by executing: python src/experiment_deblurring.py print_config A typical run with a preset configuration scope for optimal _DNCNN_ parameters is executed with ( automain command): python src/experiment_deblurring.py with experiment_name experiment_a image_name barbara elemental.optimal_DNCNN_experiment_a Hyperparameter Grid Search We conducted multiple exhaustive grid searches to establish the optimal hyper parameters for both experiments. The set of searchable grid_params has to be set in the respective experiment file. A search for the optimal demosaicking parameters for all images and the BM3D denoising prior is started by executing: python src/experiment_demosaicking.py grid_search_all_images with elemental.denoising_prior BM3D The grid_search.param_dicts_file_path configuration parameter can be used to continue a previous search. Training a _DNCNN_ The training of the denoising convolutional neural network which we applied as a learned denoising prior was implemented with TensorFlow. With the help of command line tf.app.flags we provide full control over the training procedure. The single channel model provided with this repository was trained by executing: python src/tf_solver.py sigma_noise 0.02 batch_size 128 network DNCNN channels 1 pipeline bsds500 device_name /gpu:0 train_epochs 100 Publication If you use this software in your research, please cite our publication: @article{DBLP:journals/corr/Meinhardt0HC17, author {Tim Meinhardt and Michael Moeller and Caner Hazirbas and Daniel Cremers}, title {Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems}, journal {CoRR}, volume {abs/1704.03488}, year {2017}, url { timestamp {Wed, 07 Jun 2017 14:40:59 +0200}, biburl { bibsource {dblp computer science bibliography,",Image Denoising,Image Denoising 2660,Computer Vision,Computer Vision,Computer Vision,"Image Restoration Using Very Deep Convolutional Encoder Decoder Networks with Symmetric Skip Connections This is the testing code for the papers: Image Restoration Using Very Deep Convolutional Encoder Decoder Networks with Symmetric Skip Connections, Annual Conference on Neural Information Processing Systems (NIPS), 2016 , and Image Restoration Using Convolutional Auto encoders with Symmetric Skip Connections, arXiv, 2016 Download this repository For any question, please send email to xjmgl.nju@gmail.com or, chhshen@gmail.com If you use this code in your research, please cite our papers: @InProceedings{NIPS2016Mao, author Xiao{ }Jiao Mao and Chunhua Shen and Yu{ }Bin Yang , title Image Restoration Using Very Deep Convolutional Encoder Decoder Networks with Symmetric Skip Connections , booktitle Proc. Advances in Neural Inf. Process. Syst. , year 2016 , } @Article{MaoSY16a, author Xiao{ }Jiao Mao and Chunhua Shen and Yu{ }Bin Yang , title Image Restoration Using Convolutional Auto encoders with Symmetric Skip Connections , journal arXiv preprint , volume abs/1606.08921 , year 2016 , url timestamp Fri, 01 Jul 2016 17:39:49 +0200 , biburl } Install Caffe is required for running this code. For convenience, it is included in the folder Caffe and pre compiled in Ubuntu 14.04. The folder model contains the network definition in .prototxt and the trained weights in .caffemodel for different tasks. 
The folder utils contains the functions used for image restoration. The file demo_denoising.m shows how to use the code for image denoising. The file demo_super_resolution.m shows how to use the code for image super resolution. The file demo_jpeg_deblocking.m shows how to use the code for JPEG deblocking. The file demo_debluring.m shows how to use the code for non blind image deblurring. The file demo_inpainting.m shows an example for scratch removal. Kindly note that the input_dim of the network should be adapted to the dataset being used. Copyright Copyright (c) Xiao Jiao Mao, Chunhua Shen, Yu Bin Yang. 2016. This code is for non commercial purposes only. For commercial purposes, please contact Xiao Jiao Mao and Chunhua Shen. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see .",Image Denoising,Image Denoising 2679,Computer Vision,Computer Vision,Computer Vision,"Neural Nearest Neighbors Networks (NeurIPS 2018) Official implementation of the denoising (PyTorch) and correspondence classification (Tensorflow) N3NET, which will be published in our NeurIPS paper: Tobias Plötz and Stefan Roth, Neural Nearest Neighbors Networks , Advances in Neural Information Processing Systems (NeurIPS), 2018 Preprint: NeurIPS Proceedings: Contact: Tobias Plötz (tobias.ploetz@visinf.tu darmstadt.de) Installation The denoising code is tested with Python 3.6, PyTorch 0.4.1 and Cuda 8.0 but is likely to run with newer versions of PyTorch and Cuda. To install PyInn run pip install git+ Further requirements can be installed with pip install r requirements.txt Update March 28, 2019 Since tensor comprehensions is not maintained anymore, we provide a memory and time efficient implementation of an indexed matrix multiplication. To build the corresponding cuda kernel, please cd into lib and run python setup.py install Please download the BSDS500, Urban100 and Set12 datasets by cd'ing into datasets/ and using the scripts provided therein. If you want to train your own Poisson Gaussian denoising model, please additionally download the DIV2k dataset and the Waterloo dataset. For setting up the correspondence classification code, please clone the repository , follow their instructions to set up your environment and copy the files located in src_correspondence/ . This can also conveniently be done using the script src_correspondence/clone_CNNet.sh . Running the denoising code To run the following commands, please cd into src_denoising . To train a new model run: python main.py To test your model run: python main.py eval eval_epoch evaldir To test our pretrained networks run: python main.py eval eval_epoch 51 evaldir pretrained_sigma A note on pretrained models. The pretrained models will give slightly different results than in the paper on Set5 due to differences in the random seed. On the other, larger datasets, results are as in the paper. Furthermore, in the paper we used a strong learning rate decay.
Training our model again with a slower decay yields better results and we will add new pretrained models soon. Running the correspondence classification code To run the following commands, please cd into src_correspondence/CNNet . To train a new model run: python main.py run_mode train net_arch nips_2018_nl We provide two pretrained models. One is trained on the Brown indoor dataset, the other is trained on the St. Peters outdoor dataset. To test our pretrained models run: python main.py run_mode test net_arch nips_2018_nl data_te data_va log_dir test_log_dir Using the Neural Nearest Neighbors Block for Your Project The core of the PyTorch implementation is located in src_denoising/models/non_local.py which provides classes for neural nearest neighbors selection ( NeuralNearestNeighbors ), a domain agnostic N3Block ( N3AggregationBase ) and a N3Block tailored towards image data ( N3Aggregation2D ). The file src_denoising/models/n3net.py contains the N3Net module that uses the 2D N3Block as non local processing layer. The core of the Tensorflow implementation is located in src_correspondence/non_local.py which provides analogous functionality as above. Citation @inproceedings{Ploetz:2018:NNN, title {Neural Nearest Neighbors Networks}, author {Pl\ otz, Tobias and Roth, Stefan}, booktitle {Advances in Neural Information Processing Systems (NeurIPS)}, year {2018} }",Image Denoising,Image Denoising 2843,Computer Vision,Computer Vision,Computer Vision,"Residual Dense Network for Image Super Resolution This repository is for RDN introduced in the following paper Yulun Zhang , Yapeng Tian , Yu Kong , Bineng Zhong , and Yun Fu , Residual Dense Network for Image Super Resolution , CVPR 2018 (spotlight), arXiv Yulun Zhang , Yapeng Tian , Yu Kong , Bineng Zhong , and Yun Fu , Residual Dense Network for Image Restoration , arXiv 2018, arXiv The code is built on EDSR (Torch) and tested on Ubuntu 14.04 environment (Torch7, CUDA8.0, cuDNN5.1) with Titan X/1080Ti/Xp GPUs. Other implementations: PyTorch_version has been implemented by Nguyễn Trần Toàn (trantoan060689@gmail.com) and merged into EDSR_PyTorch . TensorFlow_version by hengchuan. Contents 1. Introduction ( introduction) 2. Train ( train) 3. Test ( test) 4. Results ( results) 5. Citation ( citation) 6. Acknowledgements ( acknowledgements) Introduction A very deep convolutional neural network (CNN) has recently achieved great success for image super resolution (SR) and offered hierarchical features as well. However, most deep CNN based SR models do not make full use of the hierarchical features from the original low resolution (LR) images, thereby achieving relatively low performance. In this paper, we propose a novel residual dense network (RDN) to address this problem in image SR. We fully exploit the hierarchical features from all the convolutional layers. Specifically, we propose residual dense block (RDB) to extract abundant local features via dense connected convolutional layers. RDB further allows direct connections from the state of preceding RDB to all the layers of current RDB, leading to a contiguous memory (CM) mechanism. Local feature fusion in RDB is then used to adaptively learn more effective features from preceding and current local features and stabilizes the training of wider network. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. 
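A schematic PyTorch-style sketch of one residual dense block as described above (the official code is written in Torch, and a PyTorch port is linked earlier); channel counts and the number of layers are illustrative only.

```python
# Schematic residual dense block (RDB): densely connected 3x3 convolutions,
# 1x1 local feature fusion, and a local residual connection.
import torch
import torch.nn as nn

class RDB(nn.Module):
    def __init__(self, channels=64, growth=64, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch = channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU(inplace=True)))
            in_ch += growth                        # dense connections grow the input
        self.fuse = nn.Conv2d(in_ch, channels, 1)  # 1x1 local feature fusion

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual learning

block = RDB()
out = block(torch.randn(1, 64, 32, 32))   # -> torch.Size([1, 64, 32, 32])
```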
Experiments on benchmark datasets with different degradation models show that our RDN achieves favorable performance against state of the art methods. ! RDB (/Figs/RDB.png) Figure 1. Residual dense block (RDB) architecture. ! RDN (/Figs/RDN.png) Figure 2. The architecture of our proposed residual dense network (RDN). Train Prepare training data 1. Download DIV2K training data (800 training + 100 validtion images) from DIV2K dataset or SNU_CVLab . 2. Place all the HR images in 'Prepare_TrainData/DIV2K/DIV2K_HR'. 3. Run 'Prepare_TrainData_HR_LR_BI/BD/DN.m' in matlab to generate LR images for BI, BD, and DN models respectively. 4. Run 'th png_to_t7.lua' to convert each .png image to .t7 file in new folder 'DIV2K_decoded'. 5. Specify the path of 'DIV2K_decoded' to ' datadir' in 'RDN_TrainCode/code/opts.lua'. For more informaiton, please refer to EDSR(Torch) . Begin to train 1. (optional) Download models for our paper and place them in '/RDN_TrainCode/experiment/model'. All the models can be downloaded from Dropbox or Baidu . 2. Cd to 'RDN_TrainCode/code', run the following scripts to train models. You can use scripts in file 'TrainRDN_scripts' to train models for our paper. bash BI, scale 2, 3, 4 BIX2F64D18C6G64P48, input 48x48, output 96x96 th main.lua scale 2 netType RDN nFeat 64 nFeaSDB 64 nDenseBlock 16 nDenseConv 8 growthRate 64 patchSize 96 dataset div2k datatype t7 DownKernel BI splitBatch 4 trainOnly true BIX3F64D18C6G64P32, input 32x32, output 96x96, fine tune on RDN_BIX2.t7 th main.lua scale 3 netType resnet_cu nFeat 64 nFeaSDB 64 nDenseBlock 16 nDenseConv 8 growthRate 64 patchSize 96 dataset div2k datatype t7 DownKernel BI splitBatch 4 trainOnly true preTrained ../experiment/model/RDN_BIX2.t7 BIX4F64D18C6G64P32, input 32x32, output 128x128, fine tune on RDN_BIX2.t7 th main.lua scale 4 nGPU 1 netType resnet_cu nFeat 64 nFeaSDB 64 nDenseBlock 16 nDenseConv 8 growthRate 64 patchSize 128 dataset div2k datatype t7 DownKernel BI splitBatch 4 trainOnly true nEpochs 1000 preTrained ../experiment/model/RDN_BIX2.t7 BD, scale 3 BDX3F64D18C6G64P32, input 32x32, output 96x96, fine tune on RDN_BIX3.t7 th main.lua scale 3 nGPU 1 netType resnet_cu nFeat 64 nFeaSDB 64 nDenseBlock 16 nDenseConv 8 growthRate 64 patchSize 96 dataset div2k datatype t7 DownKernel BD splitBatch 4 trainOnly true nEpochs 200 preTrained ../experiment/model/RDN_BIX3.t7 DN, scale 3 DNX3F64D18C6G64P32, input 32x32, output 96x96, fine tune on RDN_BIX3.t7 th main.lua scale 3 nGPU 1 netType resnet_cu nFeat 64 nFeaSDB 64 nDenseBlock 16 nDenseConv 8 growthRate 64 patchSize 96 dataset div2k datatype t7 DownKernel DN splitBatch 4 trainOnly true nEpochs 200 preTrained ../experiment/model/RDN_BIX3.t7 Only RDN_BIX2.t7 was trained using 48x48 input patches. All other models were trained using 32x32 input patches in order to save training time. However, smaller input patch size in training would lower the performance to some degree. We also set ' trainOnly true' to save GPU memory. Test Quick start 1. Download models for our paper and place them in '/RDN_TestCode/model'. All the models can be downloaded from Dropbox or Baidu . 2. Run 'TestRDN.lua' You can use scripts in file 'TestRDN_scripts' to produce results for our paper. 
bash No self ensemble: RDN BI degradation model, X2, X3, X4 th TestRDN.lua model RDN_BIX2 degradation BI scale 2 selfEnsemble false dataset Set5 th TestRDN.lua model RDN_BIX3 degradation BI scale 3 selfEnsemble false dataset Set5 th TestRDN.lua model RDN_BIX4 degradation BI scale 4 selfEnsemble false dataset Set5 BD degradation model, X3 th TestRDN.lua model RDN_BDX3 degradation BD scale 3 selfEnsemble false dataset Set5 DN degradation model, X3 th TestRDN.lua model RDN_DNX3 degradation DN scale 3 selfEnsemble false dataset Set5 With self ensemble: RDN+ BI degradation model, X2, X3, X4 th TestRDN.lua model RDN_BIX2 degradation BI scale 2 selfEnsemble true dataset Set5 th TestRDN.lua model RDN_BIX3 degradation BI scale 3 selfEnsemble true dataset Set5 th TestRDN.lua model RDN_BIX4 degradation BI scale 4 selfEnsemble true dataset Set5 BD degradation model, X3 th TestRDN.lua model RDN_BDX3 degradation BD scale 3 selfEnsemble true dataset Set5 DN degradation model, X3 th TestRDN.lua model RDN_DNX3 degradation DN scale 3 selfEnsemble true dataset Set5 The whole test pipeline 1. Prepare test data. Place the original test sets (e.g., Set5, other test sets are available from GoogleDrive or Baidu ) in 'OriginalTestData'. Run 'Prepare_TestData_HR_LR.m' in Matlab to generate HR/LR images with different degradation models. 2. Conduct image SR. See Quick start 3. Evaluate the results. Run 'Evaluate_PSNR_SSIM.m' to obtain PSNR/SSIM values for paper. Results ! PSNR_SSIM_BI (/Figs/PSNR_SSIM_BI.png) Table 1. Benchmark results with BI degradation model. Average PSNR/SSIM values for scaling factor ×2, ×3, and ×4. ! PSNR_SSIM_BD_DN (/Figs/PSNR_SSIM_BD_DN.png) Table 2. Benchmark results with BD and DN degradation models. Average PSNR/SSIM values for scaling factor ×3. Citation If you find the code helpful in your resarch or work, please cite the following papers. @InProceedings{Lim_2017_CVPR_Workshops, author {Lim, Bee and Son, Sanghyun and Kim, Heewon and Nah, Seungjun and Lee, Kyoung Mu}, title {Enhanced Deep Residual Networks for Single Image Super Resolution}, booktitle {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month {July}, year {2017} } @inproceedings{zhang2018residual, title {Residual Dense Network for Image Super Resolution}, author {Zhang, Yulun and Tian, Yapeng and Kong, Yu and Zhong, Bineng and Fu, Yun}, booktitle {CVPR}, year {2018} } @article{zhang2018rdnir, title {Residual Dense Network for Image Restoration}, author {Zhang, Yulun and Tian, Yapeng and Kong, Yu and Zhong, Bineng and Fu, Yun}, booktitle {arXiv}, year {2018} } Acknowledgements This code is built on EDSR (Torch) . We thank the authors for sharing their codes of EDSR Torch version and PyTorch version .",Image Denoising,Image Denoising 2523,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Parser This repository contains the code used to train the parsers described in the paper Deep Biaffine Attention for Neural Dependency Parsing . Here we describe how the source code is structured and how to train/validate/test models. Where are the files you care about? lib/linalg.py : This file contains general purpose functions that don't require any knowledge of hyperparameters. For example, the linear and bilinear functions, which simply return the result of applying an affine or biaffine transformation to the input. 
configurable.py : This file contains the Configurable class, which wraps a SafeConfigParser that stores model hyperparameter options (such as dropout keep probability and recurrent size). Most or all classes in this repository inherit from it. lib/models/nn.py : This file contains the NN class, which inherits from Configurable . It contains functions such as MLP and RNN that are general purpose but require knowledge of model hyperparameters. lib/models/rnn.py : This file contains functions for building tensorflow recurrent neural networks. It is largely copied and pasted tensorflow source code with a few modifications to include a dynamic bidirectional recurrent neural network (rather than just a dynamic unidirectional one, which was all that was available when this project was started) and same mask recurrent dropout. lib/models/parsers : This directory contains different parser architectures. All parsers inherit from BaseParser , which in turn inherits from NN . The README in that directory details the differences between architectures. lib/rnn_cells : This directory contains a number of different recurrent cells (including LSTMs and GRUs). All recurrent cells inherit from BaseCell which inherits from Configurable (but not NN ). The README in that directory details the different cell types. lib/optimizers : This directory contains the optimizer used to optimize the network. All optimizers inherit from BaseOptimizer which inherits from Configurable (again not NN ). See the README in that directory for further explanation. vocab.py : This file contains the Vocab class, which manages a vocabulary of discrete strings (tokens, POS tags, dependency labels). bucket.py : This file contains the Bucket class, which manages all sequences of data up to a certain length, and pads everything shorter than that length with special tokens. metabucket.py : This file contains the Metabucket class, which manages a group of multiple buckets, efficiently determining which bucket a new sentence goes in. dataset.py : This file contains the Dataset class, which manages an entire dataset (e.g. the training set or the test set), reading in a conll file and grabbing minibatches. network.py : This file contains the Network class, which manages the training and testing of a neural network. It contains three Dataset objects one for the training set, one for the validation set, and one for the test set three Vocab objects one for the words, one for the POS tags, and one for the dependency labels one NN object a parser architecture or other user defined architecutre and a BaseOptimizer object (stored in the self._ops dictionary). This is also the file you call to run the network. How do you run the model? Data After downloading the repository, you will need a few more things: pretrained word embeddings : We used 100 dimensional GloVe embeddings data : We used the Penn TreeBank dataset automatically converted to Stanford Dependencies, but since this dataset is proprietary, you can instead use the freely available English Web Treebank in Universal Dependencies format. We will assume that the dataset has been downloaded and exists in the directory data/EWT and the word embeddings exist in data/glove . Config files All configuration options can be specified on the command line, but it's much easier to instead store them in a configuration file. This includes the location of the data files. 
We recommend creating a new configuration file config/myconfig.cfg in the config directory: OS embed_dir data/glove embed_file %(embed_dir)s/en.100d.txt data_dir data/EWT train_file %(data_dir)s/train.conllu valid_file %(data_dir)s/dev.conllu test_file %(data_dir)s/test.conllu This is also where other options can be specified; for example, to use the same configuration options used in the paper, one would also add Layers n_recur 4 Dropout mlp_keep_prob .67 ff_keep_prob .67 Regularization l2_reg 0 Radam chi 0 Learning rate learning_rate 2e 3 decay_steps 2500 Training The model can be trained with bash python network.py config_file config/myconfig.cfg save_dir saves/mymodel The saves directory must already exist. It will attempt to create a mymodel directory in saves ; if saves/mymodel already exists, it will warn the user and ask if they want to continue. This is to prevent accidentally overwriting trained models. The model then reads in the training files and prints out the shapes of each bucket. By default, all matrices are initialized orthonormally; in order to generate orthonormal matrices, it starts with a random normal matrix and optimizes it to be orthonormal (on the CPU, using numpy). The final loss of this is printed, so that if the optimizer diverges (which is very rare but does occasionally happen) the researcher can restart. Durint training, the model prints out training and validation loss, labeled attachment accuracy, and runtime (in sentences/second). During validation, the model also generates a sanitycheck.txt file in the save directory that prints out the model's predictions on sentences in the validation file. It also saves history.pkl to the save directory, which records the model's training and validation loss and accuracy. At this stage the model makes no attempt to ensure that the trees are well formed and it makes no attempt to ignore punctuation. The model will periodically save its tensorflow state so that it can be reloaded in the event of a crash or accidental termination. If the researcher wishes to terminate the model prematurely, they can do so with ; in this event, they will be prompted to save the model with or discard it with another . Testing The model can be validated with bash python network.py save_dir saves/mymodel validate python network.py save_dir saves/mymodel test This creates a parsed copy of the validation and test files in the save directory. The model also reports unlabeled and labeled attachment accuracy in saves/mymodel/scores.txt , but these calculate punctuation differently from what is standard. One should instead use the perl script in bin to compute accuracies: bash perl bin/eval.pl q b g data/EWT/dev.conllu \ s saves/mymodel/dev.conllu \ o saves/mymodel/dev.scores.txt Statistical significance between two models can similarly be computed using a perl script: bash perl bin/compare.pl saves/mymodel/dev.scores.txt saves/defaults/dev.scores.txt The current build is designed for research purposes, so explicit functionality for parsing texts is not currently supported. What does the model put in the save directory? config.cfg : A configuration file containing the model hyperparameters. Since hyperparameters can come from a variety of different sources (including multiple config files and command line arguments), this is necessary for restoring it later and remembering what hyperparameters were used. 
HEAD : The github repository head keeps track of the current github build, so that if the current github version is incompatible with the trained model, the researcher knows which commit they need to restore to run it. history.pkl : A python pickle file containing a dictionary of training and validation history. : tensorflow checkpoint file indicating which model to restore. trained (.txt) : tensorflow model after training for iterations. words.txt / tags.txt / rels.txt : Vocabulary files containing all words/tags/labels in the training set and their frequency, sorted by frequency. sanitycheck.txt : The model's validation output. The sentences are grouped by bucket, not in the original order they were observed in the file, and the parses are chosen greedily rather than using any MST parsing algorithm to ensure well formedness. Predicted heads/relations are put in second to last two columns, and gold heads/relations are put in the last two columns. scores.txt : The model's self reported unlabeled/labeled accuracy scores. As previously stated, don't trust these numbers too much use the perl script. dev.conllu / test.conllu : The parsed validation and test datasets.",Dependency Parsing,Dependency Parsing 2524,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Parser This repository contains the code used to train the parsers described in the paper Deep Biaffine Attention for Neural Dependency Parsing . Here we describe how the source code is structured and how to train/validate/test models. Where are the files you care about? lib/linalg.py : This file contains general purpose functions that don't require any knowledge of hyperparameters. For example, the linear and bilinear functions, which simply return the result of applying an affine or biaffine transformation to the input. configurable.py : This file contains the Configurable class, which wraps a SafeConfigParser that stores model hyperparameter options (such as dropout keep probability and recurrent size). Most or all classes in this repository inherit from it. lib/models/nn.py : This file contains the NN class, which inherits from Configurable . It contains functions such as MLP and RNN that are general purpose but require knowledge of model hyperparameters. lib/models/rnn.py : This file contains functions for building tensorflow recurrent neural networks. It is largely copied and pasted tensorflow source code with a few modifications to include a dynamic bidirectional recurrent neural network (rather than just a dynamic unidirectional one, which was all that was available when this project was started) and same mask recurrent dropout. lib/models/parsers : This directory contains different parser architectures. All parsers inherit from BaseParser , which in turn inherits from NN . The README in that directory details the differences between architectures. lib/rnn_cells : This directory contains a number of different recurrent cells (including LSTMs and GRUs). All recurrent cells inherit from BaseCell which inherits from Configurable (but not NN ). The README in that directory details the different cell types. lib/optimizers : This directory contains the optimizer used to optimize the network. All optimizers inherit from BaseOptimizer which inherits from Configurable (again not NN ). See the README in that directory for further explanation. vocab.py : This file contains the Vocab class, which manages a vocabulary of discrete strings (tokens, POS tags, dependency labels). 
bucket.py : This file contains the Bucket class, which manages all sequences of data up to a certain length, and pads everything shorter than that length with special tokens. metabucket.py : This file contains the Metabucket class, which manages a group of multiple buckets, efficiently determining which bucket a new sentence goes in. dataset.py : This file contains the Dataset class, which manages an entire dataset (e.g. the training set or the test set), reading in a conll file and grabbing minibatches. network.py : This file contains the Network class, which manages the training and testing of a neural network. It contains three Dataset objects one for the training set, one for the validation set, and one for the test set three Vocab objects one for the words, one for the POS tags, and one for the dependency labels one NN object a parser architecture or other user defined architecutre and a BaseOptimizer object (stored in the self._ops dictionary). This is also the file you call to run the network. How do you run the model? Data After downloading the repository, you will need a few more things: pretrained word embeddings : We used 100 dimensional GloVe embeddings data : We used the Penn TreeBank dataset automatically converted to Stanford Dependencies, but since this dataset is proprietary, you can instead use the freely available English Web Treebank in Universal Dependencies format. We will assume that the dataset has been downloaded and exists in the directory data/EWT and the word embeddings exist in data/glove . Config files All configuration options can be specified on the command line, but it's much easier to instead store them in a configuration file. This includes the location of the data files. We recommend creating a new configuration file config/myconfig.cfg in the config directory: OS embed_dir data/glove embed_file %(embed_dir)s/en.100d.txt data_dir data/EWT train_file %(data_dir)s/train.conllu valid_file %(data_dir)s/dev.conllu test_file %(data_dir)s/test.conllu This is also where other options can be specified; for example, to use the same configuration options used in the paper, one would also add Layers n_recur 4 Dropout mlp_keep_prob .67 ff_keep_prob .67 Regularization l2_reg 0 Radam chi 0 Learning rate learning_rate 2e 3 decay_steps 2500 Training The model can be trained with bash python network.py config_file config/myconfig.cfg save_dir saves/mymodel The saves directory must already exist. It will attempt to create a mymodel directory in saves ; if saves/mymodel already exists, it will warn the user and ask if they want to continue. This is to prevent accidentally overwriting trained models. The model then reads in the training files and prints out the shapes of each bucket. By default, all matrices are initialized orthonormally; in order to generate orthonormal matrices, it starts with a random normal matrix and optimizes it to be orthonormal (on the CPU, using numpy). The final loss of this is printed, so that if the optimizer diverges (which is very rare but does occasionally happen) the researcher can restart. Durint training, the model prints out training and validation loss, labeled attachment accuracy, and runtime (in sentences/second). During validation, the model also generates a sanitycheck.txt file in the save directory that prints out the model's predictions on sentences in the validation file. It also saves history.pkl to the save directory, which records the model's training and validation loss and accuracy. 
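As an aside on the Config files section above, the %(...)s references in such a config resolve through standard Python interpolation; the project wraps this in its Configurable class, and the stand-alone snippet below is only for illustration.

```python
# How %(embed_dir)s-style interpolation in the example config resolves with
# Python's standard configparser (same-section references are expanded).
import configparser

text = """
[OS]
embed_dir = data/glove
embed_file = %(embed_dir)s/en.100d.txt
data_dir = data/EWT
train_file = %(data_dir)s/train.conllu
"""

parser = configparser.ConfigParser()
parser.read_string(text)
print(parser.get("OS", "embed_file"))   # -> data/glove/en.100d.txt
print(parser.get("OS", "train_file"))   # -> data/EWT/train.conllu
```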
At this stage the model makes no attempt to ensure that the trees are well formed and it makes no attempt to ignore punctuation. The model will periodically save its tensorflow state so that it can be reloaded in the event of a crash or accidental termination. If the researcher wishes to terminate the model prematurely, they can do so with ; in this event, they will be prompted to save the model with or discard it with another . Testing The model can be validated with bash python network.py save_dir saves/mymodel validate python network.py save_dir saves/mymodel test This creates a parsed copy of the validation and test files in the save directory. The model also reports unlabeled and labeled attachment accuracy in saves/mymodel/scores.txt , but these calculate punctuation differently from what is standard. One should instead use the perl script in bin to compute accuracies: bash perl bin/eval.pl q b g data/EWT/dev.conllu \ s saves/mymodel/dev.conllu \ o saves/mymodel/dev.scores.txt Statistical significance between two models can similarly be computed using a perl script: bash perl bin/compare.pl saves/mymodel/dev.scores.txt saves/defaults/dev.scores.txt The current build is designed for research purposes, so explicit functionality for parsing texts is not currently supported. What does the model put in the save directory? config.cfg : A configuration file containing the model hyperparameters. Since hyperparameters can come from a variety of different sources (including multiple config files and command line arguments), this is necessary for restoring it later and remembering what hyperparameters were used. HEAD : The github repository head keeps track of the current github build, so that if the current github version is incompatible with the trained model, the researcher knows which commit they need to restore to run it. history.pkl : A python pickle file containing a dictionary of training and validation history. : tensorflow checkpoint file indicating which model to restore. trained (.txt) : tensorflow model after training for iterations. words.txt / tags.txt / rels.txt : Vocabulary files containing all words/tags/labels in the training set and their frequency, sorted by frequency. sanitycheck.txt : The model's validation output. The sentences are grouped by bucket, not in the original order they were observed in the file, and the parses are chosen greedily rather than using any MST parsing algorithm to ensure well formedness. Predicted heads/relations are put in second to last two columns, and gold heads/relations are put in the last two columns. scores.txt : The model's self reported unlabeled/labeled accuracy scores. As previously stated, don't trust these numbers too much use the perl script. dev.conllu / test.conllu : The parsed validation and test datasets.",Dependency Parsing,Dependency Parsing 2533,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. 
They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Dependency Parsing,Dependency Parsing 2553,Natural Language Processing,Natural Language Processing,Natural Language Processing,"TensorFlow Models This repository contains a number of different models implemented in TensorFlow : The official models (official) are a collection of example models that use TensorFlow's high level APIs. They are intended to be well maintained, tested, and kept up to date with the latest stable TensorFlow API. They should also be reasonably optimized for fast performance while still being easy to read. We especially recommend newer TensorFlow users to start here. The research models are a large collection of models implemented in TensorFlow by researchers. They are not officially supported or available in release branches; it is up to the individual researchers to maintain the models and/or provide support on issues and pull requests. The samples folder (samples) contains code snippets and smaller models that demonstrate features of TensorFlow, including code presented in various blog posts. The tutorials folder (tutorials) is a collection of models described in the TensorFlow tutorials . Contribution guidelines If you want to contribute to models, be sure to review the contribution guidelines (CONTRIBUTING.md). License Apache License 2.0 (LICENSE)",Dependency Parsing,Dependency Parsing 2695,Natural Language Processing,Natural Language Processing,Natural Language Processing,"Singlish Parser This repository contains the modified code used to train the Singlish dependency parser, proposed in the ACL2017 long paper Universal Dependencies Parsing for Colloquial Singaporean English . The Singlish dependency parser is built on top of an English base parser trained using the network described in Deep Biaffine Attention for Neural Dependency Parsing , whose code is available at and its original Readme follows this Readme. The Singlish dependency treebank is released here as a new dependency parsing dataset, annotated with Universal Dependencies , for an important creole of English, Colloquial Singaporean English (Singlish), contained in the folder Singlish/treebank. The model for the Singlish parser with neural stacking, as presented in the paper, is in the folder Singlish/model. The corresponding config file is config/Singlish.cfg and the Singlish embeddings used is Singlish/embedding/Singlish.ice.vec.txt The model for the Singlish POS tagger with neural stacking is in the folder Singlish/pos_tagger. The codes to train such a POS tagger is at NNHetSeq Modified by Jie . Tip: words.txt, tags.txt, and rels.txt should be saved when training the base English parser, and put in the saves directory when loading the base model. Please go to the ud_tf0.12 branch to clone the Singlish dependency parser code and materials. 
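For orientation, the biaffine arc scoring from the cited paper, which the base parser (and hence this Singlish parser) relies on, can be sketched in NumPy as follows; the dimensions are illustrative and this is not the repository's TensorFlow implementation.

```python
# Schematic biaffine arc scorer (Dozat & Manning style):
# score(i, j) = h_i_dep^T U h_j_head + h_j_head^T u
import numpy as np

n, d = 10, 400                      # sentence length, MLP output size (illustrative)
H_dep = np.random.randn(n, d)       # arc-dep representations, one row per word
H_head = np.random.randn(n, d)      # arc-head representations
U = np.random.randn(d, d)           # bilinear weight
u = np.random.randn(d)              # head-side bias term

scores = H_dep @ U @ H_head.T + H_head @ u   # scores[i, j]: word j as head of word i
pred_heads = scores.argmax(axis=1)           # greedy head choice per word
print(scores.shape, pred_heads)
```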
Bibtex : Original Readme: Parser This repository contains the code used to train the parsers described in the paper Deep Biaffine Attention for Neural Dependency Parsing . Here we describe how the source code is structured and how to train/validate/test models. Where are the files you care about? lib/linalg.py : This file contains general purpose functions that don't require any knowledge of hyperparameters. For example, the linear and bilinear functions, which simply return the result of applying an affine or biaffine transformation to the input. configurable.py : This file contains the Configurable class, which wraps a SafeConfigParser that stores model hyperparameter options (such as dropout keep probability and recurrent size). Most or all classes in this repository inherit from it. lib/models/nn.py : This file contains the NN class, which inherits from Configurable . It contains functions such as MLP and RNN that are general purpose but require knowledge of model hyperparameters. lib/models/rnn.py : This file contains functions for building tensorflow recurrent neural networks. It is largely copied and pasted tensorflow source code with a few modifications to include a dynamic bidirectional recurrent neural network (rather than just a dynamic unidirectional one, which was all that was available when this project was started) and same mask recurrent dropout. lib/models/parsers : This directory contains different parser architectures. All parsers inherit from BaseParser , which in turn inherits from NN . The README in that directory details the differences between architectures. lib/rnn_cells : This directory contains a number of different recurrent cells (including LSTMs and GRUs). All recurrent cells inherit from BaseCell which inherits from Configurable (but not NN ). The README in that directory details the different cell types. lib/optimizers : This directory contains the optimizer used to optimize the network. All optimizers inherit from BaseOptimizer which inherits from Configurable (again not NN ). See the README in that directory for further explanation. vocab.py : This file contains the Vocab class, which manages a vocabulary of discrete strings (tokens, POS tags, dependency labels). bucket.py : This file contains the Bucket class, which manages all sequences of data up to a certain length, and pads everything shorter than that length with special tokens. metabucket.py : This file contains the Metabucket class, which manages a group of multiple buckets, efficiently determining which bucket a new sentence goes in. dataset.py : This file contains the Dataset class, which manages an entire dataset (e.g. the training set or the test set), reading in a conll file and grabbing minibatches. network.py : This file contains the Network class, which manages the training and testing of a neural network. It contains three Dataset objects one for the training set, one for the validation set, and one for the test set three Vocab objects one for the words, one for the POS tags, and one for the dependency labels one NN object a parser architecture or other user defined architecutre and a BaseOptimizer object (stored in the self._ops dictionary). This is also the file you call to run the network. How do you run the model? 
How do you run the model? Data After downloading the repository, you will need a few more things: pretrained word embeddings: We used 100-dimensional GloVe embeddings. data: We used the Penn TreeBank dataset automatically converted to Stanford Dependencies, but since this dataset is proprietary, you can instead use the freely available English Web Treebank in Universal Dependencies format. We will assume that the dataset has been downloaded and exists in the directory data/EWT and the word embeddings exist in data/glove. Config files All configuration options can be specified on the command line, but it's much easier to instead store them in a configuration file. This includes the location of the data files. We recommend creating a new configuration file config/myconfig.cfg in the config directory: OS embed_dir data/glove embed_file %(embed_dir)s/en.100d.txt data_dir data/EWT train_file %(data_dir)s/train.conllu valid_file %(data_dir)s/dev.conllu test_file %(data_dir)s/test.conllu This is also where other options can be specified; for example, to use the same configuration options used in the paper, one would also add Layers n_recur 4 Dropout mlp_keep_prob .67 ff_keep_prob .67 Regularization l2_reg 0 Radam chi 0 Learning rate learning_rate 2e-3 decay_steps 2500 (a reconstructed sketch of this file laid out in full appears after the training notes below). Training The model can be trained with bash python network.py --config_file config/myconfig.cfg --save_dir saves/mymodel The saves directory must already exist. It will attempt to create a mymodel directory in saves; if saves/mymodel already exists, it will warn the user and ask if they want to continue. This is to prevent accidentally overwriting trained models. The model then reads in the training files and prints out the shapes of each bucket. By default, all matrices are initialized orthonormally; in order to generate orthonormal matrices, it starts with a random normal matrix and optimizes it to be orthonormal (on the CPU, using numpy). The final loss of this is printed, so that if the optimizer diverges (which is very rare but does occasionally happen) the researcher can restart. During training, the model prints out training and validation loss, labeled attachment accuracy, and runtime (in sentences/second). During validation, the model also generates a sanitycheck.txt file in the save directory that prints out the model's predictions on sentences in the validation file. It also saves history.pkl to the save directory, which records the model's training and validation loss and accuracy. At this stage the model makes no attempt to ensure that the trees are well formed and it makes no attempt to ignore punctuation. The model will periodically save its TensorFlow state so that it can be reloaded in the event of a crash or accidental termination. If the researcher wishes to terminate the model prematurely, they can do so with ; in this event, they will be prompted to save the model with or discard it with another .
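The configuration text above has lost the bracketed section headers that a SafeConfigParser file needs, so the snippet below is a hedged reconstruction of config/myconfig.cfg, with section names inferred from the prose and shown being parsed so the %(...)s interpolation is visible. Treat it as a sketch rather than the repository's exact format.
# Hedged reconstruction of config/myconfig.cfg (section names inferred from the
# prose above; they may differ from the repository's actual config layout).
import configparser

MYCONFIG = '''
[OS]
embed_dir = data/glove
embed_file = %(embed_dir)s/en.100d.txt
data_dir = data/EWT
train_file = %(data_dir)s/train.conllu
valid_file = %(data_dir)s/dev.conllu
test_file = %(data_dir)s/test.conllu

[Layers]
n_recur = 4

[Dropout]
mlp_keep_prob = .67
ff_keep_prob = .67

[Regularization]
l2_reg = 0

[Radam]
chi = 0

[Learning rate]
learning_rate = 2e-3
decay_steps = 2500
'''

parser = configparser.ConfigParser()
parser.read_string(MYCONFIG)
print(parser.get('OS', 'train_file'))   # -> data/EWT/train.conllu (interpolation resolved)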
Testing The model can be validated with bash python network.py --save_dir saves/mymodel --validate python network.py --save_dir saves/mymodel --test This creates a parsed copy of the validation and test files in the save directory. The model also reports unlabeled and labeled attachment accuracy in saves/mymodel/scores.txt, but these calculate punctuation differently from what is standard. One should instead use the perl script in bin to compute accuracies: bash perl bin/eval.pl -q -b -g data/EWT/dev.conllu \ -s saves/mymodel/dev.conllu \ -o saves/mymodel/dev.scores.txt Statistical significance between two models can similarly be computed using a perl script: bash perl bin/compare.pl saves/mymodel/dev.scores.txt saves/defaults/dev.scores.txt The current build is designed for research purposes, so explicit functionality for parsing texts is not currently supported. What does the model put in the save directory? config.cfg : A configuration file containing the model hyperparameters. Since hyperparameters can come from a variety of different sources (including multiple config files and command line arguments), this is necessary for restoring it later and remembering what hyperparameters were used. HEAD : The GitHub repository head keeps track of the current GitHub build, so that if the current GitHub version is incompatible with the trained model, the researcher knows which commit they need to restore to run it. history.pkl : A Python pickle file containing a dictionary of training and validation history. : TensorFlow checkpoint file indicating which model to restore. trained (.txt) : TensorFlow model after training for iterations. words.txt / tags.txt / rels.txt : Vocabulary files containing all words/tags/labels in the training set and their frequency, sorted by frequency. sanitycheck.txt : The model's validation output. The sentences are grouped by bucket, not in the original order they were observed in the file, and the parses are chosen greedily rather than using any MST parsing algorithm to ensure well-formedness. Predicted heads/relations are put in the second-to-last two columns, and gold heads/relations are put in the last two columns. scores.txt : The model's self-reported unlabeled/labeled accuracy scores. As previously stated, don't trust these numbers too much; use the perl script. dev.conllu / test.conllu : The parsed validation and test datasets.",Dependency Parsing,Dependency Parsing