Dataset Open Access

CAD 120 affordance dataset

Sawatzky, Johann; Srikantha, Abhilash; Gall, Juergen

Citation Style Language JSON Export

{
  "publisher": "Zenodo",
  "DOI": "10.5281/zenodo.495570",
  "title": "CAD 120 affordance dataset",
  "issued": {
    "date-parts": []
  },
  "author": [
    {"family": "Sawatzky", "given": "Johann"},
    {"family": "Srikantha", "given": "Abhilash"},
    {"family": "Gall", "given": "Juergen"}
  ],
  "id": "495570",
  "note": "Acknowledgments. The work has been financially supported by the DFG projects GA 1927/5-1 (DFG Research Unit FOR 2535 Anticipating Human Behavior) and GA 1927/2-2 (DFG Research Unit FOR 1505 Mapping on Demand).",
  "type": "dataset",
  "event": "Computer Vision and Pattern Recognition (CVPR)"
}

Abstract

==============================================================================
CAD 120 Affordance Dataset
Version 1.0
------------------------------------------------------------------------------
If you use the dataset, please cite:

Johann Sawatzky, Abhilash Srikantha, Juergen Gall.
Weakly Supervised Affordance Detection.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17)

and

H. S. Koppula and A. Saxena.
Physically grounded spatio-temporal object affordances.
European Conference on Computer Vision (ECCV'14)

For any bugs or questions, please email sawatzky AT iai DOT uni-bonn DOT de.
==============================================================================

This is the CAD 120 Affordance Segmentation Dataset, based on the Cornell
Activity Dataset CAD 120.

Contents

frames/*.png
RGB frames selected from the Cornell Activity Dataset. To find the location
of a frame in the original videos, see video_info.txt.

object_crop_images/*.png
Image crops taken from the selected frames and resized to 321x321 pixels.
Each crop is a padded bounding box of an object the human interacts with in
the video; due to the padding, a crop may also contain background and other
objects. Every bounding box in each selected frame was processed; the
bounding boxes are already given in the Cornell Activity Dataset. In each
file name, the 5-digit number gives the frame number and the second number
gives the bounding box number within the frame.

segmentation_mat/*.mat
321x321x6 segmentation masks for the image crops. Each channel corresponds
to an affordance (openable, cuttable, pourable, containable, supportable,
holdable, in this order). All pixels belonging to a particular affordance
are labeled 1 in the respective channel, 0 otherwise.

segmentation_png/*.png
321x321 PNG images, each containing the binary mask for one of the
affordances.

lists/*.txt
Lists containing the train and test sets for two splits. The actor split
ensures that train and test images stem from different videos with
different actors, while the object split ensures that train and test data
have no (central) object classes in common. The train sets are additionally
subdivided into three subsets A, B, and C. For the actor split, the subsets
stem from different videos; for the object split, each subset contains
every third crop of the train set.

crop_coordinate_info.txt
Maps image crops to their coordinates in the frames.

hpose_info.txt
Maps frames to 2D human pose coordinates, hand-annotated by us.

object_info.txt
Maps image crops to the (central) object each contains.

visible_affordance_info.txt
Maps image crops to the affordances visible in each crop.

The crops contain the following object classes:
1. table
2. kettle
3. plate
4. bottle
5. thermal cup
6. knife
7. medicine box
8. can
9. microwave
10. paper box
11. bowl
12. mug

Affordances in our set:
1. openable
2. cuttable
3. pourable
4. containable
5. supportable
6. holdable

Note that our object affordance labeling differs from the Cornell Activity
Dataset: e.g., the cap of a pizza box is considered supportable.
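As a minimal sketch of how the per-crop segmentation masks described above might be consumed: the README documents a 321x321x6 array with one channel per affordance in a fixed order, but it does not document the variable name stored inside each .mat file, so the loader below guesses it as the first non-metadata key. The file name and key are assumptions, not part of the dataset description.

```python
import numpy as np
from scipy.io import loadmat

# Channel order as documented in the dataset README above.
AFFORDANCES = ["openable", "cuttable", "pourable",
               "containable", "supportable", "holdable"]

def load_affordance_masks(mat_path):
    """Load a 321x321x6 segmentation mask from segmentation_mat/ and
    return a dict mapping each affordance name to its binary 321x321 mask.

    The variable name inside the .mat file is not documented, so we take
    the first key that is not MATLAB metadata (keys like '__header__')."""
    data = loadmat(mat_path)
    key = next(k for k in data if not k.startswith("__"))
    masks = np.asarray(data[key])
    assert masks.shape == (321, 321, 6), masks.shape
    return {name: masks[:, :, i].astype(bool)
            for i, name in enumerate(AFFORDANCES)}
```

With such a dict one can, for instance, count the pixels labeled holdable in a crop via `load_affordance_masks(path)["holdable"].sum()`.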
                  All versions   This version
Views                    3,788          3,790
Downloads                2,554          2,554
Data volume             6.6 TB         6.6 TB
Unique views             3,250          3,252
Unique downloads           861            861

