Dataset Open Access

CAD 120 affordance dataset

Sawatzky, Johann; Srikantha, Abhilash; Gall, Juergen

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.495570</identifier>
  <creators>
    <creator>
      <creatorName>Sawatzky, Johann</creatorName>
      <affiliation>University of Bonn</affiliation>
    </creator>
    <creator>
      <creatorName>Srikantha, Abhilash</creatorName>
      <affiliation>Carl Zeiss AG</affiliation>
    </creator>
    <creator>
      <creatorName>Gall, Juergen</creatorName>
      <affiliation>University of Bonn</affiliation>
    </creator>
  </creators>
  <titles>
    <title>CAD 120 Affordance Dataset</title>
  </titles>
  <subjects>
    <subject>computer vision</subject>
    <subject>semantic image segmentation</subject>
    <subject>weakly supervised learning</subject>
    <subject>convolutional neural network</subject>
    <subject>anticipating human behavior</subject>
    <subject>mapping on demand</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2017-04-07</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/495570</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementTo"></relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsSupplementTo"></relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
    <description descriptionType="Abstract">&lt;p&gt;% ==============================================================================&lt;br&gt;
% CAD 120 Affordance Dataset&lt;br&gt;
% Version 1.0&lt;br&gt;
% ------------------------------------------------------------------------------&lt;br&gt;
% If you use the dataset please cite:&lt;br&gt;
% Johann Sawatzky, Abhilash Srikantha, Juergen Gall.&lt;br&gt;
% Weakly Supervised Affordance Detection.&lt;br&gt;
% IEEE Conference on Computer Vision and Pattern Recognition (CVPR'17)&lt;br&gt;
% and&lt;br&gt;
% H. S. Koppula and A. Saxena.&lt;br&gt;
% Physically grounded spatio-temporal object affordances.&lt;br&gt;
% European Conference on Computer Vision (ECCV'14)&lt;br&gt;
% Any bugs or questions, please email sawatzky AT iai DOT uni-bonn DOT de.&lt;br&gt;
% ==============================================================================&lt;/p&gt;

This is the CAD 120 Affordance Segmentation Dataset, based on the Cornell Activity Dataset CAD-120 (see the Cornell Activity Dataset website). It comprises the following components:


RGB frames selected from the Cornell Activity Dataset. To find the location of a frame in the original videos, see video_info.txt.

Image crops taken from the selected frames and resized to 321*321 pixels. Each crop is a padded bounding box of an object the human interacts with in the video; due to the padding, the crops may contain background and other objects. In each selected frame, every bounding box was processed (the bounding boxes are already given in the Cornell Activity Dataset). In a crop's name, the 5-digit number gives the frame number and the second number gives the bounding box number within the frame.
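
This naming scheme can be parsed mechanically. The following Python sketch assumes a hypothetical file name like 00042_3.png; the separator and extension are assumptions, only the "5-digit frame number plus bounding box number" scheme is stated above:

    import os
    import re

    def parse_crop_name(filename):
        # Split a hypothetical crop name like '00042_3.png' into (42, 3):
        # 5-digit frame number, then bounding box number within the frame.
        stem = os.path.splitext(os.path.basename(filename))[0]
        match = re.match(r"(\d{5})_(\d+)$", stem)
        if match is None:
            raise ValueError("unexpected crop name: %s" % filename)
        frame_number, bbox_number = match.groups()
        return int(frame_number), int(bbox_number)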

321*321*6 segmentation masks for the image crops. Each channel corresponds to an affordance (openable, cuttable, pourable, containable, supportable, holdable, in this order). All pixels belonging to a particular affordance are labeled 1 in the respective channel, otherwise 0.
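
Given the fixed channel order, a single affordance channel can be sliced out by name. A minimal Python sketch, assuming the mask is already loaded as a NumPy array of shape (321, 321, 6); the on-disk storage format is not restated here:

    import numpy as np

    # Channel order as stated above.
    AFFORDANCES = ["openable", "cuttable", "pourable",
                   "containable", "supportable", "holdable"]

    def affordance_mask(mask, name):
        # Return the binary 321x321 mask for one affordance channel.
        assert mask.shape == (321, 321, 6)
        return mask[:, :, AFFORDANCES.index(name)]

    # Smoke test with a dummy mask where every pixel is 'holdable'.
    dummy = np.zeros((321, 321, 6), dtype=np.uint8)
    dummy[:, :, AFFORDANCES.index("holdable")] = 1
    assert affordance_mask(dummy, "holdable").all()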

321*321 png images, each containing the binary mask for one of the affordances.
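
These single-affordance images can be read with Pillow, for example. Whether foreground pixels are stored as 1 or as 255 is an assumption here, so the sketch normalizes any nonzero value to 1:

    import numpy as np
    from PIL import Image

    def load_binary_mask(path):
        mask = np.array(Image.open(path).convert("L"))
        return (mask > 0).astype(np.uint8)  # 1 = affordance present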

Lists containing the train and test sets for two splits. The actor split ensures that train and test images stem from different videos with different actors, while the object split ensures that train and test data have no (central) object classes in common. The train sets are additionally subdivided into three subsets A, B and C. For the actor split, the subsets stem from different videos. For the object split, each subset contains every third crop of the train set.
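
For the object split, the "every third crop" rule can be reproduced by simple index striding. A Python sketch with placeholder crop ids; the released list files define the authoritative subsets:

    def split_into_subsets(train_list):
        # Partition an ordered train list into subsets A, B, C,
        # each taking every third crop.
        return {"A": train_list[0::3],
                "B": train_list[1::3],
                "C": train_list[2::3]}

    subsets = split_into_subsets(["crop_%d" % i for i in range(9)])
    assert subsets["A"] == ["crop_0", "crop_3", "crop_6"]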

Maps image crops to their coordinates in the frames.

Maps frames to 2D human pose coordinates, hand-annotated by us.

Maps image crops to the (central) object they contain.

Maps image crops to the affordances visible in the crop.
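
The exact layout of these mapping files is not restated here; assuming a plain-text format with one record per line (an id followed by its fields), a generic reader could look like this Python sketch:

    def read_mapping(path):
        # Map the first whitespace-separated token on each line
        # (a crop or frame id) to the remaining fields.
        mapping = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                if parts:
                    mapping[parts[0]] = parts[1:]
        return mapping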


The crops contain the following object classes:
5. thermal cup
7. medicine box
10. paper box

Affordances in our set:
1. openable
2. cuttable
3. pourable
4. containable
5. supportable
6. holdable

Note that our object affordance labeling differs from the Cornell Activity Dataset: e.g., the cap of a pizza box is considered to be supportable.

</description>
    <description descriptionType="Other">Acknowledgments. The work has been financially sup-
ported by the DFG projects GA 1927/5-1 (DFG Research
Unit FOR 2535 Anticipating Human Behavior) and GA
1927/2-2 (DFG Research Unit FOR 1505 Mapping on De-
    <description descriptionType="Other">{"references": ["Sawatzky, J., Srikantha, A., Gall, J.: Weakly supervised affordance detection.  CVPR (2017)"]}</description>

