Dataset Open Access

ITOP Dataset

Haque, Albert; Peng, Boya; Luo, Zelun; Alahi, Alexandre; Yeung, Serena; Fei-Fei, Li


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p><strong>Summary</strong></p>\n\n<p>The ITOP dataset (Invariant Top View) contains 100K depth images from side and top views of a person in a scene. For each image, the location of 15 human body parts are labeled with 3-dimensional (x,y,z) coordinates, relative to the sensor&#39;s position. Read the full paper for more context [<a href=\"https://arxiv.org/pdf/1603.07076.pdf\">pdf</a>].</p>\n\n<p><strong>Getting Started</strong></p>\n\n<p>Download then decompress the h5.gz file.</p>\n\n<pre><code class=\"language-bash\">gunzip ITOP_side_test_depth_map.h5.gz</code></pre>\n\n<p>Using Python and <a href=\"https://www.h5py.org/\">h5py</a> (<em>pip install h5py</em> or <em>conda install h5py</em>), we can load the contents:</p>\n\n<pre><code class=\"language-python\">import h5py\nimport numpy as np\n\nf = h5py.File('ITOP_side_test_depth_map.h5', 'r')\ndata, ids = f.get('data'), f.get('id')\ndata, ids = np.asarray(data), np.asarray(ids)\n\nprint(data.shape, ids.shape)\n# (10501, 240, 320) (10501,)</code></pre>\n\n<p><strong>Note:</strong> For any of the <em>*_images.h5.gz</em> files, the underlying file is a tar file and not a h5 file. Please rename the file extension from <em>h5.gz</em> to <em>tar.gz</em> before opening. The following commands will work:</p>\n\n<pre><code class=\"language-bash\">mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz\ntar xf ITOP_side_test_images.tar.gz</code></pre>\n\n<p><strong>Metadata</strong></p>\n\n<p>File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.</p>\n\n<pre><code>+-------+--------+---------+---------+----------+------------+--------------+---------+\n| View  | Split  | Frames  | People  | Images   | Depth Map  | Point Cloud  | Labels  |\n+-------+--------+---------+---------+----------+------------+--------------+---------+\n| Side  | Train  | 39,795  |     16  | 1.1 GiB  | 5.7 GiB    | 18 GiB       | 2.9 GiB |\n| Side  | Test   | 10,501  |      4  | 276 MiB  | 1.6 GiB    | 4.6 GiB      | 771 MiB |\n| Top   | Train  | 39,795  |     16  | 974 MiB  | 5.7 GiB    | 18 GiB       | 2.9 GiB |\n| Top   | Test   | 10,501  |      4  | 261 MiB  | 1.6 GiB    | 4.6 GiB      | 771 MiB |\n+-------+--------+---------+---------+----------+------------+--------------+---------+</code></pre>\n\n<p><strong>Data Schema</strong></p>\n\n<p>Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let <span class=\"math-tex\">\\(n\\)</span> denote the number of images.<br>\n<br>\n<strong>Transformation</strong></p>\n\n<p>To convert from point clouds to a&nbsp;<span class=\"math-tex\">\\(240 \\times 320\\)</span> image, the following transformations were used. Let&nbsp;<span class=\"math-tex\">\\(x_{\\textrm{img}}\\)</span> and&nbsp;<span class=\"math-tex\">\\(y_{\\textrm{img}}\\)</span> denote the&nbsp;<span class=\"math-tex\">\\((x,y)\\)</span> coordinate in the image plane. Using the raw point cloud&nbsp;<span class=\"math-tex\">\\((x,y,z)\\)</span> real world coordinates, we compute the depth map as follows:&nbsp;<span class=\"math-tex\">\\(x_{\\textrm{img}} = \\frac{x}{Cz} + 160\\)</span> and&nbsp;<span class=\"math-tex\">\\(y_{\\textrm{img}} = -\\frac{y}{Cz} + 120\\)</span> where <span class=\"math-tex\">\\(C\\approx 3.50\u00d710^{\u22123} = 0.0035\\)</span> is the intrinsic camera calibration parameter. 
This results in the depth map:&nbsp;<span class=\"math-tex\">\\((x_{\\textrm{img}}, y_{\\textrm{img}}, z)\\)</span>.</p>\n\n<p><strong>Joint ID (Index) Mapping</strong></p>\n\n<pre><code>joint_id_to_name = {\n  0: 'Head',        8: 'Torso',\n  1: 'Neck',        9: 'R Hip',\n  2: 'R Shoulder',  10: 'L Hip',\n  3: 'L Shoulder',  11: 'R Knee',\n  4: 'R Elbow',     12: 'L Knee',\n  5: 'L Elbow',     13: 'R Foot',\n  6: 'R Hand',      14: 'L Foot',\n  7: 'L Hand',\n}</code></pre>\n\n<p><strong>Depth Maps</strong></p>\n\n<ul>\n\t<li><em>Key:</em> id\n\n\t<ul>\n\t\t<li><em>Dimensions:</em> <span class=\"math-tex\">\\((n,)\\)</span></li>\n\t\t<li><em>Data Type:</em> uint8</li>\n\t\t<li><em>Description:</em> Frame identifier in the form XX_YYYYY where XX is the person&#39;s ID number and YYYYY is the frame number.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>data\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,240,320)\\)</span></li>\n\t\t<li><em>Data Type:</em> float16</li>\n\t\t<li><em>Description:</em> Depth map (i.e. mesh) corresponding to a single frame. Depth values are in real world meters (m).</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p><strong>Point Clouds</strong></p>\n\n<ul>\n\t<li><em>Key:</em> id\n\n\t<ul>\n\t\t<li><em>Dimensions:</em> <span class=\"math-tex\">\\((n,)\\)</span></li>\n\t\t<li><em>Data Type:</em> uint8</li>\n\t\t<li><em>Description:</em> Frame identifier in the form XX_YYYYY where XX is the person&#39;s ID number and YYYYY is the frame number.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>data\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,76800,3)\\)</span></li>\n\t\t<li><em>Data Type: float16</em></li>\n\t\t<li><em>Description:</em> Point cloud containing 76,800 points (240x320). Each point is represented by a 3D tuple measured in real world meters (m).</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p><strong>Labels</strong></p>\n\n<ul>\n\t<li><em>Key: </em>id\n\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,)\\)</span></li>\n\t\t<li><em>Data Type: </em>uint8</li>\n\t\t<li><em>Description:</em> Frame identifier in the form XX_YYYYY where XX is the person&#39;s ID number and YYYYY is the frame number.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>is_valid\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,)\\)</span></li>\n\t\t<li><em>Data Type: </em>uint8</li>\n\t\t<li><em>Description:</em> Flag corresponding to the result of the human labeling effort. This is a boolean value (represented by an integer) where a one (1) denotes clean, human-approved data. A zero (0) denotes noisy human body part labels. If is_valid is equal to zero, you should not use any of the provided human joint locations for the particular frame.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>visible_joints\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,15)\\)</span></li>\n\t\t<li><em>Data Type: </em>int16</li>\n\t\t<li><em>Description:</em> Binary mask indicating if each human joint is visible or occluded. This is denoted by&nbsp;<span class=\"math-tex\">\\(\\alpha\\)</span> in the paper. If&nbsp;<span class=\"math-tex\">\\(\\alpha_j=1\\)</span> then the&nbsp;<span class=\"math-tex\">\\(j^{th}\\)</span> joint is visible (i.e. not occluded). 
Otherwise, if&nbsp;<span class=\"math-tex\">\\(\\alpha_j = 0\\)</span> then the <span class=\"math-tex\">\\(j^{th}\\)</span> joint is occluded.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>image_coordinates\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,15,2)\\)</span></li>\n\t\t<li><em>Data Type: </em>int16</li>\n\t\t<li><em>Description:</em> Two-dimensional&nbsp;<span class=\"math-tex\">\\((x,y)\\)</span> points corresponding to the location of each joint in the depth image or depth map.</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>real_world_coordinates\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,15,3)\\)</span></li>\n\t\t<li><em>Data Type: </em>float16</li>\n\t\t<li><em>Description:</em> Three-dimensional&nbsp;<span class=\"math-tex\">\\((x,y,z)\\)</span> points corresponding to the location of each joint in real world meters (m).</li>\n\t</ul>\n\t</li>\n\t<li><em>Key: </em>segmentation\n\t<ul>\n\t\t<li><em>Dimensions: </em><span class=\"math-tex\">\\((n,240,320)\\)</span></li>\n\t\t<li><em>Data Type: </em>int8</li>\n\t\t<li><em>Description:</em> Pixel-wise assignment of body part labels. The background class (i.e. no body part) is denoted by &minus;1.</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p><strong>Citation</strong></p>\n\n<p>If you would like to cite our work, please use the following.</p>\n\n<p><strong>Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.</strong></p>\n\n<pre>@inproceedings{haque2016viewpoint,\n    title={Towards Viewpoint Invariant 3D Human Pose Estimation},\n    author={Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li},\n    booktitle = {European Conference on Computer Vision},\n    month = {October},\n    year = {2016}\n}</pre>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Stanford University", 
      "@id": "https://orcid.org/0000-0001-6769-6370", 
      "@type": "Person", 
      "name": "Haque, Albert"
    }, 
    {
      "affiliation": "Stanford University", 
      "@type": "Person", 
      "name": "Peng, Boya"
    }, 
    {
      "affiliation": "Stanford University", 
      "@type": "Person", 
      "name": "Luo, Zelun"
    }, 
    {
      "affiliation": "Stanford University", 
      "@type": "Person", 
      "name": "Alahi, Alexandre"
    }, 
    {
      "affiliation": "Stanford University", 
      "@id": "https://orcid.org/0000-0003-0529-0628", 
      "@type": "Person", 
      "name": "Yeung, Serena"
    }, 
    {
      "affiliation": "Stanford University", 
      "@id": "https://orcid.org/0000-0002-7481-0810", 
      "@type": "Person", 
      "name": "Fei-Fei, Li"
    }
  ], 
  "url": "https://zenodo.org/record/3932973", 
  "citation": [
    {
      "@id": "https://arxiv.org/abs/arXiv:1603.07076", 
      "@type": "CreativeWork"
    }
  ], 
  "datePublished": "2016-10-08", 
  "version": "1.0", 
  "keywords": [
    "depth sensor", 
    "human pose estimation", 
    "computer vision", 
    "3D vision"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_test_depth_map.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_test_images.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_test_labels.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_test_point_cloud.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_train_depth_map.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_train_images.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_train_labels.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_side_train_point_cloud.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_test_depth_map.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_test_images.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_test_labels.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_test_point_cloud.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_train_depth_map.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_train_images.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_train_labels.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/ITOP_top_train_point_cloud.h5.gz", 
      "encodingFormat": "gz", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/sample_front.jpg", 
      "encodingFormat": "jpg", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/sample_front_labeled.jpg", 
      "encodingFormat": "jpg", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/sample_top.jpg", 
      "encodingFormat": "jpg", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/aefee483-2fb7-4f1c-a24e-7fe82399c5f4/sample_top_labeled.jpg", 
      "encodingFormat": "jpg", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.3932973", 
  "@id": "https://doi.org/10.5281/zenodo.3932973", 
  "@type": "Dataset", 
  "name": "ITOP Dataset"
}
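
Point Cloud Projection Example

The Transformation section in the description above defines the mapping from real-world point cloud coordinates to image coordinates: x_img = x/(Cz) + 160, y_img = -y/(Cz) + 120, with C = 0.0035. The sketch below applies that projection to one frame of the side-view test point clouds. It is a minimal illustration, not part of the official release; the function name, the rounding to integer pixels, and the skipping of zero-depth points are our own assumptions.

import h5py
import numpy as np

# Intrinsic camera calibration constant from the dataset description.
C = 0.0035

def point_cloud_to_depth_map(points, height=240, width=320):
    """Project an (N, 3) point cloud in real-world meters onto a 240x320 depth map
    using x_img = x / (C * z) + 160 and y_img = -y / (C * z) + 120."""
    x = points[:, 0].astype(np.float64)
    y = points[:, 1].astype(np.float64)
    z = points[:, 2].astype(np.float64)

    # Points with non-positive depth cannot be projected (assumed to be invalid samples).
    valid = z > 0
    x_img = np.round(x[valid] / (C * z[valid]) + 160).astype(int)
    y_img = np.round(-y[valid] / (C * z[valid]) + 120).astype(int)

    depth_map = np.zeros((height, width), dtype=np.float32)
    inside = (x_img >= 0) & (x_img < width) & (y_img >= 0) & (y_img < height)
    depth_map[y_img[inside], x_img[inside]] = z[valid][inside]
    return depth_map

# Example usage with the first frame of the side-view test split.
with h5py.File('ITOP_side_test_point_cloud.h5', 'r') as f:
    cloud = np.asarray(f['data'][0])        # shape (76800, 3), real-world meters
depth = point_cloud_to_depth_map(cloud)      # shape (240, 320)
print(depth.shape)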
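
Labels Example

The Labels schema in the description above stores per-frame validity flags, per-joint visibility, and joint coordinates. The sketch below loads the side-view test labels, keeps only frames with is_valid equal to 1 (as the schema recommends), and prints the image coordinates of the visible joints for one frame using the joint_id_to_name mapping. It is a minimal illustration under the schema described above; variable names are our own.

import h5py
import numpy as np

joint_id_to_name = {
    0: 'Head', 1: 'Neck', 2: 'R Shoulder', 3: 'L Shoulder',
    4: 'R Elbow', 5: 'L Elbow', 6: 'R Hand', 7: 'L Hand',
    8: 'Torso', 9: 'R Hip', 10: 'L Hip', 11: 'R Knee',
    12: 'L Knee', 13: 'R Foot', 14: 'L Foot',
}

with h5py.File('ITOP_side_test_labels.h5', 'r') as f:
    ids = np.asarray(f['id'])
    is_valid = np.asarray(f['is_valid'])             # (n,)      1 = clean, human-approved
    visible = np.asarray(f['visible_joints'])        # (n, 15)   1 = visible, 0 = occluded
    img_coords = np.asarray(f['image_coordinates'])  # (n, 15, 2) pixel (x, y)

# Keep only human-approved frames.
valid_idx = np.flatnonzero(is_valid == 1)
print(f'{valid_idx.size} of {is_valid.size} frames are human-approved')

# Inspect the first valid frame: print the pixel location of every visible joint.
i = valid_idx[0]
print('frame id:', ids[i])
for j in range(15):
    if visible[i, j] == 1:
        x, y = img_coords[i, j]
        print(f'{joint_id_to_name[j]:>10s}: ({x}, {y})')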
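
Segmentation Example

The segmentation dataset described above assigns a body-part label to every pixel of the 240x320 depth map, with -1 marking background. The sketch below masks a depth map with its segmentation and counts the pixels assigned to each body part. It assumes the depth-map and label files index frames in the same order (both carry an id dataset that can be used to verify this); the masking step itself is our own illustration.

import h5py
import numpy as np

# Load one frame's depth map and its pixel-wise body-part segmentation.
with h5py.File('ITOP_side_test_depth_map.h5', 'r') as f:
    depth = np.asarray(f['data'][0], dtype=np.float32)  # (240, 320), meters
with h5py.File('ITOP_side_test_labels.h5', 'r') as f:
    seg = np.asarray(f['segmentation'][0])               # (240, 320), -1 = background

# Zero out background pixels so only the person's depth values remain.
person_depth = np.where(seg >= 0, depth, 0.0)

# Count how many pixels are assigned to each of the 15 body parts.
for part_id in range(15):
    print(part_id, int(np.sum(seg == part_id)))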