ITOP Dataset

Dataset | Open Access
Haque, Albert;
Peng, Boya;
Luo, Zelun;
Alahi, Alexandre;
Yeung, Serena;
Fei-Fei, Li
**Affiliation:** Stanford University (all authors)

**Keywords:** depth sensor, human pose estimation, computer vision, 3D vision

**Publication date:** 2016-10-08

**License:** Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/legalcode

**DOI:** 10.5281/zenodo.3932973 (this version); all versions are collected under 10.5281/zenodo.3932972

**Files** (each downloadable from `https://zenodo.org/record/3932973/files/<filename>`):

| File | Size (bytes) | MD5 checksum |
|---|---|---|
| ITOP_side_test_depth_map.h5.gz | 245,104,261 | 65f431c9f7540db6118d99bc9bae7576 |
| ITOP_side_test_images.h5.gz | 257,980,348 | 1803c50e44746dca7ccf03c2d46c466e |
| ITOP_side_test_labels.h5.gz | 3,699,135 | 7205b0ba47f76892742ded774754d7a1 |
| ITOP_side_test_point_cloud.h5.gz | 2,061,701,631 | 3f5227d6f260011b19f325fffde08a65 |
| ITOP_side_train_depth_map.h5.gz | 926,228,035 | 80736f716b0e83f7cc73ec85bb13effc |
| ITOP_side_train_images.h5.gz | 1,010,377,751 | e325ed23ed962f86594b70f17c048a30 |
| ITOP_side_train_labels.h5.gz | 16,833,112 | e62a67678d5cddc13e07cfdd1eb0a176 |
| ITOP_side_train_point_cloud.h5.gz | 7,840,345,186 | 6ca457e8471e7514222624e937e11a9c |
| ITOP_top_test_depth_map.h5.gz | 245,493,889 | d8ad31ecbbcd13ee5e1f02874c0cb3d0 |
| ITOP_top_test_images.h5.gz | 246,678,932 | 21f702e3ce0e5602340957e6cae6148a |
| ITOP_top_test_labels.h5.gz | 9,280,299 | 6a9c5d7845dc7fdf6d168ee4dd356afd |
| ITOP_top_test_point_cloud.h5.gz | 2,020,245,383 | 3ac977488864e27ac13e8cf17d03f8c7 |
| ITOP_top_train_depth_map.h5.gz | 917,859,800 | 159a8694f653f5b639252de84469f7b9 |
| ITOP_top_train_images.h5.gz | 923,855,225 | 6e2daf5be0f0bf6eddf611913e718417 |
| ITOP_top_train_labels.h5.gz | 32,165,804 | 95776e7beeb9a769bef25eb336afb5bd |
| ITOP_top_train_point_cloud.h5.gz | 7,620,649,272 | f5fd64240296be0bfff5318beca19884 |
| sample_front.jpg | 20,450 | 86d7be54b61841fe22b27949fffc042d |
| sample_front_labeled.jpg | 22,911 | 25aaef40a70ad75f452438824a2bb71f |
| sample_top.jpg | 18,689 | 0afbd5971faee803d14969e4c2a71267 |
| sample_top_labeled.jpg | 17,461 | 5d6c045333e9f520c24d335f57e0422e |
code="u">Stanford University</subfield> <subfield code="0">(orcid)0000-0001-6769-6370</subfield> <subfield code="a">Haque, Albert</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">ITOP Dataset</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield> <subfield code="a">Creative Commons Attribution 4.0 International</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p><strong>Summary</strong></p> <p>The ITOP dataset (Invariant Top View) contains 100K depth images from side and top views of a person in a scene. For each image, the location of 15 human body parts are labeled with 3-dimensional (x,y,z) coordinates, relative to the sensor&#39;s position. Read the full paper for more context [<a href="https://arxiv.org/pdf/1603.07076.pdf">pdf</a>].</p> <p><strong>Getting Started</strong></p> <p>Download then decompress the h5.gz file.</p> <pre><code class="language-bash">gunzip ITOP_side_test_depth_map.h5.gz</code></pre> <p>Using Python and <a href="https://www.h5py.org/">h5py</a> (<em>pip install h5py</em> or <em>conda install h5py</em>), we can load the contents:</p> <pre><code class="language-python">import h5py import numpy as np f = h5py.File('ITOP_side_test_depth_map.h5', 'r') data, ids = f.get('data'), f.get('id') data, ids = np.asarray(data), np.asarray(ids) print(data.shape, ids.shape) # (10501, 240, 320) (10501,)</code></pre> <p><strong>Note:</strong> For any of the <em>*_images.h5.gz</em> files, the underlying file is a tar file and not a h5 file. Please rename the file extension from <em>h5.gz</em> to <em>tar.gz</em> before opening. The following commands will work:</p> <pre><code class="language-bash">mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz tar xf ITOP_side_test_images.tar.gz</code></pre> <p><strong>Metadata</strong></p> <p>File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.</p> <pre><code>+-------+--------+---------+---------+----------+------------+--------------+---------+ | View | Split | Frames | People | Images | Depth Map | Point Cloud | Labels | +-------+--------+---------+---------+----------+------------+--------------+---------+ | Side | Train | 39,795 | 16 | 1.1 GiB | 5.7 GiB | 18 GiB | 2.9 GiB | | Side | Test | 10,501 | 4 | 276 MiB | 1.6 GiB | 4.6 GiB | 771 MiB | | Top | Train | 39,795 | 16 | 974 MiB | 5.7 GiB | 18 GiB | 2.9 GiB | | Top | Test | 10,501 | 4 | 261 MiB | 1.6 GiB | 4.6 GiB | 771 MiB | +-------+--------+---------+---------+----------+------------+--------------+---------+</code></pre> <p><strong>Data Schema</strong></p> <p>Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let <span class="math-tex">\(n\)</span> denote the number of images.<br> <br> <strong>Transformation</strong></p> <p>To convert from point clouds to a&nbsp;<span class="math-tex">\(240 \times 320\)</span> image, the following transformations were used. Let&nbsp;<span class="math-tex">\(x_{\textrm{img}}\)</span> and&nbsp;<span class="math-tex">\(y_{\textrm{img}}\)</span> denote the&nbsp;<span class="math-tex">\((x,y)\)</span> coordinate in the image plane. 
**Joint ID (Index) Mapping**

```
joint_id_to_name = {
    0: 'Head',        8: 'Torso',
    1: 'Neck',        9: 'R Hip',
    2: 'R Shoulder', 10: 'L Hip',
    3: 'L Shoulder', 11: 'R Knee',
    4: 'R Elbow',    12: 'L Knee',
    5: 'L Elbow',    13: 'R Foot',
    6: 'R Hand',     14: 'L Foot',
    7: 'L Hand',
}
```

**Depth Maps**

- *Key:* id
  - *Dimensions:* \((n,)\)
  - *Data Type:* uint8
  - *Description:* Frame identifier in the form XX_YYYYY, where XX is the person's ID number and YYYYY is the frame number.
- *Key:* data
  - *Dimensions:* \((n, 240, 320)\)
  - *Data Type:* float16
  - *Description:* Depth map (i.e. mesh) corresponding to a single frame. Depth values are in real-world meters (m).

**Point Clouds**

- *Key:* id
  - *Dimensions:* \((n,)\)
  - *Data Type:* uint8
  - *Description:* Frame identifier in the form XX_YYYYY, where XX is the person's ID number and YYYYY is the frame number.
- *Key:* data
  - *Dimensions:* \((n, 76800, 3)\)
  - *Data Type:* float16
  - *Description:* Point cloud containing 76,800 points (240 × 320). Each point is represented by a 3D tuple measured in real-world meters (m).

**Labels**

- *Key:* id
  - *Dimensions:* \((n,)\)
  - *Data Type:* uint8
  - *Description:* Frame identifier in the form XX_YYYYY, where XX is the person's ID number and YYYYY is the frame number.
- *Key:* is_valid
  - *Dimensions:* \((n,)\)
  - *Data Type:* uint8
  - *Description:* Flag corresponding to the result of the human labeling effort. This is a boolean value (represented by an integer) where a one (1) denotes clean, human-approved data and a zero (0) denotes noisy human body part labels. If is_valid is equal to zero, you should not use any of the provided human joint locations for that frame.
- *Key:* visible_joints
  - *Dimensions:* \((n, 15)\)
  - *Data Type:* int16
  - *Description:* Binary mask indicating whether each human joint is visible or occluded, denoted by \(\alpha\) in the paper. If \(\alpha_j = 1\), the \(j^{th}\) joint is visible (i.e. not occluded); if \(\alpha_j = 0\), the \(j^{th}\) joint is occluded.
- *Key:* image_coordinates
  - *Dimensions:* \((n, 15, 2)\)
  - *Data Type:* int16
  - *Description:* Two-dimensional \((x, y)\) points corresponding to the location of each joint in the depth image or depth map.
- *Key:* real_world_coordinates
  - *Dimensions:* \((n, 15, 3)\)
  - *Data Type:* float16
  - *Description:* Three-dimensional \((x, y, z)\) points corresponding to the location of each joint in real-world meters (m).
- *Key:* segmentation
  - *Dimensions:* \((n, 240, 320)\)
  - *Data Type:* int8
  - *Description:* Pixel-wise assignment of body part labels. The background class (i.e. no body part) is denoted by −1.
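Putting the label schema together, here is a short example of our own (not part of the dataset tooling) that loads the side-view test labels, keeps only human-approved frames via is_valid, and prints the visible joints of one frame by name using the mapping above.

```python
import h5py
import numpy as np

joint_id_to_name = {
    0: 'Head', 1: 'Neck', 2: 'R Shoulder', 3: 'L Shoulder', 4: 'R Elbow',
    5: 'L Elbow', 6: 'R Hand', 7: 'L Hand', 8: 'Torso', 9: 'R Hip',
    10: 'L Hip', 11: 'R Knee', 12: 'L Knee', 13: 'R Foot', 14: 'L Foot',
}

with h5py.File('ITOP_side_test_labels.h5', 'r') as f:
    ids = np.asarray(f['id'])                            # (n,) identifiers of the form XX_YYYYY
    is_valid = np.asarray(f['is_valid'])                 # (n,)
    visible = np.asarray(f['visible_joints'])            # (n, 15)
    coords_3d = np.asarray(f['real_world_coordinates'])  # (n, 15, 3), meters

# Keep only frames whose joint labels were human-approved.
valid_idx = np.where(is_valid == 1)[0]
print(f'{len(valid_idx)} of {len(ids)} frames have clean labels')

# Inspect the first clean frame; the id string encodes person and frame number.
i = valid_idx[0]
raw_id = ids[i]
frame_id = raw_id.decode() if isinstance(raw_id, bytes) else str(raw_id)
person, frame = frame_id.split('_')
print(f'person {person}, frame {frame}')

for j, name in joint_id_to_name.items():
    if visible[i, j] == 1:  # joint is visible (not occluded)
        x, y, z = (float(v) for v in coords_3d[i, j])
        print(f'{name}: ({x:.3f}, {y:.3f}, {z:.3f}) m')
```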
**Citation**

If you would like to cite our work, please use the following.

**Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.**

```
@inproceedings{haque2016viewpoint,
  title     = {Towards Viewpoint Invariant 3D Human Pose Estimation},
  author    = {Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li},
  booktitle = {European Conference on Computer Vision},
  month     = {October},
  year      = {2016}
}
```

**Related identifiers:** cites arXiv:1603.07076.
| | All versions | This version |
|---|---|---|
| Views | 1,592 | 1,592 |
| Downloads | 6,614 | 6,614 |
| Data volume | 18.8 TB | 18.8 TB |
| Unique views | 1,320 | 1,320 |
| Unique downloads | 1,012 | 1,012 |