{ "access": { "embargo": { "active": false, "reason": null }, "files": "public", "record": "public", "status": "open" }, "created": "2017-08-26T15:17:31.722473+00:00", "custom_fields": { "meeting:meeting": { "acronym": "CVAVM", "dates": "23 October", "place": "Venice, Italy", "title": "ICCV 2017 Workshop on Computer Vision for Audio-Visual Media", "url": "https://cvavm2017.wordpress.com/" } }, "deletion_status": { "is_deleted": false, "status": "P" }, "files": { "count": 1, "enabled": true, "entries": { "PID4967623.pdf": { "checksum": "md5:41c07f69ed46f44d6faac8ad165b6fa6", "ext": "pdf", "id": "acb36813-cbfd-4fe1-baf0-e20d8c36e06d", "key": "PID4967623.pdf", "metadata": null, "mimetype": "application/pdf", "size": 1553217 } }, "order": [], "total_bytes": 1553217 }, "id": "848650", "is_draft": false, "is_published": true, "links": { "access": "https://zenodo.org/api/records/848650/access", "access_links": "https://zenodo.org/api/records/848650/access/links", "access_request": "https://zenodo.org/api/records/848650/access/request", "access_users": "https://zenodo.org/api/records/848650/access/users", "archive": "https://zenodo.org/api/records/848650/files-archive", "archive_media": "https://zenodo.org/api/records/848650/media-files-archive", "communities": "https://zenodo.org/api/records/848650/communities", "communities-suggestions": "https://zenodo.org/api/records/848650/communities-suggestions", "doi": "https://doi.org/10.5281/zenodo.848650", "draft": "https://zenodo.org/api/records/848650/draft", "files": "https://zenodo.org/api/records/848650/files", "latest": "https://zenodo.org/api/records/848650/versions/latest", "latest_html": "https://zenodo.org/records/848650/latest", "media_files": "https://zenodo.org/api/records/848650/media-files", "parent": "https://zenodo.org/api/records/848649", "parent_doi": "https://zenodo.org/doi/10.5281/zenodo.848649", "parent_html": "https://zenodo.org/records/848649", "requests": "https://zenodo.org/api/records/848650/requests", 
"reserve_doi": "https://zenodo.org/api/records/848650/draft/pids/doi", "self": "https://zenodo.org/api/records/848650", "self_doi": "https://zenodo.org/doi/10.5281/zenodo.848650", "self_html": "https://zenodo.org/records/848650", "self_iiif_manifest": "https://zenodo.org/api/iiif/record:848650/manifest", "self_iiif_sequence": "https://zenodo.org/api/iiif/record:848650/sequence/default", "versions": "https://zenodo.org/api/records/848650/versions" }, "media_files": { "count": 0, "enabled": false, "entries": {}, "order": [], "total_bytes": 0 }, "metadata": { "creators": [ { "affiliations": [ { "name": "Universidad de la Rep\u00fablica" } ], "person_or_org": { "family_name": "Zinemanas", "given_name": "Pablo", "name": "Zinemanas, Pablo", "type": "personal" } }, { "affiliations": [ { "name": "ENS Cachan, Universit\u00e9 Paris Saclay" } ], "person_or_org": { "family_name": "Arias", "given_name": "Pablo", "name": "Arias, Pablo", "type": "personal" } }, { "affiliations": [ { "name": "Universitat Pompeu Fabra" } ], "person_or_org": { "family_name": "Haro", "given_name": "Gloria", "name": "Haro, Gloria", "type": "personal" } }, { "affiliations": [ { "name": "Universitat Pompeu Fabra" } ], "person_or_org": { "family_name": "Gomez", "given_name": "Emilia", "name": "Gomez, Emilia", "type": "personal" } } ], "description": "
Automatic transcription is a well-known task in the music information retrieval (MIR) domain, and consists of computing a symbolic music representation (e.g. MIDI) from an audio recording. In this work, we address the automatic transcription of video recordings when the audio modality is missing or of insufficient quality, and thus analyze the visual information. We focus on the clarinet, which is played by opening and closing a set of holes and keys. We propose a method for automatic visual note estimation that detects the fingertips of the player and measures their displacement with respect to the holes and keys of the clarinet. To this end, we track the clarinet and determine its position in every frame. The relative positions of the fingertips are used as features for a machine learning algorithm trained for note pitch classification. For that purpose, a dataset is built in a semi-automatic way by estimating pitch information from the audio signals of an existing collection of 4.5 hours of video recordings, comprising six different songs performed by nine different players. Our results confirm the difficulty of visual, as opposed to audio, automatic transcription, mainly due to motion blur and occlusions that cannot be resolved with a single view.
", "publication_date": "2017-10-23", "publisher": "Zenodo", "resource_type": { "id": "publication-conferencepaper", "title": { "de": "Konferenzbeitrag", "en": "Conference paper" } }, "rights": [ { "description": { "en": "The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited." }, "icon": "cc-by-icon", "id": "cc-by-4.0", "props": { "scheme": "spdx", "url": "https://creativecommons.org/licenses/by/4.0/legalcode" }, "title": { "en": "Creative Commons Attribution 4.0 International" } } ], "subjects": [ { "subject": "automatic music transcription" }, { "subject": "computer vision" }, { "subject": "deep learning" }, { "subject": "music information retrieval" }, { "subject": "multimodality" }, { "subject": "Department of Information and Communication Technologies, UPF, Barcelona" } ], "title": "Visual music transcription of clarinet video recordings trained with audio-based labelled data" }, "parent": { "access": { "owned_by": { "user": 35274 } }, "communities": { "entries": [ { "access": { "member_policy": "open", "members_visibility": "public", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2015-10-20T13:25:24+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "2fec7511-7dd7-4ccf-863a-9ac7d382e82d", "links": {}, "metadata": { "curation_policy": "", "description": "This community brings together researchers working on multimodality, that is, how multiple modes of communication interact and co-operate.\n\nFeel free to contribute articles, links, calls, posters, code, etc. 
on all aspects of multimodal research.", "page": "", "title": "Multimodal research" }, "revision_id": 0, "slug": "multimodality", "updated": "2015-11-17T08:11:02+00:00" }, { "access": { "member_policy": "open", "members_visibility": "public", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2016-07-15T16:08:14+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "874109ed-f6b7-4252-97dd-b864b29b0885", "links": {}, "metadata": { "curation_policy": "", "description": "", "page": "", "title": "Music Information Retrieval" }, "revision_id": 0, "slug": "mir", "updated": "2017-03-06T10:09:19.405429+00:00" }, { "access": { "member_policy": "open", "members_visibility": "public", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2016-10-19T04:53:00.471317+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "cea556c7-3bb8-4ea9-84af-48e4709eeaa0", "links": {}, "metadata": { "curation_policy": "Image and video processing are each distinct fields from computer vision. Works that would be shown at IEEE CVPR, IET Computer Vision, SIAM SIIMS, and the like are generally suitable for this community.
\r\n\r\n\r\n", "description": "Computer vision generally deals with the reduction of images or video sequences to actionable quantities. The overlapping field of machine vision falls under the aegis of computer vision for our purposes.", "page": "", "title": "Computer Vision" }, "revision_id": 0, "slug": "computer-vision", "updated": "2017-05-24T19:31:37.717953+00:00" }, { "access": { "member_policy": "open", "members_visibility": "public", "record_policy": "open", "review_policy": "open", "visibility": "public" }, "children": { "allow": false }, "created": "2016-11-07T19:12:28.860929+00:00", "custom_fields": {}, "deletion_status": { "is_deleted": false, "status": "P" }, "id": "f9b7307d-def2-422e-ac8a-92f8a4588b85", "links": {}, "metadata": { "curation_policy": "
Resources by staff at the Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, with emphasis on resources used in the context of the María de Maeztu Strategic Research Program on data-driven knowledge extraction.
\r\n\r\nhttp://www.upf.edu/mdm-dtic
\r\n", "description": "Department of Information and Communication Technologies. Maria de Maeztu (MdM) Unit of Excellence. UPF, Barcelona\n\nMaria de Maeztu Unit of Excellence -\n\nMdM Strategic Research Program on data-driven knowledge extraction\n\nhttp://www.upf.edu/mdm-dtic", "page": "", "title": "Department of Information and Communication Technologies, UPF, Barcelona" }, "revision_id": 0, "slug": "mdm-dtic-upf", "updated": "2018-01-24T23:31:53.330327+00:00" } ], "ids": [ "2fec7511-7dd7-4ccf-863a-9ac7d382e82d", "874109ed-f6b7-4252-97dd-b864b29b0885", "cea556c7-3bb8-4ea9-84af-48e4709eeaa0", "f9b7307d-def2-422e-ac8a-92f8a4588b85" ] }, "id": "848649", "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.848649", "provider": "datacite" } } }, "pids": { "doi": { "client": "datacite", "identifier": "10.5281/zenodo.848650", "provider": "datacite" }, "oai": { "identifier": "oai:zenodo.org:848650", "provider": "oai" } }, "revision_id": 11, "stats": { "all_versions": { "data_volume": 248514720.0, "downloads": 160, "unique_downloads": 147, "unique_views": 266, "views": 303 }, "this_version": { "data_volume": 243855069.0, "downloads": 157, "unique_downloads": 145, "unique_views": 264, "views": 300 } }, "status": "published", "updated": "2020-01-20T16:43:32.981879+00:00", "versions": { "index": 1, "is_latest": true } }