Dataset Open Access

# Artificially-generated Lecture Video Fragmentation Dataset and Ground Truth

D. Galanopoulos; V. Mezaris

### DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource>
<identifier identifierType="DOI">10.5281/zenodo.1462432</identifier>
<creators>
<creator>
<creatorName>D. Galanopoulos</creatorName>
<affiliation>Centre for Research and Technology-Hellas (CERTH)</affiliation>
</creator>
<creator>
<creatorName>V. Mezaris</creatorName>
<affiliation>Centre for Research and Technology-Hellas (CERTH)</affiliation>
</creator>
</creators>
<titles>
<title>Artificially-generated Lecture Video Fragmentation Dataset and Ground Truth</title>
</titles>
<publisher>Zenodo</publisher>
<publicationYear>2018</publicationYear>
<dates>
<date dateType="Issued">2018-10-15</date>
</dates>
<resourceType resourceTypeGeneral="Dataset"/>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/1462432</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.1462431</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/moving-h2020</relatedIdentifier>
</relatedIdentifiers>
<rightsList>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">&lt;p&gt;We provide a large-scale lecture video dataset consisting of artificially-generated lectures, and the corresponding ground-truth fragmentation, for the purpose of evaluating lecture video fragmentation techniques.&lt;/p&gt;

&lt;p&gt;To create this dataset, we used 1498 speech transcript files (generated automatically by ASR software) from VideoLectures.NET, the world&amp;#39;s biggest academic online video repository. These transcripts correspond to lectures from various fields of science, such as computer science, mathematics, medicine and politics. To create the synthetic lectures, all transcripts were randomly split into fragments with durations between 4 and 8 minutes. Each synthetic lecture was then assembled by combining (stitching) exactly 20 randomly selected fragments. The released dataset includes 300 such artificially-generated lectures. Each lecture has a mean duration of about 120 minutes, so the dataset contains about 600 hours of artificially-generated lectures in total. Every pair of consecutive fragments in these lectures originally comes from different videos; consequently, the point in time where two such fragments are joined is a known ground-truth fragment boundary. All these boundaries form the dataset&amp;#39;s ground truth. We stress that we do not generate the corresponding video files for the artificially-generated lectures (only the transcripts), and one should not try to reverse-engineer the dataset creation process in order to use the visual modality for detecting the fragments in this dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File format&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After you download the provided .zip and unpack it, the extracted folder will contain two sub-folders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1. ALV_srt
2. ALV_srt_GT
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each of them contains 300 files.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;ALV_srt&lt;/strong&gt; folder contains the transcripts of every artificially-generated lecture, in the standard SRT format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;1. A numeric counter identifying each sequential subtitle
2. The time that the subtitle should appear on the screen, followed by --&amp;gt; and the time it should disappear
3. The subtitle text itself, on one or more lines
4. A blank line containing no text
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;strong&gt;ALV_srt_GT&lt;/strong&gt; folder contains the ground truth (GT) fragments corresponding to the lectures (transcripts) of the &lt;strong&gt;ALV_srt&lt;/strong&gt; folder. Each GT file consists of 3 tab-separated columns and 20 rows, in the following format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&amp;lt;Fragment_ID_1&amp;gt;	&amp;lt;StartTime_1&amp;gt;	&amp;lt;EndTime_1&amp;gt;
&amp;lt;Fragment_ID_2&amp;gt;	&amp;lt;StartTime_2&amp;gt;	&amp;lt;EndTime_2&amp;gt;
&amp;lt;Fragment_ID_3&amp;gt;	&amp;lt;StartTime_3&amp;gt;	&amp;lt;EndTime_3&amp;gt;
.
.
.
&amp;lt;Fragment_ID_20&amp;gt;	&amp;lt;StartTime_20&amp;gt;	&amp;lt;EndTime_20&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each row describes one fragment. The first column gives the fragment ID, and the second and third columns give the start and end time of the fragment, respectively.&lt;/p&gt;

&lt;p&gt;This dataset is provided for academic, non-commercial use only. If you find this dataset useful in your work, please cite the following publication where the dataset is introduced:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;D. Galanopoulos, V. Mezaris, &amp;ldquo;Temporal Lecture Video Fragmentation using Word Embeddings&amp;rdquo;, Proc. 25th Int. Conf. on Multimedia Modeling (MMM2019), Thessaloniki, Greece, Jan. 2019.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Acknowledgements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This work was supported by the EU&amp;rsquo;s Horizon 2020 research and innovation programme under grant agreement No 693092 MOVING. We are grateful to JSI/VideoLectures.NET for providing the lectures&amp;rsquo; transcripts.&lt;/p&gt;</description>
</descriptions>
<fundingReferences>
<fundingReference>
<funderName>European Commission</funderName>
<funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/501100000780</funderIdentifier>
<awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/693092/">693092</awardNumber>
<awardTitle>Training towards a society of data-savvy information professionals to enable open leadership innovation</awardTitle>
</fundingReference>
</fundingReferences>
</resource>
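
The file layout described in the abstract above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset release: the function names (`parse_srt`, `parse_gt`, `internal_boundaries`) are hypothetical, and because the description does not state how the start and end times in the GT files are encoded, the sketch keeps them as unparsed strings. It assumes only the standard SRT timing line (`HH:MM:SS,mmm --> HH:MM:SS,mmm`) and the three tab-separated columns of the GT files.

```python
import re
from typing import List, NamedTuple, Tuple

# Standard SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


class Cue(NamedTuple):
    index: int    # numeric counter of the subtitle
    start: float  # appearance time, in seconds
    end: float    # disappearance time, in seconds
    text: str     # subtitle text (may span several lines)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(content: str) -> List[Cue]:
    """Parse the text of an SRT transcript into a list of cues."""
    cues = []
    normalized = content.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2:
            continue  # not a complete cue
        match = _TIMING.search(lines[1])
        if match is None:
            continue  # skip blocks without a valid timing line
        g = match.groups()
        cues.append(
            Cue(int(lines[0].strip()), _seconds(*g[:4]), _seconds(*g[4:]),
                "\n".join(lines[2:]))
        )
    return cues


def parse_gt(content: str) -> List[Tuple[str, str, str]]:
    """Parse a GT file into (fragment_id, start, end) rows.

    Times are kept as strings because their encoding is an unknown here.
    """
    rows = []
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got: {line!r}")
        rows.append((fields[0], fields[1], fields[2]))
    return rows


def internal_boundaries(rows: List[Tuple[str, str, str]]) -> List[str]:
    """Return the join points between consecutive fragments.

    These are the end times of every fragment except the last one.
    """
    return [end for _, _, end in rows[:-1]]
```

For a GT file with 20 rows, `internal_boundaries` returns 19 values, one for each point where two fragments were stitched together. To use the functions on the unpacked dataset, read a file from `ALV_srt` or `ALV_srt_GT` into a string and pass it in.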
