Dataset Restricted Access

Understanding Brand Consistency from Web Content

Soumyadeep Roy; Niloy Ganguly; Shamik Sural; Niyati Chhaya; Anandhavelu Natarajan


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="URL">https://zenodo.org/record/3565079</identifier>
  <creators>
    <creator>
      <creatorName>Soumyadeep Roy</creatorName>
      <affiliation>Indian Institute of Technology Kharagpur, India</affiliation>
    </creator>
    <creator>
      <creatorName>Niloy Ganguly</creatorName>
      <affiliation>Indian Institute of Technology Kharagpur, India</affiliation>
    </creator>
    <creator>
      <creatorName>Shamik Sural</creatorName>
      <affiliation>Indian Institute of Technology Kharagpur, India</affiliation>
    </creator>
    <creator>
      <creatorName>Niyati Chhaya</creatorName>
      <affiliation>Adobe Research, Bangalore, India</affiliation>
    </creator>
    <creator>
      <creatorName>Anandhavelu Natarajan</creatorName>
      <affiliation>Adobe Research, Bangalore, India</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Understanding Brand Consistency from Web Content</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2019</publicationYear>
  <subjects>
    <subject>brand personality</subject>
    <subject>reputation management</subject>
    <subject>affective computing</subject>
    <subject>text classification</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2019-06-26</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/3565079</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsIdenticalTo">10.1145/3292522.3326048</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/restrictedAccess">Restricted Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;To request this dataset, please fill in the &amp;quot;Request access&amp;quot; form near the bottom of this page and also email soumyadeep.roy9@gmail.com.&lt;/p&gt;

&lt;p&gt;Please cite the paper:&amp;nbsp;&lt;a href="https://dl.acm.org/citation.cfm?id=3326048"&gt;https://dl.acm.org/citation.cfm?id=3326048&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTeX:&amp;nbsp;&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;@inproceedings{Roy:2019:UBC:3292522.3326048,
 author = {Roy, Soumyadeep and Ganguly, Niloy and Sural, Shamik and Chhaya, Niyati and Natarajan, Anandhavelu},
 title = {Understanding Brand Consistency from Web Content},
 booktitle = {Proceedings of the 10th ACM Conference on Web Science},
 series = {WebSci &amp;#39;19},
 year = {2019},
 isbn = {978-1-4503-6202-3},
 location = {Boston, Massachusetts, USA},
 pages = {245--253},
 numpages = {9},
 url = {http://doi.acm.org/10.1145/3292522.3326048},
 doi = {10.1145/3292522.3326048},
 acmid = {3326048},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {affective computing, brand personality, reputation management, text classification},
} &lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Abstract:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Brands produce content to engage with the audience continually and tend to maintain a set of human characteristics in their marketing campaigns. In this era of digital marketing, they need to create a lot of content to keep up the engagement with their audiences. However, such kind of content authoring at scale introduces challenges in maintaining consistency in a brand&amp;#39;s messaging tone, which is very important from a brand&amp;#39;s perspective to ensure a persistent impression for its customers and audiences. In this work, we quantify brand personality and formulate its linguistic features. We score text articles extracted from brand communications on five personality dimensions: sincerity, excitement, competence, ruggedness and sophistication, and show that a linear SVM model achieves a decent F1 score of $0.822$. The linear SVM allows us to annotate a large set of data points free of any annotation error. We utilize this huge annotated dataset to characterize the notion of brand consistency, which is maintaining a company&amp;#39;s targeted brand personality across time and over different content categories; we make certain interesting observations. As per our knowledge, this is the first study which investigates brand personality from the company&amp;#39;s official websites, and that formulates and analyzes the notion of brand consistency on such a large scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset description:&lt;/strong&gt;&lt;br&gt;
Each file contains the scraped textual content from the official webpages of Fortune 1000 companies. We use the 2017 Fortune 1000 list ranks. Please read the paper for details about data collection and cleaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Directory structure:&lt;/strong&gt; compressed size: &lt;strong&gt;3.7 GB&lt;/strong&gt;, uncompressed size:&amp;nbsp;28.9 GB&lt;/p&gt;

&lt;pre&gt;├── Cleaned MTlarge data
│&amp;nbsp;&amp;nbsp; ├── final_dynamic_data.csv (1.0 GB) : Dynamic pages per company
│&amp;nbsp;&amp;nbsp; └── final_static_data.csv (3.8 MB) : Static pages for each company
└── Raw Scrapped Data (27.8 GB)
    ├── first50fortune.csv : raw scraped files for Fortune 1000 companies between Rank 1 and 50
    ├── fortune150_300.csv : Between Rank 150 and 300
    ├── fortune300_500.csv : Between Rank 300 and 500
    ├── fortune500_550.csv : Between Rank 500 and 550
    ├── fortune50_150.csv : Between Rank 50 and 150
    ├── fortune550_800.csv : Between Rank 550 and 800
    └── fortune800_1000.csv : Between Rank 800 and 1000
&lt;/pre&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
  </descriptions>
</resource>
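
The export above can be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and a trimmed inline copy of the record (only two of the five creators are kept, to keep the sample short). The function name `summarize_record` and the keys of the returned dictionary are illustrative choices, not part of the DataCite schema or of Zenodo's tooling.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <resource> root of the export.
NS = {"dc": "http://datacite.org/schema/kernel-4"}

# Trimmed copy of the record so the sketch runs on its own.
# Only two creators are included here; the full export lists five.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="URL">https://zenodo.org/record/3565079</identifier>
  <creators>
    <creator>
      <creatorName>Soumyadeep Roy</creatorName>
    </creator>
    <creator>
      <creatorName>Niyati Chhaya</creatorName>
    </creator>
  </creators>
  <titles>
    <title>Understanding Brand Consistency from Web Content</title>
  </titles>
  <subjects>
    <subject>brand personality</subject>
    <subject>text classification</subject>
  </subjects>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/restrictedAccess">Restricted Access</rights>
  </rightsList>
</resource>"""


def summarize_record(xml_text):
    """Return a few fields from a DataCite kernel-4 record (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rights = root.find("dc:rightsList/dc:rights", NS)
    return {
        "title": root.findtext("dc:titles/dc:title", namespaces=NS),
        "creators": [
            creator.findtext("dc:creatorName", namespaces=NS)
            for creator in root.findall("dc:creators/dc:creator", NS)
        ],
        "subjects": [s.text for s in root.findall("dc:subjects/dc:subject", NS)],
        # The access level is carried in the rightsURI attribute.
        "rights_uri": rights.get("rightsURI") if rights is not None else None,
    }


if __name__ == "__main__":
    for key, value in summarize_record(SAMPLE).items():
        print(key, value)
```

Every element in the export lives in the kernel-4 namespace, so each path step needs the `dc:` prefix; a lookup without it returns nothing.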
Views 160
Downloads 16
Data volume 59.1 GB
Unique views 131
Unique downloads 12