Dataset Restricted Access

Understanding Brand Consistency from Web Content

Soumyadeep Roy; Niloy Ganguly; Shamik Sural; Niyati Chhaya; Anandhavelu Natarajan

MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="">
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-06-26</subfield>
  <controlfield tag="005">20200124192444.0</controlfield>
  <controlfield tag="001">3565079</controlfield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire_data</subfield>
    <subfield code="o"></subfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">June 30 - July 3, 2019</subfield>
    <subfield code="g">WebSci'19</subfield>
    <subfield code="a">11th ACM Conference on Web Science</subfield>
    <subfield code="c">Boston, MA, USA</subfield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;If you want this dataset, kindly fill in the &amp;quot;Request access&amp;quot; form towards the bottom of this page and also mail at :&lt;/p&gt;

&lt;p&gt;Kindly cite the paper :&amp;nbsp;&lt;a href=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BibTex :&amp;nbsp;&lt;/strong&gt;&lt;/p&gt;

 author = {Roy, Soumyadeep and Ganguly, Niloy and Sural, Shamik and Chhaya, Niyati and Natarajan, Anandhavelu},
 title = {Understanding Brand Consistency from Web Content},
 booktitle = {Proceedings of the 10th ACM Conference on Web Science},
 series = {WebSci &amp;#39;19},
 year = {2019},
 isbn = {978-1-4503-6202-3},
 location = {Boston, Massachusetts, USA},
 pages = {245--253},
 numpages = {9},
 url = {},
 doi = {10.1145/3292522.3326048},
 acmid = {3326048},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {affective computing, brand personality, reputation management, text classification},
} &lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Abstract :&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Brands produce content to engage with the audience continually and tend to maintain a set of human characteristics in their marketing campaigns. In this era of digital marketing, they need to create a lot of content to keep up the engagement with their audiences. However, such kind of content authoring at scale introduces challenges in maintaining consistency in a brand&amp;#39;s messaging tone, which is very important from a brand&amp;#39;s perspective to ensure a persistent impression for its customers and audiences. In this work, we quantify brand personality and formulate its linguistic features. We score text articles extracted from brand communications on five personality dimensions: sincerity, excitement, competence, ruggedness and sophistication, and show that a linear SVM model achieves a decent F1 score of $0.822$. The linear SVM allows us to annotate a large set of data points free of any annotation error. We utilize this huge annotated dataset to characterize the notion of brand consistency, which is maintaining a company&amp;#39;s targeted brand personality across time and over different content categories; we make certain interesting observations. As per our knowledge, this is the first study which investigates brand personality from the company&amp;#39;s official websites, and that formulates and analyzes the notion of brand consistency on such a large scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dataset description:&lt;/strong&gt;&lt;br&gt;
Each file contains the scraped textual content from the official webpages of Fortune 1000 companies. We use the 2017 Fortune 1000 list ranks. Please read the paper for details about data collection and cleaning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Directory structure : &lt;/strong&gt;compressed size -&lt;strong&gt; 3.7 GB&lt;/strong&gt;, uncompressed size -&amp;nbsp;28.9 GB&lt;/p&gt;

&lt;pre&gt;├── Cleaned MTlarge data
│&amp;nbsp;&amp;nbsp; ├── final_dynamic_data.csv (1.0 GB) : Dynamic pages per company
│&amp;nbsp;&amp;nbsp; └── final_static_data.csv (3.8 MB) : Static pages for each company
└── Raw Scrapped Data (27.8 GB)
    ├── first50fortune.csv : raw scraped files for Fortune 1000 companies between Rank 1 and 50
    ├── fortune150_300.csv : Between Rank 150 and 300
    ├── fortune300_500.csv : Between Rank 300 and 500
    ├── fortune500_550.csv : Between Rank 500 and 550
    ├── fortune50_150.csv : Between Rank 50 and 150
    ├── fortune550_800.csv : Between Rank 550 and 800
    └── fortune800_1000.csv : Between Rank 800 and 1000

  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Indian Institute of Technology Kharagpur, India</subfield>
    <subfield code="a">Niloy Ganguly</subfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Indian Institute of Technology Kharagpur, India</subfield>
    <subfield code="a">Shamik Sural</subfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Adobe Research, Bangalore, India</subfield>
    <subfield code="a">Niyati Chhaya</subfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Adobe Research, Bangalore, India</subfield>
    <subfield code="a">Anandhavelu Natarajan</subfield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">restricted</subfield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u"></subfield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Indian Institute of Technology Kharagpur, India</subfield>
    <subfield code="a">Soumyadeep Roy</subfield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">brand personality</subfield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">reputation management</subfield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">affective computing</subfield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">text classification</subfield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.1145/3292522.3326048</subfield>
    <subfield code="2">doi</subfield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Understanding Brand Consistency from Web Content</subfield>