Dataset Restricted Access

Understanding Brand Consistency from Web Content

Soumyadeep Roy; Niloy Ganguly; Shamik Sural; Niyati Chhaya; Anandhavelu Natarajan

Citation Style Language JSON Export

  "DOI": "10.1145/3292522.3326048", 
  "language": "eng", 
  "title": "Understanding Brand Consistency from Web Content", 
  "issued": {
    "date-parts": [
  "abstract": "<p>If you want this dataset, kindly fill the &quot;Request access&quot; form towards the bottom of this page and also mail at :</p>\n\n<p>Kindly cite the paper :&nbsp;<a href=\"\"></a></p>\n\n<p><strong>BibTex :&nbsp;</strong></p>\n\n<pre>@inproceedings{Roy:2019:UBC:3292522.3326048,\n author = {Roy, Soumyadeep and Ganguly, Niloy and Sural, Shamik and Chhaya, Niyati and Natarajan, Anandhavelu},\n title = {Understanding Brand Consistency from Web Content},\n booktitle = {Proceedings of the 10th ACM Conference on Web Science},\n series = {WebSci &#39;19},\n year = {2019},\n isbn = {978-1-4503-6202-3},\n location = {Boston, Massachusetts, USA},\n pages = {245--253},\n numpages = {9},\n url = {},\n doi = {10.1145/3292522.3326048},\n acmid = {3326048},\n publisher = {ACM},\n address = {New York, NY, USA},\n keywords = {affective computing, brand personality, reputation management, text classification},\n} </pre>\n\n<p><strong>Abstract :</strong></p>\n\n<p>Brands produce content to engage with the audience continually and tend to maintain a set of human characteristics in their marketing campaigns. In this era of digital marketing, they need to create a lot of content to keep up the engagement with their audiences. However, such kind of content authoring at scale introduces challenges in maintaining consistency in a brand&#39;s messaging tone, which is very important from a brand&#39;s perspective to ensure a persistent impression for its customers and audiences. In this work, we quantify brand personality and formulate its linguistic features. We score text articles extracted from brand communications on five personality dimensions: sincerity, excitement, competence, ruggedness and sophistication, and show that a linear SVM model achieves a decent F1 score of $0.822$. The linear SVM allows us to annotate a large set of data points free of any annotation error. 
We utilize this huge annotated dataset to characterize the notion of brand consistency, which is maintaining a company&#39;s targeted brand personality across time and over different content categories; we make certain interesting observations. As per our knowledge, this is the first study which investigates brand personality from the company&#39;s official websites, and that formulates and analyzes the notion of brand consistency on such a large scale.</p>\n\n<p><strong>Dataset description:</strong><br>\nEach file contains the scraped textual content from the official webpages of Fortune 1000 companies. We use the 2017 Fortune 1000 list ranks. Please read the paper for details about data collection and cleaning.</p>\n\n<p><strong>Directory structure : </strong>compressed size -<strong> 3.7 GB</strong>, uncompressed size -&nbsp;28.9 GB</p>\n\n<pre>\u251c\u2500\u2500 Cleaned MTlarge data\n\u2502&nbsp;&nbsp; \u251c\u2500\u2500 final_dynamic_data.csv (1.0 GB) : Dynamic pages per company\n\u2502&nbsp;&nbsp; \u2514\u2500\u2500 final_static_data.csv (3.8 MB) : Static pages for each company\n\u2514\u2500\u2500 Raw Scrapped Data (27.8 GB)\n    \u251c\u2500\u2500 first50fortune.csv : contains raw scraped files for Fortune 1000 companies between Rank 1 and 50\n    \u251c\u2500\u2500 fortune150_300.csv : Between Rank 150 and 300\n    \u251c\u2500\u2500 fortune300_500.csv : Between Rank 300 and 500\n    \u251c\u2500\u2500 fortune500_550.csv : Between Rank 500 and 550\n    \u251c\u2500\u2500 fortune50_150.csv : Between Rank 50 and 150\n    \u251c\u2500\u2500 fortune550_800.csv : Between Rank 550 and 800\n    \u2514\u2500\u2500 fortune800_1000.csv : Between Rank 800 and 1000\n</pre>\n\n<p>&nbsp;</p>", 
  "author": [
      "family": "Soumyadeep Roy"
      "family": "Niloy Ganguly"
      "family": "Shamik Sural"
      "family": "Niyati Chhaya"
      "family": "Anandhavelu Natarajan"
  "id": "3565079", 
  "event-place": "Boston, MA, USA", 
  "type": "dataset", 
  "event": "11th ACM Conference on Web Science (WebSci'19)"

