Dataset Open Access

Enron Email Time-Series Network

Volodymyr Miz; Benjamin Ricaud; Pierre Vandergheynst

We use the Enron email dataset to build a network of email addresses. The dataset contains 614 586 emails sent between 6 January 1998 and 4 February 2004. During pre-processing, we remove the periods of low activity and keep the emails from 1 January 1999 until 31 July 2002, which amounts to 1448 days of email records. We also remove email addresses that sent fewer than three emails over that period. In total, the Enron email network contains 6 600 nodes and 50 897 edges.
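The pre-processing described above can be sketched as follows. This is only an illustration, not the authors' pipeline: the `(sender, recipient, sent_on)` record layout and the function name `preprocess` are hypothetical, and the sketch assumes that a removed address is dropped entirely, so emails to or from it are discarded as well.

```python
from collections import Counter
from datetime import date

# Window and threshold taken from the description; everything else is assumed.
START, END = date(1999, 1, 1), date(2002, 7, 31)
MIN_SENT = 3

def preprocess(emails):
    """Filter (sender, recipient, sent_on) records, with sent_on a date."""
    # Keep only emails sent inside the retained period.
    in_window = [e for e in emails if START <= e[2] <= END]
    # Count how many emails each address sent within that period.
    sent = Counter(sender for sender, _, _ in in_window)
    keep = {addr for addr, n in sent.items() if n >= MIN_SENT}
    # Assumption: an email survives only if both endpoints are kept.
    return [e for e in in_window if e[0] in keep and e[1] in keep]

toy = [
    ("a", "b", date(1999, 3, 1)),
    ("a", "b", date(1999, 3, 2)),
    ("a", "b", date(1999, 3, 3)),
    ("b", "a", date(2000, 5, 1)),
    ("b", "a", date(2000, 5, 2)),
    ("b", "a", date(2000, 5, 3)),
    ("c", "a", date(2000, 1, 1)),  # "c" sent only one email
    ("a", "b", date(1998, 6, 1)),  # outside the retained period
]
filtered = preprocess(toy)
```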

To build a graph G = (V, E), we use email addresses as the nodes V. Every node v_i has an attribute: a time-varying signal giving the number of emails sent from that address on each day. We draw an edge e_ij between two nodes i and j if there is at least one email exchange between the corresponding addresses.
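A minimal sketch of this construction on toy data is given below. The record layout `(sender, recipient, day_index)`, the 0-based day index, the function name `build_graph`, and the choice to ignore self-addressed emails when drawing edges are all assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(emails, num_days):
    """Return (signals, edges) from (sender, recipient, day_index) records."""
    # One signal per node: emails sent per day, initialised to zero.
    signals = defaultdict(lambda: [0] * num_days)
    edges = set()
    for sender, recipient, day in emails:
        signals[sender][day] += 1
        signals[recipient]  # accessing the key registers the recipient as a node
        if sender != recipient:  # assumption: no self-loops
            # Undirected edge: one exchange in either direction is enough.
            edges.add(frozenset((sender, recipient)))
    return dict(signals), edges

toy = [("a", "b", 0), ("a", "b", 0), ("b", "a", 2), ("a", "c", 4)]
signals, edges = build_graph(toy, num_days=5)
```

Here `signals["a"]` is the daily sent-count vector of node "a", and `edges` holds unordered pairs, so "a" to "b" and "b" to "a" collapse into a single edge.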

The 'Count' column in 'edges.csv' is the number of 'From'->'To' email exchanges between the two addresses. This column can be used as an edge weight.
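As a rough illustration of using 'Count' as a weight, the sketch below reads directed weights and optionally sums both directions into an undirected weight. It assumes the file has a header row with columns named exactly 'From', 'To' and 'Count'; this has not been checked against the actual file, and the sample rows and function names are made up.

```python
import csv
import io
from collections import defaultdict

def load_directed_weights(fileobj):
    """Map (from_id, to_id) -> Count, assuming a From,To,Count header."""
    weights = {}
    for row in csv.DictReader(fileobj):
        weights[(row["From"], row["To"])] = int(row["Count"])
    return weights

def symmetrize(weights):
    """Sum the two directions of each pair into one undirected weight."""
    undirected = defaultdict(int)
    for (src, dst), count in weights.items():
        undirected[frozenset((src, dst))] += count
    return dict(undirected)

# Made-up rows standing in for edges.csv; use open("edges.csv", newline="")
# for the real file if its header matches the assumption above.
sample = io.StringIO("From,To,Count\n0,1,3\n1,0,2\n0,2,1\n")
directed = load_directed_weights(sample)
undirected = symmetrize(directed)
```

Summing the two directions is only one possible choice; keeping the directed counts may be preferable when the direction of communication matters.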

The file 'nodes.csv' stores the time series as a dictionary, a compressed representation that maps each day to the number of emails sent by the address during that day (Day -> Number of Emails Sent). The total number of days is 1448.
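The compressed dictionary can be expanded into a dense signal of length 1448, as sketched below. The sketch assumes that the dictionary is serialised as a Python-style literal, that days are 0-based indices, and that missing days mean zero emails; none of this is confirmed by the description, and `expand_series` is a hypothetical helper.

```python
import ast

NUM_DAYS = 1448  # total number of days stated in the description

def expand_series(sparse, num_days=NUM_DAYS):
    """Turn a {day: count} dictionary into a dense list of daily counts."""
    dense = [0] * num_days  # assumption: days absent from the dict are zero
    for day, count in sparse.items():
        dense[int(day)] = int(count)  # assumption: 0-based day index
    return dense

# Made-up cell value standing in for one entry of nodes.csv.
sparse = ast.literal_eval("{0: 3, 17: 1, 1447: 2}")
dense = expand_series(sparse)
```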

'id-email.csv' is a file containing the actual email addresses.

Files (1.6 MB)

edges.csv (608.1 kB), md5:71faba3a944150a06390378963ea16e4
id-email.csv (190.6 kB), md5:515fded80547ff97b73b6b228f64d636
nodes.csv (753.5 kB), md5:b033ffa2510d8bd5dc8b03fc904c0c8c