Thank you. It's a big honor to be here and very intimidating to be surrounded by people
we respect. So, many visualizations deal with numbers. Usually we're visualizing tables
of numbers or entire data sets. And there's this thing about trying to get at some quantitative
truth, some notion of objectivity. But what about subjective truths? What about, you know,
if instead of looking at bigger numbers or smaller numbers, we were interested in looking
at what is more important or less important, more controversial or less controversial.
How do you get at that? How do you get at what people are thinking about? How do you
get at what people are talking about? Or fighting about? Or even obsessing about? And this is
a lot of what our talk is going to be about today. It's actually going to talk about our
obsession, an obsession that has lasted a decade, which is visualizing text. And we're
very happy to share this obsession with Ben Rubin. So, we feel like we're in very good
company here.
So we want to start by setting the stage with the first project that we worked on together.
This has turned out to be a visualization of talking and fighting, but we set out simply
to visualize Wikipedia. Back then, what we like to do is start with a mystery or a puzzle.
And we were puzzled by Wikipedia. In 2003, it was a bit of a mysterious object. There
were actually very good articles appearing, but back then, the whole idea of crowdsourcing
or letting the gates open to uncredentialed people was strange in the encyclopedia world.
We wanted to study why it worked. And so, luckily, being people who like data, we discovered
there's a lot of data available. For any article on Wikipedia, you have a full edit history
available publicly. You have the names of people who touched each version. You have
the actual text of the version itself. You have time stamps of the version. And we decided,
okay, this is enough that we can plunge right in. We quickly discovered it's sort of a tall
order to read 100 versions of the same article. So, okay, visualization to the rescue. We
created a method called history flow, and I'll quickly recap it, and then Fernanda will
give you a demo.
The idea behind history flow is if you have several versions of a document, you represent
each version by a vertical line. The length of a line tells you the length of the document
at each version. And you color sections of the document by who wrote each section. You
just assign authors colors. So here, for example, Mary wrote all of version one because it's
orange, and version two is longer. Now, it's a little hard to follow this, so we can play
some games. We can connect identical passages from version to version. And when we do this,
it's a little bit like taking a normal programming diff view and just extending it to infinity.
We see it's much easier to see, okay, so Suzanne added some text at the end of version two,
and she inserted text going from version three to version four. We can finally play sort
of all the normal interaction games that one likes to do with InfoViz. We can space these
things according to the time between versions rather than giving them equal space. And we
can sort of focus in on any one version and get details about what the text actually looked
like.
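As a rough illustration of the bookkeeping behind a view like this, here is a minimal sketch. It assumes word-level tokens and uses Python's difflib to find identical passages between consecutive versions; the function names and the sample versions are hypothetical and this is not the original history flow implementation.

```python
from difflib import SequenceMatcher


def attribute_versions(versions):
    """Return, for each version, a list of (word, author) pairs.

    versions: list of (author, text) tuples in chronological order.
    Words carried over unchanged from the previous version keep their
    original author; new or replaced words are credited to the editor.
    """
    history = []
    prev_words, prev_authors = [], []
    for author, text in versions:
        words = text.split()
        authors = [author] * len(words)  # default: the editor wrote it
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                authors[block.b + k] = prev_authors[block.a + k]
        history.append(list(zip(words, authors)))
        prev_words, prev_authors = words, authors
    return history


def author_bands(attributed):
    """Collapse (word, author) pairs into contiguous (author, length) bands."""
    bands = []
    for _, author in attributed:
        if bands and bands[-1][0] == author:
            bands[-1] = (author, bands[-1][1] + 1)
        else:
            bands.append((author, 1))
    return bands


# Made-up sample history, for illustration only.
versions = [
    ("Mary", "cats are small furry mammals"),
    ("Suzanne", "cats are small furry mammals that purr"),
    ("Martin", "domestic cats are small furry mammals that purr"),
]
for attributed in attribute_versions(versions):
    print(author_bands(attributed))
```

Each printed list is one vertical line of the diagram: the bands are the colored sections, and their lengths add up to the length of that version. Connecting bands that came from the same matching block gives the flowing shapes between versions.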
So I'm actually going to ask that we dim the lights a little bit here in the stage. Let's
look at history flow in action. So even though it's a little dark there, this is the diagram
of the evolution of the article on design. And you can see that the text of the article
is here on the right. Here are the number of people, the different people who have touched
this article. I have also this wand that I can bring back and I can travel through each
one of these versions and see how the text changed over time. So that's fine. Design
is okay. But the real thing is cats. So let's look at cats. A lot more people interested
in cats than design. And it's a much longer article too. I can scroll down here forever.
But then interesting things start to happen. I can see like this stripey pattern here.
And what happened there is that someone added a table. If you look at where my cursor is
pointing here, someone added a table that talks about the kingdom, the class, the order
of the cat's species. That's all well and good. But then there is something weird going
on here at the bottom. It looks like this antenna. And it's something that is sticking
out and is not going anywhere. So let me bring my wand here. And I can see that someone added
a bunch of paragraphs in white talking about the Unix command cat. I knew this would resonate
with you. So what happened? Did someone just delete the whole thing afterwards? If I go
to the next version, I can see that no, instead of just deleting this content, someone redirected
the content to a new page called cat Unix. And this is exactly the kind of collaboration
and sort of negotiation that we were interested in looking into when we created this visualization.
So now let's look at something a little bit more controversial. So this is the article
on abortion. It's huge. It's very long. Does anyone see anything that jumps out as different
or weird? The gashes, right? Yeah, right. These gashes, these black gashes. So what this is
is we had a beautiful long article and then boom, someone deleted the entire thing. And
then it gets fixed here, gets reverted. Or over here, someone deleted the entire article
and said, abortion is great. And then added abortion is good. And then it got reverted.
We all knew that vandalism happens on Wikipedia; this is called
a mass deletion. But one of the things we did not know until we started visualizing
these articles was how quickly it gets fixed. For this mass deletion, here at the
bottom, I have a timestamp. You can't see, but I will tell you. It happened on the 17th
of December at 4:06. This got fixed on the same day at 4:07. So it took a minute for
them to fix it. And we were very puzzled because this is something we kept seeing. We're like,
things would get fixed in minutes. And we actually talked to Wikipedians and we asked
them, how are you doing this? And they explained to us that there's something called a watch
list, which means that whenever you edit an article, you may want to add that article
to your watch list so that you get notifications whenever someone touches that article. And
this is how they sort of do community surveillance. And also, you know, if you get a notification
that someone you trust edited the article, you may not bother to go see. But if it's
a new person you have never seen before or an anonymous IP address, you might want to
check and make sure that it's not a vandal. In fact, if I visualize the same data, but
over real time, these acts of vandalism happened so fast and got fixed so fast that we don't
even see the gashes anymore. So now let me show you one of our favorite pages, which
is chocolate. It's very pink, but there isn't so far anything jumping out, except that I'm
visualizing this over real time. So now let me do it by versions. I get this interesting
zigzag. When I got this, I told Martin I wanted a scarf that looked like this. But does anyone
want to guess what's happening here? An edit war. Exactly. Right? There's an argument.
Someone is putting a piece of text. Someone else is taking it out. They're putting and
taking it out. And I can show you what this is. So back here, someone added a piece of
white text, which is this short paragraph. I'll read it for you. It says, extremely rarely,
melted chocolate has been used to make a kind of surrealist sculpture called coulage. Okay.
So that survives for a long time. Daniel C. Boyer is the person here at the top who inserted
that paragraph. I come over here and someone says, removing Boyer invention. Well, Daniel
comes back and says, coulage is not a Boyer invention. Well, Google search for chocolate
coulage finds only Boyer. Removing, reverting. Leave your humbug out, reverting, and so on
and so forth. Until Daniel C. Boyer sort of gives up and says, whatever, I don't care.
Which is sad because Martin and I did a search for chocolate coulage and it seems to exist
after all. But voila, that's life on Wikipedia. But this is, again, one of the interesting
patterns that we uncovered by thinking about visualizing this sort of log, these really
long files. Another thing we did with the same visualization is to sort of disregard
authorship, who was writing what, and just look at the age of text. So the darker a piece
of text is here, the older it is. So if I scroll down, you can see that there's quite
a lot of darker, well, this is old text here, if you could see. But there is old text. And
why do we care about this? Well, we care because old text could be used as a proxy for high
quality text when you're talking about Wikipedia because it means that it's something that
went through a bunch of revisions and nobody touched it. So it probably means it is good
stuff. And this is one of the things that we got from visualizing a bunch of articles
on Wikipedia is that we started to get interested in how representative some of the stuff that
we were seeing here was. So we ended up downloading all of Wikipedia and running a statistical
analysis to understand how quickly things like mass deletions were being fixed. And it turns
out that the median time was about three minutes for fixing mass deletions, which told us that
what we were seeing was representative and not just sort of one-offs.
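To make the kind of measurement described here concrete, the following is a minimal sketch and not the analysis that was actually run. It assumes each article's history is a list of (timestamp, length) pairs, and it uses an assumed heuristic: a revision counts as a mass deletion when it shrinks the article below 10% of its previous length, and as repaired once the length is back to at least 90%. The sample timestamps and lengths are invented.

```python
from datetime import datetime
from statistics import median


def repair_times(revisions, shrink_ratio=0.1):
    """Minutes between each mass deletion and the revision that restores it.

    revisions: chronological list of (timestamp, length_in_chars).
    shrink_ratio is an assumed threshold, not a value from the talk.
    """
    times = []
    deleted_at = None    # timestamp of an unrepaired deletion, if any
    healthy_length = 0   # article length just before that deletion
    for i in range(1, len(revisions)):
        stamp, length = revisions[i]
        prev_length = revisions[i - 1][1]
        if deleted_at is None:
            if prev_length > 0 and length < shrink_ratio * prev_length:
                deleted_at, healthy_length = stamp, prev_length
        elif length >= 0.9 * healthy_length:
            times.append((stamp - deleted_at).total_seconds() / 60)
            deleted_at = None
    return times


# Invented sample history, for illustration only.
revisions = [
    (datetime(2003, 12, 17, 3, 50), 20000),
    (datetime(2003, 12, 17, 4, 6), 17),      # page blanked
    (datetime(2003, 12, 17, 4, 7), 20000),   # restored
    (datetime(2003, 12, 18, 9, 0), 20100),
    (datetime(2003, 12, 18, 9, 30), 0),      # blanked again
    (datetime(2003, 12, 18, 9, 32), 20100),  # restored
]
print(median(repair_times(revisions)))
```

Running the same function over every article and taking the median of all the repair times gives the kind of summary number quoted above.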
So we remained fascinated by Wikipedia because you realize that in looking at these things we're just
scratching the surface. That whole thing about chocolate coulage: it turned out, talking more
with people, that that guy was, in fact, a surrealist artist and it wouldn't have been
out of character necessarily for him to insert something in there that was surprising. So
we got interested in Wikipedians and we decided to start visualizing individuals' edit
histories. And that turned out to be ultimately a portrait of obsession. So this is what we
started with. Again, we started with data and we said, okay, we've got for each person
on Wikipedia a history of what they've contributed, all the pages they've contributed to when
they did it and so forth. We have a comment, the title of the page, everything we need.
I should add this is work with Kate, who's here at Eyeo. And it was interesting looking
at this because once again, it's a little too much data and we tried a bunch of stuff
and eventually here's what we came up with. We said, all right, here's the data set we
want to narrow down to. We can't do it for everyone. So we're going to look at specifically
admins, people who have special privileges and are especially active. So that was one
level of narrowing. It turned out this actually already focused it to sort of an unusual group
of people, people who made a median of 12,000 edits total. And in fact, the largest human
that we found on the list or the most active was making an edit on average every 10 minutes
over the course of two years. Think about that. So how do you visualize all of that?
It's really hard because our history is largely text: it's titles, it's comments. How can you
see patterns? Well, here's what we did and it's going to sound absolutely preposterous
and a little weird but bear with me. So we said we're going to convert the text to colors.
Here's our mapping. We're going to say the first letter is going to map to the hue of
a color, the second to brightness, the third to saturation. For numbers, we're going to
go a little bit crazy and not do a color. We'll just do gray scale where the darkness
tells us the first digit. That's it. So we're throwing away a vast amount of information
and this is actually what you end up with. You see, okay, you know, numbers one
to eight are gray. You can see "articles" here maps to red, as does anything starting with A. By the
bottom you see, you know, going down the alphabet and through the rainbow, "WikiProject" maps
to purple. So the question is, what can you learn by throwing away so much information?
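For reference, here is a minimal sketch of the mapping just described: first letter to hue, second to brightness, third to saturation, and gray scale for titles that start with a digit. The exact ranges (hue running from red to purple across the alphabet, brightness and saturation kept above 0.4, larger digits darker) and the function names are assumptions made for illustration, not the values used in the project.

```python
import colorsys
import string


def letter_fraction(ch):
    """Position of a letter in the alphabet as a fraction in [0, 1]; 0.5 otherwise."""
    ch = ch.lower()
    if ch in string.ascii_lowercase:
        return string.ascii_lowercase.index(ch) / 25
    return 0.5


def title_to_rgb(title):
    """Map a title to an (r, g, b) tuple of floats in [0, 1]."""
    text = title.strip()
    if not text:
        return (1.0, 1.0, 1.0)
    if text[0] in string.digits:
        # Numbers: gray scale only, darker for larger first digits (assumed direction).
        level = 1.0 - int(text[0]) / 10
        return (level, level, level)
    padded = (text + "mm")[:3]  # pad very short titles with a mid-alphabet letter
    hue = letter_fraction(padded[0]) * 0.8  # a is red, z is purple
    brightness = 0.4 + 0.6 * letter_fraction(padded[1])
    saturation = 0.4 + 0.6 * letter_fraction(padded[2])
    return colorsys.hsv_to_rgb(hue, saturation, brightness)


for title in ["Articles for deletion", "October", "1945", "WikiProject"]:
    print(title, title_to_rgb(title))
```

Everything after the third character is ignored, which is exactly the information being thrown away: titles that share a first letter land on the same hue, so runs of related or alphabetized edits show up as bands of color.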
Is there any signal left? So imagine that we are using this mapping
to understand what articles people are touching. The title of the article is what we're going
to use to color things. So let's look at what this looks like. So this is a single person's
activity on Wikipedia over time. I'm scrolling down. You can see it's very bursty. So time
goes down. And each one of these colored tiles is one edit that this person made. This happens
to be back in October 2004 for this person. And you can see that, for instance, you know,
here he is adding, I don't know if you can read it, but at the top basically this person
is editing the article on bourgeois. So the kind of wine. And let's, again, very bursty,
not active all the time. But if I start to mouse over some of the stuff here, he talks
about fermentation, palmist wine making, wine, another kind, maceration, carbonic maceration,
wine again, he's very much into wines. And then different kinds of wines to this person.
Big white house, doing all sorts of new world wine. So it's very cohesive kind of editing.
Then takes a couple of weeks off and comes back. And if I look at what he's taking, looking
at exactly, it's like breweries. So he's done with wine and he's looking into beer, brewers
and brewers, all different kinds of beers that we had never heard of. But as a Wikipedian,
you have to be very diverse. And so you get a sense of this person is giving themselves
a project, right? I can also visualize this as a wall instead of just being very respectful
of time. Now it's just a sequence of edits. So basically I'm wrapping everything together
into a big wall. And the oldest edit is at the top here at the top left. And the most
recent edit is here at the bottom right. And we're fitting more than 5000 items into the
screen and individual rectangles are still visible. So now let's look at another person.
Let's try to understand how useful this is. So I'm going to bring a different
admin, someone who actually has an IP address. And their edit history is very, very different.
So at the top here, this sort of bluish tone is all about the article called October. The
one here that's sort of turquoise is all November. It's all November. Different days, November
25th, 26th, 28th, and so forth. And then we're going to December. Now by now you know what
the algorithm is. And then the gray here is all the years. So this is 1945, 1946, 1947,
and so forth. So what could this person be doing? Paying attention only to these kinds
of articles. If I switch the mapping from the title of the article to the comments that
this person is leaving as they are editing, I see an interesting pattern. Everything that
is orange is births. And everything that is sort of this dark green is deaths. So this
person is going day by day, month by month, year by year, adding births and deaths to
Wikipedia. So they're keeping up this sort of count of what notable people have died
or been born on different dates. Let's look at something very different. I have a different
person here. This is someone who has over 14,000 edits. And one of the things I want
to call attention to is if you start looking around this area here, you start to sort of
see a rainbow. You get things over here that are sort of orangey. And then you get things
here that are sort of yellowish going into green. And you get your greens and blues
and finally purples. So this person, and if I again switch to the comment, all the blue
there is what's called stub sorting. So this is an admin kind of task that Wikipedians
do to understand what articles should be up for deletion, what articles aren't good, what
stub articles, which are very, very short articles, what you should be doing with them.
But what this person is doing on top of doing this is that they are using an alphabetized
list of articles to go over. So little did we know that these things even existed on
Wikipedia. When we created our, ooh, let's map text to colors, we had no idea that we
would find rainbows on Wikipedia. Let me show you a different person.
Yeah, one interesting thing is that we can then statistically see that articles starting
with A get more edits than articles starting with Z.
Because obviously you start your alphabetized list, but by the time you get to Z, you're
bored.
So if you edit, edit in reverse alphabetical order. Do the end of the alphabet a favor.
You're going to be doing Wikipedia a favor. Here's someone who's highly, highly methodical.
They are going over articles for deletion and then different cities.
And finally, and you can sort of see a rainbow pattern that is very sort of staccato. And
then finally, I want to end with this one user here who has tons of edits. So it's going
to take a little bit to open up.
Right. So this is, we were very impressed when we saw this. Martin was reminded of his
Berkeley days.
Yeah, man.
And you can just see the patterns just going crazy. So what's going on here? It turns out
that this is not a human. This is a bot. This is Pearl the bot. And obviously Pearl is going
through things in alphabetical order and actually finishing the list, which is good for a change.
But again, one interesting way of trying to find machines versus humans and how even humans
are using the same methodology that machines are on Wikipedia for dealing with edits, with
things that need edits.
Just to tie it back a little bit, one of the things that this project made very clear to
us is that certain people are very, very obsessed with certain kinds of edits or they give themselves
little projects that they do over a long period of time. So for instance, one person, all
they do is to go on Wikipedia and find its versus it's, with an apostrophe or not.
It's important. Come on.
That's all they do. They're copy editing and it's very important.
So you've seen that actually mapping words to color in this crazy way gives you some
information. We want to talk about another project, mapping words to color. And this
stemmed from this sort of crazy dream. I was glad to see Ben Rubin's talk because it made
me feel like there were kindred spirits out there. This dream of just seeing all of language
at once. How could you do that? Let me show you one way.
What you are looking at here in this sort of undifferentiated grid is an effectively full
collection of all the nouns in the English language. This comes from WordNet. It's very
complete. Right now I'm just showing them in alphabetical order. And you can see there's
just some of the words I know, some I don't. But some are pretty obscure. I once read that
the average American vocabulary is 20,000 words. So by seeing 30,000 nouns, we're getting
a reasonable picture of what is in your head, of the things you can talk about
in a single word. Now, the colors for these words have been created in an interesting
way. We did a web search with the help of Jonathan Feinberg, looked at hundreds of images
for each word and then averaged the colors. And that actually gives you a little bit of
interesting signal on the words. For example, I can arrange all of these words not in alphabetical
order but by color. And I can see that many of these green things are chemicals
or plants. Often science seems to lend itself to green. I don't totally understand. Goldenseal,
obscure flowers and so forth. And this is fun. You can sort of confirm for yourself
looking at that, yes, most of these are kind of flesh colored because there are a lot of
people on the web, especially if you don't have safe search turned on. So a more meaningful
thing is to combine meaning and color. One of the excellent things about the WordNet
database is that it divides nouns up into groups and subgroups. So there's a whole hierarchy
that you can get out of it. And when you do that, patterns start to snap into focus. So
for example, you can see this whole section on fruit trees and flowers and plants. Let's
just zoom in on that. Actually, I find this very soothing. I just like to look at these
fruit trees. I like to look at the plums. That's too many plums. The papayas, I don't
quite understand. Oxford, Tangerine. It's very, very enjoyable. You know, I can also
sort of put a frequency view and say, okay, what people really talk about are plums and
peaches more than anything else. I've made the common words big here. I can then, you
know, zoom out. Let's just see what else is in the English language. What are the things
that we talk about? We talk about food a lot. Food is this nice soothing kind of orangy color.
You know, there's a whole lot of pasta in life. Let's just look at some pasta. There's
22 words that, you know, there's rigatoni, there's ravioli, there's spaghetti. You can
see the sauce is getting into the image there. If we go by frequency, really, it's just pasta
and spaghetti. Everything else kind of fades away. That's okay. Let's look at some other
things. What do we have here? There are other interesting color areas. These bright pinks
are actually sort of awful. Like, these are medical operations. Here we have some diseases.
Here we have injuries. It's actually a little bit horrible. I don't want to spend too much
time, but, you know, if you look at all these skin diseases and stuff, it really, it's bright
pink and it starts to make you itch a little bit. Just looking at it. Let's zoom out from
that. What else is there? I mean, what do we have words for? Okay, here's a whole section
of words about people. You know, ranging from whistleblower here, okay, these were
lyricists. There's a whole section on women; they're common enough to talk about that we can
see that. And actually, we start to see the sort of sad view of sexism in the language
if you start looking at what all of these various synonyms are. But, you know, women
and girls are the most common and so forth. Let's zoom out again. And, you know, there's
other stuff we can play. We have the whole language in our hand here. We can start making
all sorts of comparisons. Let's size these by frequency. We get a sort of more boring
image here. Turns out we don't talk about colorful things that often. We use words like
thing, you know, time and so forth. But this lets us start to make these comparisons. Okay,
time is huge. Compare that to the entire section of plants, most of which are boringly
just bush and tree. You know, pretty much we spend as much time using the word time
as we do using any word for plant in the English language. Similarly for food. It's strange
and it's humbling and you realize how much of the time we just use, even now I'm
using time over and over again, having been primed, just how much of the time we use sort
of ordinary words. So if you get nothing out of this visualization, I want you to think
of two things. Like, there aren't that many words even if you use all 30,000 words. And
if you're using only common ones, you're not being as colorful as you could be.
So this is a picture of an entire language in a sense. Right? But one more. But we know
that we don't use language as a bag of words. We actually weave words together to make meaning.
And context is really important. So how can we start to create visualizations that actually
keep that in mind. Keep context. Keep sentences in mind. But still give you sort of an overview
and macro and micro way of looking at a piece of text. So this is, I want to switch now
to something we did when we were at IBM with the Visual Communication Lab. Some of the
folks from that lab are here. This is a project called
Many Eyes, which is a public website. It's free. Anyone can go there, upload data, create
interactive visualizations and share these visualizations with the world. And one of
the things that happened right away, as soon as we launched Many Eyes, we had about a dozen
different techniques for visualizing numbers. Because that's what you want to do. You want
to visualize numbers, of course. And then as soon as we launched the site, people were
uploading text. And they wanted to visualize text. And we had no good way to visualize
text. So a couple of the projects you're going to see now are actually our answer and the
beginning of a research agenda for us in trying to help people visualize text and make meaning
out of large bodies of text. So the first one I want to talk about is this visualization
called the word tree. And what the word tree does is that it lets you search for a word
or a phrase in the large body of text. So this actually is the entire Bible. It's the
King James Bible. And I just searched for love. Okay. And it's telling me everything
that comes after love. So I'm going to actually find out, okay, love the Lord. And I can start
navigating and see what it says. Love the Lord thy God with all thine heart. Okay. And
I can start going back and be like, okay, what else? The love of God and so forth. In
fact, I can start looking for other things. So let me look for sin. So sin is interesting
because now it's telling me about offerings, all kinds of offerings. So very Old Testament.
And I can go back and I can actually put sin towards the end and see if there's anything
interesting there that ends on sin. So again, very biblical. Another thing we can
do that is always nice for the end of one of these trees is a question mark. So I have
all the questions in the Bible. And I can see what is being questioned of me. What are
people asking done unto me? Let me see what else before me and so forth. Another one I
want to do. Moses. So one of the interesting things here is that you immediately get a
sense that not only was Moses very popular, but also the fact that God spoke to him a
lot or said a lot of things, right? So you can start to zoom in into some of that. So
what was it that God said? And you get entire sentences. So you never lose the context of
what it is you were saying or of the raw data that you're looking at. So this is the Bible.
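The structure behind a view like this can be sketched in a few lines. This is an assumption-laden illustration and not the Many Eyes code: it builds a tree of the words that follow a search term, merging shared continuations and counting how often each branch occurs. The function names, the depth limit, and the sample sentences are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words and sentence-ending punctuation."""
    return re.findall(r"[a-z']+|[.?!]", text.lower())


def build_word_tree(text, root, depth=4):
    """Tree of the words that follow `root`, with occurrence counts.

    Each node is {"count": int, "children": {word: node}}.
    """
    def new_node():
        return {"count": 0, "children": defaultdict(new_node)}

    tokens = tokenize(text)
    tree = new_node()
    for i, token in enumerate(tokens):
        if token != root:
            continue
        tree["count"] += 1
        node = tree
        for word in tokens[i + 1 : i + 1 + depth]:
            node = node["children"][word]
            node["count"] += 1
            if word in {".", "?", "!"}:
                break  # a branch stops at the end of its sentence
    return tree


def print_tree(node, indent=0):
    """Print branches, most frequent first, indented by depth."""
    ranked = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for word, child in ranked:
        print("  " * indent + f"{word} ({child['count']})")
        print_tree(child, indent + 1)


# Made-up sample text, for illustration only.
sample = "Love the Lord thy God. Love the Lord with all thine heart. Love thy neighbour."
print_tree(build_word_tree(sample, "love"))
```

In a drawing, the count on each node would set the font size of the word, and because every branch is a run of consecutive tokens, the original word order is never lost. Searching with the term at the end of a phrase would work the same way on the reversed token list.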
Let's look at something else. This is Twitter. So this was back in 2009. One of our friends,
Lee Byron, uploaded this onto Many Eyes, and it was just a bunch of tweets. So people
are saying they need to get out of here. I need to get out of the house. What else are
people saying? I need to stop. I need to stop eating so much. I need to catch up on my music.
I need to stop drinking so much. Again, interest day or coffee at work. So you get a sense
of all the things that people are saying I need to see, I need to make, I need to start,
and so forth. Now, one of the cool things also about Many Eyes is that anyone could
upload any sort of data set they wanted. This was all public. And one of the data sets that
someone uploaded is this series of personals ads of men looking for love, but men who are
married. So all these men are married. And it says so in their personals. And so I am
married. And I love... The thing that I love is the difference in philosophy. I am married
and looking. Or I am married but looking. So let's see. I am married but looking for
discreet encounters in the Colorado Springs area. I am married and plan on staying that
way. I am married and often get lonely. So you get a sense of how very quickly you can
start slicing and dicing these data sets. So the word tree is a very polite, respectful
visualization in the sense that it takes the words, it doesn't reorder them, it keeps them
all neatly organized as the author intended. But you could wonder what if you were a little
bit more violent with a text and wanted to rearrange it. In fact, what if what you wanted
to do is figure out some way to create an atlas of a text. Many different maps of the concepts
that are mentioned. So I would like to show a visualization that will do this. And it's
a little bit involved, but we're going to walk through it slowly. And I think you'll
see it's worth it. And in fact, it's ultimately very simple and implementable. So this is
called a phrase net. And the idea is that you start with a little template for two words.
So something involving two words, for example, your template could be you have a word, you
have literally the word and, and then another word. And this was also launched on Many
Eyes, so, you know, anyone could do it. What you could do is go through
your text, like say you're looking at Pride and Prejudice, and you could look for every
occurrence of one word and another word. So for example, you might find pride and impertinence
if you were the computer reading this text. And when you do that, you start to create
a graph, you create two little nodes, a pride node, and an impertinence node and an edge between
those. And you add that to the graph. And then you just keep doing that. And as you
do this, you're going to build up a whole graph that will represent the text. I'm going
to actually walk through an example of this. So let's say I'm doing a search for the word
and in Pride and Prejudice, I might find this nice phrase here, we see Jane and Elizabeth.
Okay, so we start our little graph with the Jane node and the Elizabeth node and the connection.
Let's add another one. Okay, we get pride and conceit. We'll add pride node, conceit
node, connect. Oops, not Jane and Elizabeth, but Elizabeth and Jane, we're going to make
those things bigger and make that arrow double headed. What next? So Bingley and Jane. Okay,
our social network is starting to expand here. Vanity and pride. All right, pride gets bigger,
vanity gets added to this little network of sentiments. We add one more, leisure and tranquility.
Okay, so now there is a whole other section of the network. And you can
see not through any sort of, you know, huge semantic analysis, but just this very simple
lexical matching, we're starting to build a map of the text. So let's see how this
works in practice. This is all of Pride and Prejudice through an X and Y pattern. So you
can see that immediately we get to the left, we get a social network of the book. And I'm
going to zoom in there and you can see things like, you know, Elizabeth and Jane, Catherine
and Jane, Lydia, Mr. Bingley is connected to, but a little lonely, which makes sense.
And then to the right, we have other kinds of clusters as Martin was saying, where you
have things like pride and prejudice, vanity, conceit, all connected together. If I get
the same text and I change the template, instead of X and Y, I do X at Y, I get a different
network, I get a network of places. So I have Pemberley, Longbourn, Netherfield, all the big
houses where the action of the novel is taking place. Now let's look at a different text.
So this again is the Bible. And we're doing X begat Y. So we get a family tree of the Bible.
And who knew that it was as circular as it is. In fact, I'm going to zoom in there. You
can see even more, you know, interesting little back and forth patterns there. And what's
happening there is that you have a father and a son named the, you know, and then the
son names his son after his father. And so you have things going that way. Again, sticking
with the Bible, if you do X of Y and you do the Old Testament versus the New Testament,
you get things like children of Israel, king of Israel versus the son of God, the kingdom
of God. And then finally, this is a very different data set. This is a data set of titles of
novels dating back a number of centuries. In fact, Casey's talk made me think about this
because some of these titles are a paragraph long. And here the pattern we're looking at
is X's Y, some sort of possessive. And what you end up with, interestingly enough, is a lot
of female characters. You have woman, daughter, wife, bride. So all of these characters are
possessions of others. And I'm going to actually zoom in to wife and bride here. So we have
like the father's wife, the ambassador's wife, the bachelor's wife. So these are all respectable
citizens. But let's look at bride. You have the bandit's bride, the sailor's bride, the
freebooter's bride, the pirate's bride. So basically the take-home message from these
novels is you should definitely become a wife if you can. Being a bride is not a good deal.
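The template idea shown in these examples (X and Y, X at Y, X of Y) can be sketched in a few lines. This is only a minimal illustration, not the tool used in the talk: the function name `phrase_net` is made up here, a "word" is assumed to be a run of letters or digits, and the connector is assumed to be a separate word, so a possessive pattern like X's Y would need its own regular expression.

```python
import re
from collections import Counter

def phrase_net(text, connector):
    """Count directed (X, Y) word pairs that appear as 'X <connector> Y'.

    Hypothetical helper for illustration only.
    """
    # A lookahead lets matches overlap, so "A and B and C" yields
    # both (A, B) and (B, C).
    pattern = re.compile(
        r"(?=\b(\w+)\s+" + re.escape(connector) + r"\s+(\w+)\b)",
        re.IGNORECASE,
    )
    edges = Counter()
    for x, y in pattern.findall(text):
        edges[(x.lower(), y.lower())] += 1
    return edges

if __name__ == "__main__":
    sample = "Elizabeth and Jane stayed at Netherfield. Pride and vanity are different."
    for (x, y), n in phrase_net(sample, "and").items():
        print(x, "->", y, n)
```

Each counted pair is one edge of the network; swapping the connector from "and" to "at" changes which pairs are found, which is why the same text gives a network of people in one case and a network of places in the other.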
So there's a lot of text and books, obviously, but that's not the only place we find text.
A classic place is singing: songs, lyrics. And Fernanda and I at one point became fascinated
by lyrics. And we started looking at them and thinking about sort of lenses with which
we could look at these lyrics. One idea that came out right away is: what is a universal in song?
It's body parts. We start at an early age, we hear the Hokey Pokey, you put your right
hand in, you take your right hand out. The body is something we talk about all the time.
And it's talked about in different ways. Sometimes in some songs it can be talked about very
poetically. Other songs could be extremely direct as in ZZ Top. And it can really be
taken to sort of ridiculous extents if you're AC/DC or you can really start using some absolutely
preposterous phrases if you're the Black Eyed Peas. But no matter how it's talked about,
the body is there. It's there in every genre. And we got interested in this. So we started
collecting data. We got lyrics to 10,000 songs divided into 11 genres. Through diligent thought,
we came up with 83 different body part words. And we started to just compute. It's very
easy for a computer to go through and say, all right, in folk music, how much is each
particular body part mentioned? You get a table like this. In itself, it's not that
enlightening. The foot is mentioned in 1.09% of folk songs. I don't know. Is that important?
You can obviously try to make a graph out of it. It's not super interesting. Finally,
we hit on the idea of actually using images of the body parts themselves and discovered
that in fact, when you do that, things start popping out much more. So this is folk music.
It's sort of like a portrait of a very gentle genre. What about other genres? Let's just
look. I'm curious. Look at blues. How about? It's also very, very gentle. The knees are
figuring more prominently. Maybe people are down on their knees. I don't know. What about
heavy metal? So heavy metal, interestingly also, you see sort of hand, eye, face, head.
Those are the main ones. But you can see some other body parts creeping in. It's actually
a little bit more diverse than those other ones. What about hip-hop? Hip-hop, you see,
really much greater diversity, it turns out. It's quite distinctive. In fact, we can actually
just compare all 11 genres side-by-side. You can see there's basically three things
going on here. There's eye music. There's hand music. Then hip-hop stands out as a kind
of outlier. So we thought this was an interesting exercise, and we didn't want to stop at music.
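The counting step described for the lyrics (for each genre, how often is each body part mentioned?) might look roughly like the sketch below. Everything named here is an assumption for illustration: the function `mention_rates`, the five-word `BODY_PARTS` list standing in for the 83 words, and the naive rule that a plural is just the word plus "s" (so "feet" would be missed). It reports the percentage of songs in a genre that mention a part at least once, which matches the "1.09% of folk songs" style of figure.

```python
import re
from collections import defaultdict

# Hypothetical short list; the talk used 83 body part words.
BODY_PARTS = {"hand", "eye", "foot", "knee", "head"}

def mention_rates(songs, body_parts=BODY_PARTS):
    """songs: iterable of (genre, lyrics) pairs.

    Returns {genre: {part: percent of that genre's songs mentioning part}}.
    """
    totals = defaultdict(int)                      # songs per genre
    hits = defaultdict(lambda: defaultdict(int))   # songs per genre per part
    for genre, lyrics in songs:
        totals[genre] += 1
        words = set(re.findall(r"[a-z]+", lyrics.lower()))
        for part in body_parts:
            # Naive plural handling: "hand" also matches "hands".
            if part in words or part + "s" in words:
                hits[genre][part] += 1
    return {
        genre: {part: 100.0 * hits[genre][part] / totals[genre]
                for part in body_parts}
        for genre in totals
    }

if __name__ == "__main__":
    demo = [("folk", "Take my hand"), ("folk", "Blue eyes crying"),
            ("blues", "Down on my knees")]
    print(mention_rates(demo))
```

The resulting table is the raw material; the step that made it readable in the talk was drawing it with images of the body parts rather than as numbers or a bar chart.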
We thought, okay, so what if we could have this kind of visualization, but to look at
prose, to look at books, for instance. Is it interesting to look at a book just by looking
at what parts of the body a book talks about? So this is what we did. As you can tell by
now, one of our favorites: Pride and Prejudice. So what you're looking at here is an image
of, imagine, if you will, the entire book. But we're only showing you the times when
the book talks about a part of the body. So you just read it from left to right, just
like you would a normal book. You can start to see things like there are a lot of repetitions.
So you have a lot of eyes. There's a lot of gazing. You also have a lot of hands, and
it turns out that one of the biggest parts of that novel has to do with someone's hand
being asked or offered in marriage. So that's why you end up with a lot of hands. We then
decided to look at different kinds of books where, for instance, hands
may not be a metaphorical thing, but a more physical thing. So we looked at The World
I Live In by Helen Keller, and you get a sense of just how prominent hands and fingers are
here, also eyes, but also arms, and then how many more mentions of the body you get here,
because obviously this is crucial for her. And then we also started wondering, okay,
so what about a different genre of book, the one that deals explicitly with the body? So
here's the Kama Sutra, and you can see that there are a lot of mentions, obviously, of
body parts. There's also interesting repetition going on. So if you look at the top here,
there are a lot of lips and tongues, but then here there are a lot of teeth. It turns out
that there is a whole list of how people should bite. And so there's the biting section here.
In fact, I'm going to call out a couple of sections here, and obviously there are things
that are more explicit here than the other books, right? So you can imagine this is going
to happen. So if I call out this section here, it's, oh, it's too dark, but I'll bring it
up for you. These are acts to be done by the man. And as you can see by how explicit these
parts are, you can imagine some of the things they're talking about. If I highlight a different
part of this, there's a lot of lips going on here, lips and mouths. And this is all about
oral sex. Now it's interesting that if you look at the parts of the body here, you
know exactly who's supposed to give pleasure to whom, if you can tell. So, and finally,
this is The Perfumed Garden, which happens to be a sex manual from the Arabic world in
the 15th century. It's much, much longer than the Kama Sutra, and it mentions a lot of body
parts as you would imagine. Now one of the things that was interesting to us when we
generated this image is the fact that there's sort of a division at the top half and the
bottom half. The bottom half is a lot more explicitly graphic than the top half. And what's
happening here is that just around there where I'm highlighting, there's a lot of explanation
about positions. So there's a lot of legs, where legs should be, where legs should go,
how they should turn. And then at the bottom, it's all about terminology. So they're naming
things, they're talking about all the different names that organs can have. And so it gets
very, very specific.
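The book views above (Pride and Prejudice through The Perfumed Garden) all rest on the same simple step: walk through the text in reading order and keep only the body part words, remembering where each one falls. A rough sketch follows, with the caveat that `body_mentions` and its plural rule are assumptions made here for illustration and not the code behind the images in the talk.

```python
import re

def body_mentions(text, body_parts):
    """Return [(relative_position, part), ...] in reading order.

    relative_position runs from 0.0 (start of the book) toward 1.0 (end).
    Hypothetical helper; plurals are handled naively by adding "s".
    """
    words = re.findall(r"[a-z]+", text.lower())
    lookup = {}
    for part in body_parts:
        lookup[part] = part
        lookup[part + "s"] = part
    total = len(words)
    return [(i / total, lookup[w]) for i, w in enumerate(words) if w in lookup]

if __name__ == "__main__":
    sample = "Her eyes met his. He asked for her hand."
    for position, part in body_mentions(sample, {"eye", "hand"}):
        print(f"{position:.2f} {part}")
```

Laying those positions out left to right, with one picture per mention, gives the kind of strip where runs of the same part (a stretch of teeth, a stretch of legs) show up as visible sections of the book.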
So you can see that we've actually gotten very, very far from the world of just visualizing
a table of numbers to get at objective truth. In fact, we've, you know, at this point we've
seen pictures of people fighting, people obsessing over things, what they're looking for, how
they write about things, songs and novels. And all of these, I think, start to get at
a subjective truth. And this is exactly why we're obsessed with visualizing text. And
we hope that in giving this talk, we've helped infect you a little bit with some of this
obsession too. Thank you very much.
