And I'm a professor in the Annenberg School for Communication at the University of Southern
California and I direct both a global communication program and USC's Scenario Lab, which is
a research group that looks at the future.
So I'm interested in social data for a variety of reasons.
The first and most important reason is that it is reshaping the way people understand the
world and for the younger generation it reshapes the way they live their lives.
And we understand fully how dramatic a change this is and we also understand the degree
to which we now have access to information around the world about how people understand
and make sense of their environment that we didn't have access to before.
I think the most interesting social data right now is probably a mix of Facebook and
Twitter for the West, probably a mix of WeChat and Weibo in Chinese and then you work around
the world and there are some other platforms that are very important.
My European friends are really big on WhatsApp and in Japan it's all about LINE, and so you
have to know what's important in the part of the world where you're doing your investigations.
Well because our research group is primarily interested in how people make sense of the
future, how they talk about the future, how they understand the future and the kinds of
stories that they tell about the world they want to live in, we are very focused on those
issues that are really important to change and especially rapid, obvious, big changes.
An example would be climate change and so we've been working on climate change for a
couple of years.
People talk about it a lot on social media and it depends on what's happening in the
world so you get a lot of conversations around a UN summit.
You get a lot of conversations about the environment and climate change on the days
that you can't see across Tiananmen Square in Beijing or the day that Paris becomes more
environmentally impacted by smog than Beijing or New Delhi and so it's very event driven.
Our research questions vary a lot depending on the project.
One of our very first social media projects was with a group called Cool California which
was focusing on climate change in the state of California and because of the recent announcement
by our governor that we have to have a 25% water reduction, everybody is again talking
about climate change and those issues.
What we want to know is how people think their lives are going to change, what they are going
to do differently and what kind of, in that case, California they want to build for the
future and so it varies.
In general in climate change one of our big drivers comes from NASA and NASA is very concerned
that people don't understand science.
They are very concerned that there's a lot of science denying that goes on and they're
very concerned that people don't understand the connections between the way they live
their lives today and what that does for the future of the planet.
We've been asking of our data a lot of questions about how people see the connection between
day to day activities now and what the earth will look like in 50 years.
We have massive sets of data around climate change in a variety of different languages,
some that we haven't coded yet.
And so we're talking about hundreds of thousands of messages from Facebook or Twitter,
for example, within a given time frame, and if you are looking at how much data we could collect
about conversations around climate on any given day in multiple languages, we're talking
about hundreds of millions.
We're trying to learn to do a more sophisticated job of dealing with all of this data, but we've
been using DiscoverText to help us sort through and manage a lot of the data.
The software has been very valuable because it creates opportunities for filtering out information
that we're not interested in, because when you're asking questions about climate we get
lots of people talking about whether or not they're going to be able to go to the beach
that day especially here in California.
And so we're more interested in getting the data that are most pertinent to helping us
understand those larger stories, the big narratives about climate that drive people's understanding
of the issues.
And so we filter and then we filter again and then we have people try to sort through
all of the different categories and the best thing about this new tool is our ability to
create different buckets of information that will help us better understand the multiple
stories that weave in and out of people's understanding.
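As a minimal sketch of the filter-then-bucket process described above: the first pass drops off-topic messages (the beach chatter), and the second pass sorts what remains into narrative buckets. The keyword patterns, bucket names, and the `sort_messages` function are all hypothetical, chosen only for illustration. This is not how DiscoverText works internally, and a real project would rely on human coders rather than keyword lists alone.

```python
import re

# Hypothetical off-topic pattern (illustrative only, not from the interview).
OFF_TOPIC = re.compile(r"\b(beach|surf|sunscreen)\b", re.IGNORECASE)

# Hypothetical narrative buckets and their keyword patterns.
BUCKETS = {
    "agriculture": re.compile(r"\b(winery|wine|cheese|crops?|farm\w*)\b", re.IGNORECASE),
    "water": re.compile(r"\b(drought|lawns?|water)\b", re.IGNORECASE),
}


def sort_messages(messages):
    """Drop off-topic messages, then assign the rest to one or more buckets."""
    # First filter: remove messages that match the off-topic pattern.
    kept = [m for m in messages if not OFF_TOPIC.search(m)]

    # Second pass: a message may land in several buckets, or in none.
    buckets = {name: [] for name in BUCKETS}
    buckets["uncategorized"] = []
    for message in kept:
        matched = [name for name, pattern in BUCKETS.items() if pattern.search(message)]
        for name in matched:
            buckets[name].append(message)
        if not matched:
            buckets["uncategorized"].append(message)
    return buckets
```

The "uncategorized" bucket is kept on purpose: those are the messages a human would look at next when deciding whether a new category is needed.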
In a lot of communication research they talk about receiver orientation in the research,
so the understanding is it's not what I want to say, it's what other people need to hear.
When we're looking at all of this data there's a temptation to forget that and
to impose the researcher's ideas of what the data should say, and what we're
learning through DiscoverText is that what's important to people and the way they come
to understand something like climate change is often not about those issues that we thought
were important.
And so for example we discover that one of the items that people talk about and re-talk
about on Facebook and tweet and retweet is all of the changes that are taking place
agriculturally that might shut down their favorite winery or could mean that the cheese
in France that they most love won't be available you know in 10 or 20 years.
And so it's very much about how does it impact me and the things that I care about and that
it may not necessarily be about saving you know the polar bears or whatever it is that
a lot of environmentalists believe are driving the conversation.
In California right now we are discovering that what people are most tweeting about is
having to tear up their lawns, and that if someone had made it clear long ago that California
was going to go the way of Phoenix and other places, and that that's what their front lawn
would look like, people would have paid attention earlier.
When you're sorting through emotions, one of the things that you discover is that people
talk about them in a different way, and so all of a sudden we're often dealing not in
words but in emoticons, and we're having to figure out, okay, is this emoticon
here because the person is sad or because the person is trying to be
ironic? And so we've got many, many layers of different kinds of emotions. We've also got
lots of just absolutely fascinating images that get used.
You see people in gas masks you know you see people trying to use bicycles to create energy
you see people sitting on a stationary bike in an ice cream shop churning ice cream because
that's you know their notion of what sustainability is but at the same time they don't have to
give up ice cream.
The most important way collaboration comes into a project like this is our ability
to get multiple perspectives from multiple cultural backgrounds.
Our research team has students and sometimes other faculty from all over the world and
so that allows us to bring to bear multiple local understandings at the same time looking
at issues that are very global.
And so for us it means that we can start to frame the global-local issues, and that takes
lots of people. When we're actually working with the DiscoverText tool, what happens
is very formative, in the sense that we bring lots of different kinds of experiences from around
the world to create our categories and to look through the theoretical options that
we have for understanding this virtual space.
But then what really happens is that when you're digging down into the message structure
we start to get multiple perspectives, different ways in which we might be able to
code the data, and then we have these very intense discussions about how you would code
this piece of a message and how you would add this other information in. And so there's
a huge discussion that takes place before we code, and then you've got the opportunity
to have multiple people coding the data.
And then we can start to look at how the coders, when they're by themselves, independently
look at the information, and then you can make comparisons across the coders and we start
to see where our codebook isn't clear enough. It allows for a more sophisticated and
faster collaboration than we could do in a lot of my other projects before we started
working with this tool.
I've been teaching and working with content and doing content analysis for 30 years.
So you know this has been a godsend in many ways.
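One common way to make the coder comparison described above concrete is an agreement statistic such as Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The interview does not say which statistic the group uses, so kappa is an assumption here, and the function names are hypothetical. The sketch also lists the items the coders disagreed on, since those are the ones that show where a codebook isn't clear enough.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: based on how often each coder used each code.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical code for everything.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(items, coder_a, coder_b):
    """Return (item, code_a, code_b) for every item the coders coded differently."""
    return [(item, a, b) for item, a, b in zip(items, coder_a, coder_b) if a != b]
```

A kappa near 1 suggests the codebook is being applied consistently; a low value, together with the disagreement list, points to the categories that need another round of discussion.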
So when we're looking at the ethical issues of using social media data it brings up about
five different key concerns.
The first is we have to abide by the university's IRB, so we have an institutional review board
that has guidelines for us, but we know that those guidelines are not enough, especially
when you're dealing with certain kinds of data. And so our second rule of thumb in our
group is do no harm, and so we're always trying to decide what level of privacy is critical
given all of the different kinds of data that we collect.
On any controversial topic, whether it's gun control or, right now, a big project we have
on terrorism, we have to make sure that we're not putting someone in the proverbial line
of fire by holding enough information that a comment they make could potentially
put them on a list of people who have said something that others find offensive
in some way.
And so we've got university guidelines then we have to develop our own guidelines to make
sure that we do no harm.
The third thing that we're always looking at is the legal issues.
Because we collect data internationally in lots of different languages, we try to pay
very close attention to the terms of service for the different social media platforms around the
world, and that's very, very difficult to keep track of.
For example the Twitter terms of service are not translated into Arabic on Twitter's
website and so we're not sure exactly what that means.
The fourth thing that we have to look at is how different rules for different nations
interact with the terms of service that the different companies that own these platforms
have.
There's an interesting example going on in France right now where they're trying to
create laws around the use of social media that conflict with the terms of service statements
from some of the companies that are highly used in France.
And so that's probably the fourth big issue, and then the last one is that we're trying, in
our own minds, to create our own categories for the different kinds of individuals,
groups or organizations so that we can decide how we're going to treat
them.
Somebody who has millions of followers or who puts out hundreds of thousands of tweets
a year or something like that falls more in our opinion into a broadcaster category as
opposed to somebody who tweets irregularly and who seems to have more of an individual
status and so we will be treating those differently, those accounts and the messages that come
from those accounts.
And I'm sure that over time we'll come up with other issues as well but those are our
big concerns.
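The broadcaster versus individual distinction described above can be written down as a simple rule. This is only a sketch: the interview gives rough magnitudes ("millions of followers", "hundreds of thousands of tweets a year") but no exact cutoffs, so the thresholds, the `Account` fields, and the function name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    handle: str
    followers: int
    tweets_per_year: int


# Hypothetical cutoffs; the interview names magnitudes, not exact numbers.
FOLLOWER_CUTOFF = 1_000_000
TWEET_CUTOFF = 100_000


def account_category(account):
    """Label an account 'broadcaster' or 'individual' for privacy handling."""
    if account.followers >= FOLLOWER_CUTOFF or account.tweets_per_year >= TWEET_CUTOFF:
        return "broadcaster"
    return "individual"
```

Writing the rule out this way makes the judgment call visible: anyone reviewing the study can see exactly where the line was drawn and argue for moving it.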
I studied computer science as an undergraduate and I still do little bits of coding when
I have to.
I made a lot of money in graduate school doing programming for professors who didn't know
how, and so I could design studies and create the code for them. Then after a while I got
lazy and didn't do it anymore because there was all this software out there that didn't
require it. But there is a new, interesting shift in the landscape: as more and more
of our work gets done on platforms like R, where there is a lot of existing code
that you can just cut and paste, you often still have to do a little bit of
your own coding. So having that sensibility about the world is a little bit on the helpful
side but often not enough, and I still rely on my graduate students, who are way more steeped
in a lot of the new code than I am, to get everything done.
What we're doing right now with DiscoverText is using the machine learning to speed
up our coding, because we can work really hard on a coding system and then we know that the
coding system is more or less stable when we can get the machine classification to look
like the human classification. There are other kinds of iterative processes that
I know other people use.
We in some ways use it in almost the opposite direction, where we validate how
stable our message structure has become in the different codes by looking at their replicability
on some kind of massive scale.
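The stability check described above, where a coding system is treated as stable once the machine classification looks like the human classification, can be sketched as a per-code agreement rate. The function names and the 0.8 threshold are hypothetical, and the sketch takes the machine labels as given rather than calling any DiscoverText feature.

```python
from collections import defaultdict


def per_code_agreement(human, machine):
    """For each human-assigned code, the share of items the machine coded the same way."""
    if len(human) != len(machine):
        raise ValueError("human and machine labels must cover the same items")
    totals = defaultdict(int)
    hits = defaultdict(int)
    for human_code, machine_code in zip(human, machine):
        totals[human_code] += 1
        if human_code == machine_code:
            hits[human_code] += 1
    return {code: hits[code] / totals[code] for code in totals}


def unstable_codes(human, machine, threshold=0.8):
    """Codes whose machine agreement falls below a (hypothetical) threshold."""
    rates = per_code_agreement(human, machine)
    return sorted(code for code, rate in rates.items() if rate < threshold)
```

Codes that come back as unstable are candidates for another round of codebook discussion before the machine classifier is trusted at scale.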
Whenever you're in a new area you're going to discover that you need a new tool.
It's almost impossible to simply use the prior generation's tools on a whole new way of life,
and so DiscoverText is right now the one that we have the most experience with and
are using the most because it has the largest number of features that we're interested in.
I'm also happy that every time we start a new project it seems like there are new functionalities
and that's very helpful.
There are a number of new ways that we want to analyze data. For example, we're now trying
to do a better job of linking organizations together in a social network and then finding
out how that social network over time constrains and enables the different
kinds of stories that are told in it. So we're now trying to figure
out how to do that within DiscoverText, and I'm hopeful that even if that capability isn't
there now, at some point in time it will be.
The most challenging thing about DiscoverText is learning all of the ways in which there
are hidden benefits within the system.
As with any software, there are always little features that seem obvious to the people who
created it, but it takes time for the outsider to become familiar with all those little
features.
When we first started using DiscoverText I think we were doing far simpler kinds
of processes than the tool actually allows for, and it took months, I think, before we started
using it to the tool's best advantage for us. So you have to pay very close attention
to all of the smaller features that end up having a lot of big bang value.
