Thank you for the introduction. I'm going to be talking about fusing physical and social sensors for situation awareness. There are two parts to the talk. First, I'm going to quickly give an overview of some of the work we had done earlier, and then I'm going to go into more detail on one of our very recent works. And this work has been done
in collaboration with Yuhui Wang, who is my PhD student. He's actually sitting right there. He's almost finished his PhD and is heading into the job market right now.
So we all know this is the era of big data, and I would say big sensor data. There are many types of sensors. There are cameras. There are Fitbits. There are all kinds of analog sensors. They're all generating data. And I would also say that there is a lot of
sensor data generated by social sensors. And social sensors are basically human beings.
And a prototypical example of a social sensor is a human being who observes something of
interest and tweets. And this is the example I'm going to use for a lot of my work. And
what we're trying to do in our work is this: can we take the observations from physical sensors and the observations from social sensors and merge them together in order to get a better understanding of what's going on? That's the whole idea of this work.
And really the motivation is this. This is a map of Manhattan, and I've deliberately chosen Manhattan because one of our data sets is actually the surveillance cameras on the streets of Manhattan. So if there is something going on, say a parade
going on in Manhattan and it is observed by a camera out there on one of the streets,
if it is an interesting enough event, then maybe somebody out there in the streets would
tweet about it. And the question is, these are two very different perspectives on the same event: one captured by a camera, which is kind of low level, and the other by a human being who has observed something of interest and tweeted about it, sometimes about the parade, sometimes about something else going on out there. And can
we combine this information so that we can infer what's going on there better than just
using the physical sensors or just using social sensors. So that's what we're trying to do.
So we can try to gauge the accuracy of the detection: we can run experiments using information from just one set of sensors and see what the accuracy is, do the same with the other set of sensors, and then combine them and see the accuracy.
But this is quantitative?
Yes, this is quantitative.
Not qualitative?
No, this is quantitative.
Now, there are some challenges in this. One is that both kinds of sensors have their own limitations. Most physical sensors have numeric output: it could be pixel values, or if it's a pollution sensor, it could be pollution values; it could be any kind of sensor. On the other hand, the social sensors give out symbolic information, because they interpret what's going on and then give their interpretation, usually in natural language, as text.
And also, the densities are different, in the sense that, for example, cameras are very sparsely located, not densely located. But a city like New York is teeming with thousands, if not tens of thousands, of people in Manhattan, and all of them could be tweeting. So there are a lot of tweets going on there.
On the other hand, while the spatial density of the social sensors is higher in many places, the temporal density of the physical sensors is higher, because cameras are capturing the scene continuously and broadcasting the video continuously. Human beings, except for a few rare exceptions, don't tweet continuously; they tweet when they are interested.
And also, there is a lot of noise. For example, in social media, just because you are in front of a parade doesn't mean you're going to tweet about the parade. You can tweet about anything; there is no constraint on what you post. So there's a lot of noise. On the physical side, a sensor could have failed. Or in the case of a camera, there could be dust, there could be haze, there could be raindrops if it's raining, or snow, and that would lead to noisy observations. So both kinds of sensors have their own problems.
And the hope is by combining this, we're going to overcome the limitations of each one of
them.
Okay. So our first work was what we call tweeting cameras for event detection. This is slightly dated work; we first presented it at the World Wide Web Conference in 2015. And the idea was this: traditionally, the cameras are always broadcasting video, which is pixel information. So instead of spewing out continuous video, could cameras tweet whenever something interesting happens, just like human beings? So we wanted a tweeting camera system. That means you have a bunch of cameras, each of which would tweet if it found something interesting: oh, there's a parade, or there's a traffic jam, or anything interesting happening, then it would tweet rather than sending out a video. And then can we use this information? So that was our original work.
So we came up with a multi-layer tweeting camera framework, where we did not have an actual tweeting camera, but we simulated the cameras tweeting. We took the feeds from cameras in Manhattan and built a three-level framework where we would do feature extraction and use off-the-shelf concept detectors to see whether certain concepts can be detected in the video, say vehicle, people, parade, car, snow, day, night. And then we would simulate the tweeting: whenever a camera detected a concept, such as parade, crowd, traffic, or car, it would broadcast it on Twitter.
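As a rough sketch of this idea, per-frame detector confidences can be thresholded and the salient concepts formatted as a camera "tweet". This is an illustrative toy, not the actual system; the threshold value, message format, and function name are all assumptions:

```python
# Illustrative sketch: turn per-frame concept-detector confidences into a
# camera "tweet". The 0.5 threshold and message format are assumptions.

def camera_tweet(concept_scores, threshold=0.5):
    """Keep the salient concepts (confidence >= threshold) and format a tweet."""
    salient = {c: s for c, s in concept_scores.items() if s >= threshold}
    if not salient:
        return None  # nothing interesting: the camera stays silent
    parts = [f"{c} ({s:.0%} sure)"
             for c, s in sorted(salient.items(), key=lambda kv: -kv[1])]
    return "I'm seeing: " + ", ".join(parts)

# With the confidences mentioned in the talk:
scores = {"crowd": 0.9, "parade": 0.8, "car": 0.1, "outdoors": 0.5}
print(camera_tweet(scores))
```

The key design point is that the camera only "speaks" when at least one concept crosses the threshold, mirroring how a human tweets only when something is interesting.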
And then we can aggregate the tweets: we can take tweets from a single camera, we can aggregate tweets from multiple cameras, and we can also use tweets from human beings near the cameras and combine the information. Then we can apply some filtering and analytical operators, because depending on what you are interested in and on the locations, you could do some kind of filtering, then you could do signal detection, and then combine it with cross-media analysis using the social information.
So this is roughly how our tweeting camera works. We use concept detectors; there are many standard concept detectors available, and we used off-the-shelf ones. The whole idea is the following. This would be one of the cameras in Manhattan. The camera observes the street scene, we apply the concept detectors on it periodically, and it detects several concepts: crowd with confidence 0.9, parade with 0.8, car with 0.1, outdoors with 0.5. Then we pick the salient concepts from there, and this camera would tweet: I'm seeing a crowd here now, 90% sure. And you can imagine that there are a bunch of other cameras, there's a server, and there are actually humans out there too, and the camera could tweet to everybody. Similarly, some of those humans would also have been tweeting about things going on out there. So what about the human tweets? What we could do is
take the tweets. We know the location of the camera, and many of the tweets are geotagged, so we can figure out which tweets are being sent from near that camera. We gather those tweets, run them through a tokenizer, do normalization, remove the stop words, and all those usual text-processing operations, and then figure out what the salient words being tweeted are. Then, since we are interested in a situation, meaning what's going on during a certain time period, we can do representative term mining. Say you are interested in the period between 3 p.m. and 4 p.m.: we collect all the tweets between 3 p.m. and 4 p.m. today, and we also look at all the tweets between 3 p.m. and 4 p.m. at that same location yesterday, the day before yesterday, and so on over a period of time. You can imagine each of these sets of terms from all those tweets as a document, and then we can use tricks from standard text processing: we can do a TF-IDF kind of analysis and find out the salient things people are talking about at the various locations.
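A minimal sketch of this representative-term mining, assuming each day's time-window tweets have already been tokenized into one bag of words. The scoring is plain TF-IDF of today's document against the historical documents; the function name and toy data are illustrative, not from the actual system:

```python
import math
from collections import Counter

def representative_terms(today_tokens, history_docs, top_k=3):
    """Score today's terms by TF-IDF against the same time window on past days."""
    docs = history_docs + [today_tokens]
    tf = Counter(today_tokens)
    def idf(term):
        df = sum(term in doc for doc in docs)  # document frequency across days
        return math.log(len(docs) / df)
    scores = {t: (tf[t] / len(today_tokens)) * idf(t) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# Tokenized tweets from 3-4 p.m. today versus the same hour on previous days:
today = ["parade", "crowd", "parade", "traffic", "coffee"]
past = [["traffic", "coffee", "rain"], ["coffee", "traffic", "subway"]]
print(representative_terms(today, past))
```

Terms that appear every day at that location (like "traffic" or "coffee" here) get an IDF of zero, so only today's unusual terms surface, which is the point of the technique.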
So we can get a word cloud like this. We got this information from humans by doing representative term mining, and we converted the video captured by cameras into tweets by doing concept detection. So now we have two sets of tweets, and we can combine them. I'll just show the results, because the work on how we do the combination was presented at this conference, ISM, by Yuhui two days ago, on Sunday. So I'll just show you the results. We collected
data from 149 cameras all over Manhattan. We've been collecting for two years now. These are publicly available, and we've also been collecting tweets from there; it's a lot of data. We looked at certain salient events going on, and we got this information from the NYPD, because they give out traffic advisories, and the DMV also puts out information. So we knew some of the events which were going to happen, which was good for our experiments. I'll tell you about one particular event. In Manhattan we looked at certain cameras, took the pictures, and did the concept detection. Then we took a radius around each camera, collected all the tweets around it, and did this analysis.
And so now I'll play a short video for one particular event, in December 2014, which was the million people march. It shows how the tweets from the cameras and the tweets from the humans are combined together. You'll notice it's basically the Manhattan area: the march started from somewhere here, went up, and then came down and ended in the Battery Park area. So let me play that. I hope I can play this. It starts somewhere here, and then people are tweeting about the million people march and crowds, and we are combining this. You can see the word cloud: when a term has become significant enough, it moves up and then comes down. This was on, I think, December 13, 2014. No, this is not all the tweets. In total it's about 60,000 tweets in Manhattan alone every day, and we also filter out the tweets which are not relevant; this shows just the most important terms.
So it was the million people march. Now, what was interesting to us was... yes? The question is how the most important terms were picked.
So, a very good question. Here we were looking for the parade, so that is what we were mining for. But in general, if you were not looking for anything specific, you would look at all the important things which people are talking about. It depends on what your application is, what kind of situations you are interested in. It could be anything you like; that is part of the system. In this particular experiment we were interested in verifying that this is actually useful, so we were looking for some events which were preplanned. And what was interesting is what we picked up: if you notice, one of them is the million people march, and there's something called Santa Claus. This was something which threw us off. At the same time the million people march was going on, there was a march of Santa Clauses from here to up there, because there was a Santa convention out there, and it just came up in our data. That was interesting and surprising to us.
Yeah. So this was the work we had done earlier. But I don't want to focus on that; I really want to focus on the matrix-factorization-based fusion of physical and social sensors. Here we started with cameras and tweets, and then we got ambitious: why stop at tweeting cameras? You can have the same notion of tweeting sensors for any type of physical sensor. Instead of giving out raw physical readings, a sensor could interpret them, and you would have tweeting sensors. So we wanted to generalize it, and that's why we started this piece of work. And
last year, we had a huge problem of haze in Singapore. Basically, there are these logging companies which burn rainforest in Indonesia, which leads to huge fires, and all the smoke comes over to Singapore. It's a big problem around September and October every year. This year we were lucky: there were a lot of rains in Indonesia and the fires were kept down. So there are these pollution monitoring stations in Singapore, and when there is haze, people tweet about the haze. We again wanted to combine the pollution information from these PSI reading stations and the tweets about the haze in order to get a better understanding. The idea is the same: the physical sensors generate sparse, sometimes inaccurate readings, and the social information can then explain the readings. And by combining them, we hope to understand the situation in a better way.
So you really think that the social information gives you a less inaccurate reading than the physical sensor for things like this?
Sorry? Less inaccurate? Yes, in many...
More accurate.
Well, not more accurate, but it gives a better semantic sense of what is going on, and whether people are affected by it, because people tweet when it bothers them. For example, with the haze, we found that in certain areas people tweeted more and in certain areas people tweeted less. We also found that the pollution monitoring stations are not spread uniformly all over Singapore. So what we wanted to do is: can we infer the pollution reading in places where there are no stations, but people are tweeting about what they're feeling? Can we fill in the missing data by doing this inference? So that was the motivation.
That is, it's not so bad that they can't see it at all.
That's right. It was not. So what we try to do is take a particular time window. This is a representation: say it's a certain time window, and these are the different locations of the sensors. A location could be where a camera is installed, or it could be a pollution measuring station. And at each of the locations, we have geotagged tweets about what's going on out there. So now we want to combine this information, and we want to use a matrix factorization approach. That means we take this matrix, where one dimension is the time windows and the other is the different locations, and you can imagine the entries are the readings of the sensors; or if it's cameras, the confidence values of the concepts detected in the images.
And then we can factorize it. We were actually inspired by recommendation systems, where you have the users and the items, and you factorize into user latent factors and item latent factors in order to make better recommendations. So here we can have temporal latent factors and spatial latent factors, and I'll explain more as we go along. So let's imagine this is the same diagram, and these are the times. Let's assume we are talking about the pollution sensor: the values are the PSI index of, say, PM2.5 particles. In certain places the readings could be missing, or they could be noisy, and these are the different locations. So we have a set of sensor signals at N locations and M different timestamps. And over the same duration, we have the social signals, the location documents, which you can imagine as tweets. Each location document consists of posts p1, p2, ..., pK, and each post consists of words, where the words belong to a dictionary. So what we can do is a topic analysis on these social documents, to try to understand what topics people are talking about. Then we get the sensor readings from the sensor signal, and we want to combine both of these pieces of information in a principled manner. So this is our basic model. These are the times and
these are the locations, and we can do a factorization. This is the standard way of factorizing: once you factorize, multiplying the two matrices gives you the estimated value, which is your s-hat. What you have to figure out is the dimensionality of the factorization, that is, the dimension and number of the latent vectors. The standard way of doing this is that you have an average term and you have biases for the times and the locations, and the quantity you minimize is the actual value minus the estimated value, together with a regularization term. This is your standard matrix factorization technique. The idea is that when you do this, you have factored the matrix into latent factors, the temporal latent factors and the spatial latent factors, which are the best representation of the situation out there. Mind you, this is just the physical signals; we have not brought in the social signals yet. We're going to add that in now. So in the objective function, we wanted to minimize the difference between the estimate of the sensor signal value and the actual signal value, subject to some regularization. Then what we do
is add something from the social side into this minimization function. And this is how we bring in the social signal. As I said, we have these documents, the tweets collected at that particular time around that particular sensor; those are the social documents. What we do is fit an LDA model. The assumption here is the following. At a particular location, what is the situation we are trying to monitor? The pollution situation. The pollution situation is being measured by the physical sensor, the PSI value, and the people are also tweeting about it. The assumption is that at least some of the tweets are about the pollution which we are trying to understand, and one can help the other in getting a better understanding. So we can imagine an LDA model where you try to learn the topics: the topics are basically distributions over words, and the words belong to the documents. So we bring in another factor, the social factor, which is the LDA model of the tweets around those particular sensor locations. And then we can have gradient-based learning and jointly minimize the error for the factorization. That means you want to get the factorization in such a way that it is the best explanation for the physical readings at those places and also the best explanation for the social documents, for the topics out there. So this is a nice way
of connecting both the physical and the social. So effectively what we have done is this: we have taken the sensor readings for a time period at different locations, we have taken the topics of the tweets at those times around those sensors and modeled them as a distribution, and we have used that in the factorization, where you want to minimize the error on the physical readings and also maximize the likelihood of the social observations under those topics. By doing this we have fused both types of signals. And what is nice about it is that once you have done this, if the sensor values were missing at a particular location, that means in one part of Singapore you did not have a measuring station for pollution, but if people tweeted around there, we can try to estimate what the PSI value was at that point. So it's like a cold-start problem, where a new item comes in which nobody has seen, so you don't know the rating, and you try to estimate the rating based on the global average and the biases of the users. There's an exact analogy there.
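To make the factorization side concrete, here is a minimal NumPy sketch of biased matrix factorization over a time-by-location sensor matrix, trained by SGD on the observed entries only. The joint social (LDA) term from the talk is omitted for brevity, and all values, hyperparameters, and names are illustrative assumptions; NaN entries are treated as missing and predicted afterwards, which is the cold-start analogy:

```python
import numpy as np

def factorize(S, k=2, steps=2000, lr=0.01, reg=0.05, seed=0):
    """Biased matrix factorization: S[t, l] ~ mu + b_t[t] + b_l[l] + T[t] @ L[l].
    NaN entries in S are treated as missing and skipped during training."""
    rng = np.random.default_rng(seed)
    n_t, n_l = S.shape
    observed = [(t, l) for t in range(n_t) for l in range(n_l)
                if not np.isnan(S[t, l])]
    mu = np.nanmean(S)                       # global average term
    b_t, b_l = np.zeros(n_t), np.zeros(n_l)  # time and location biases
    T = rng.normal(0, 0.1, (n_t, k))         # temporal latent factors
    L = rng.normal(0, 0.1, (n_l, k))         # spatial latent factors
    for _ in range(steps):                   # SGD over observed entries only
        for t, l in observed:
            err = S[t, l] - (mu + b_t[t] + b_l[l] + T[t] @ L[l])
            b_t[t] += lr * (err - reg * b_t[t])
            b_l[l] += lr * (err - reg * b_l[l])
            T[t], L[l] = (T[t] + lr * (err * L[l] - reg * T[t]),
                          L[l] + lr * (err * T[t] - reg * L[l]))
    return lambda t, l: mu + b_t[t] + b_l[l] + T[t] @ L[l]

# Toy PSI-like readings (rows: time windows, cols: stations); one value missing:
S = np.array([[80.0, 90.0, 85.0],
              [120.0, 130.0, np.nan],
              [60.0, 70.0, 65.0]])
predict = factorize(S)
print(predict(1, 2))  # estimate for the station with the missing reading
```

The returned predictor can be queried at the missing (time, location) cell exactly the way a recommender estimates a rating for an unseen user-item pair.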
Yes, the sort of satisfying part about this to me is that you're taking calibrated measurement signals, where you can compare the results and calibrate the kind of information you get from the sensors. They may not be absolutely correct, but at least you have an error bound on them. And then you go to the social signals, where, you know, your Indonesian logging guys could send 10,000 people to Singapore, and they'd all be tweeting, 'jeez, it's not so bad today, and I can even see the sun.' Now you have totally uncalibrated signals which you are mixing with calibrated signals. Is that really providing you with a valuable result?
I think it's a very fair criticism. The social signals are subject to manipulation, subject to fraud, subject to a lot of noise. And there could also be a situation where people get so inured to pollution that they stop tweeting about it completely, right?
A visitor who doesn't have the historical record, the comparison with yesterday and last week, is going to have a whole different view. He is likely to be more extreme than somebody who is there all the time, because they have become accustomed to it. So do you take any of those factors into this model?
Currently no, but these are very valid criticisms, and maybe there are ways of handling that. Yes.
Actually I want to comment on that. Fundamentally what he has is multiple sensors, multiple sources of various reliability and various levels of noise. He has a system model, and based on that he can predict what's happening; he can look at the measurements, compare them with the predicted measurements, and make corrections. He has a classical Kalman filter problem. Yes.
Except that these are highly partitioned, right? You have one class, the sensors, where you can reason about the reliability and margin of error, and another class that is totally uncalibrated.
Right. Well, that is the social side, and that's actually what I was going to ask. I do see that as a Kalman filter problem. Yes. But what a Kalman filter requires is that you also have a good noise model.
Which is a Gaussian model.
So what I'm kind of interested in is this: for the social signal, which is something totally new that has only existed for a couple of years, we don't fully understand how to model the noise.
I think that's a very good observation and I completely agree; I don't think we have a good understanding of what the noise model in social media is like. We have no idea, and also, as I said, it's highly subject to adversarial manipulation, and there could be interested parties.
But in our limited experiments we found that having this information from social media is
interesting and useful. Now whether it will always be useful I'm not sure.
Well, I guess more to the point, it's not clear what you're measuring. If you're trying to measure the presence of an adversary, the fact that you've got an adversarial signal coming in is really great: you've got an adversarial sensor. On the other hand, if the people tweeting are a bunch of Catholic nuns, you can trust what they're doing. Either way you're mostly measuring something; it's just something different. So it's clear that you're measuring something. The question is what it is, what the noise model is, and so on.
I agree. I agree. Another question: your sensor is real time and probably very regularly sampled. Yes. The social signal is sampled at a much lower rate than the physical sensor. So what is the time window over which you have to assume that the state of your system is constant before you can move to the next window and say, okay, let me evaluate it? Yeah, it's a great question. We got around it by doing our experiments in dense urban areas where there are a lot of human beings tweeting. But you're right, it does not extend to a more sparsely populated area, where by the time you gather enough tweets, the state of the phenomenon has changed. Meanwhile, again back to outliers: if you look at Australia a couple of weeks ago, there was a news flash about a special atmospheric phenomenon, where storms pulled pollen up into the air, and a lot of people who don't normally have any problems had respiratory emergencies because of this particular phenomenon. Now, what happened there is not captured by the standard tweets; it's an outlier. Yes. And outliers occur, so the idea is to use the same approach not for situation awareness but rather for outlier detection: to determine if there's something new coming in. Yes. You don't know what it is exactly, you can't characterize it, but at least you're aware. Then you try to characterize it with all of this stuff that you already have calibrated well. That seems to be kind of a direction. Yeah, I agree with you. And in a way the social media tells you the impact of those sensor readings, the influence, and sometimes also the explanation of why this is
happening. Yeah. So let me move on. We did two experiments, one for the Singapore haze and the other for the New York City events. For the Singapore haze, we took the PSI readings; this was the really bad September of 2015. There were seven stations in Singapore, and we withheld the readings of four stations, using only the readings of three stations and the tweets around those regions. We then looked at the physical and social sensor correlation and tried to understand the latent factors. First we took all the PSI readings, created the matrix, and found the latent vectors from the factorization. Then we did it for only the three stations, without tweets and with tweets, did the factorization, and compared the latent vectors. The next diagram shows that the differences between the latent vectors for any two stations are much smaller when we use the tweets. So in a way, the use of Twitter data actually helped get a better factorization than not using it. That was the first conclusion. For the second one, we
basically removed some of the stations: we did not use the data from those stations, but we had the full factorization available, and we tried to estimate the readings at the other places. Again, what we found is shown by the root mean square error of the reconstructed matrix when you do the comparison: full data versus partial data, and factorization without tweets versus with tweets. And again you can see the root mean square error is lower with the tweets, so you can get a better approximation of the missing PSI readings when you use the social data. These are the PSI values; PSI is the Pollutant Standards Index.
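The held-out evaluation just described can be sketched as follows. The RMSE helper is standard; the numbers are purely toy values chosen to illustrate the protocol, not the paper's actual results:

```python
import numpy as np

def rmse(estimates, truths):
    """Root mean square error over held-out sensor readings."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(truths, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy illustration of the protocol: one station's true readings are hidden,
# then imputed once without tweets and once with tweets (made-up numbers).
true_held_out = [85.0, 125.0, 65.0]
imputed_without_tweets = [90.0, 115.0, 72.0]
imputed_with_tweets = [86.0, 122.0, 67.0]
print(rmse(imputed_without_tweets, true_held_out))
print(rmse(imputed_with_tweets, true_held_out))
```

A lower RMSE for the with-tweets imputation is what the experiment reported, so the comparison reduces to two calls of this helper on the held-out station.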
And then this was for New York City. We looked at three events: the million people march, the St. Patrick's Day Parade, and the Columbus Day Parade. We chose these because we knew these events were going on and there were a sufficient number of tweets, which gets around the problem you alluded to. We looked at the times when there was a parade, which are the positive samples, and the negatives, when there was no parade. Then we compared using just the camera information versus the camera plus social information, and again we found that using the social information actually helps; in these limited experiments, it helps. In this particular case the values are again probabilities: it was a probability matrix of the concepts. And no, this was not real time; we had already captured the data and done the analysis. It could conceivably be done in real time, but we had not done it. Yeah, so we've been collecting the data; we have been collecting it for the last two years.
Also, we then did an event classification, and again our precision, recall, and F-measure are better using the physical and social together rather than just the physical. So the conclusion is that in many cases correlations do exist between physical and social signals, reflecting the common event being observed by both. In many cases the social signals reveal the why behind the physical sensor readings, and maybe this is even useful in the outlier cases. And what we found is that matrix factorization is a good technique for fusing these sources of information. It helps solve the cold-start problem, and in fact, though I didn't present it, it also helps in noise filtering. We also want to explore other text features beyond LDA topic modeling, and to look at other factorization methods. What would also be interesting is the evolution of the situation, the temporal event evolution. And we want to see whether we can use a similar technique for sentiment analysis of a particular region. So that's what we want to do, and that's the end of my talk. I just want to end with this: we are hoping that if this works well, then we can have tweeting sensors and tweeting humans exchanging information with each other, hopefully for a better world.
