Social Media Syndromic Surveillance

The Public Health Agency of Sweden has an initiative called Hälsorapport, which is part of the European system Influenzanet, whose overall goal is “monitor the activity of influenza-like-illness (ILI) with the aid of volunteers via the internet.” The goal of Hälsorapport is similarly to monitor the spreading of diseases in Sweden and to inform the general public, the health care system, and other government agencies about the current health status of Sweden. The monitoring is done by eliciting weekly reports from volunteers regarding their general health status, and in particular regarding any symptoms they might have. According to the website, there are currently 3558 volunteers providing weekly reports about their health. The following plot demonstrates a result from the project. The orange line shows the occurrence of influenza-like diseases (no clear trend thus far for 2014), and the blue line shows the occurrence of respiratory infections: apparently there was a significant outbreak of respiratory infections in Sweden during the first weeks of September.

Results from Hälsorapport

This is a great initiative; the recent outbreak of Ebola is a chilling reminder of the importance of syndromic surveillance, and the best way to understand how people feel is to ask them how they feel. Asking some 3000 people is a very good start!

Now imagine you could ask everyone. And imagine you could ask them not only once a week, but all the time. That is sort of what we do when it comes to social media monitoring, but with the difference that we listen to what people say rather than eliciting answers from them. Listening has the advantage of avoiding elicitation effects, that is, the tendency of people to overstate (or understate) symptoms when prompted for a report about their health. Furthermore, we listen to the entire Swedish social media feed (which includes the entire Swedish blogosphere, the main forums, the entire Swedish Twitter feed, and all open posts on Facebook), and we listen all the time; whenever someone posts something on social media in Sweden, we (or rather, our systems) read it. Rather than asking a few thousand people how they feel or what they think, we listen to everyone who posts on social media.

We have previously blogged about how we can use monitoring of discussions in social media to measure flu trends. This is not something we do only as an isolated case study. On the contrary, we continuously monitor expressions of a large number of symptoms in Swedish social media. The plot below shows the frequency of mentions of respiratory symptoms in Swedish social media from 2012 to 2014. Note the pronounced spike for 2014 (the red circle): this is the very same outbreak of respiratory infection during the first weeks of September that was detected by Hälsorapport. Note also that similar spikes occur at the same time period each year.

Mentions of respiratory symptoms in Swedish social media from 2012 to 2014

Since we have no medical expertise, we will abstain from speculating about the causes of these outbreaks of respiratory infection. However, we will make a prediction based on this yearly recurring pattern: we predict that there will be an increase in respiratory infections in Sweden in September 2015 (and in 2016, and in 2017).

The fact that our measures correlate with the report from Hälsorapport demonstrates the viability of using social media for syndromic surveillance. Note the difference between our approach and Google Flu Trends: we monitor the use of terms relating to various symptoms in social media, whereas Google monitors when people use various search terms (on Google). We believe the former approach may lead to earlier outbreak detection, since people typically express themselves very directly and spontaneously in social media, and they post about whatever symptom they might have at the moment, without necessarily realizing they have an infection:

(We of course have no idea whether Maria, who writes “my cough is killing me” actually has a respiratory infection or not.)

Another benefit of social media monitoring is the fact that we listen to what people say, all the time. We can monitor expressions of symptoms down to minute resolution, as in the following plot that shows expressions of respiratory symptoms in Swedish social media per minute in the first week of September 2014. Such fine-grained time resolution may be important in critical scenarios.

Mentions of respiratory symptoms per minute
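
As a rough illustration of what per-minute monitoring involves, the sketch below buckets posts by minute and counts those that mention a symptom term. It is a minimal sketch under stated assumptions, not a description of our production system: the term list, the `mentions_per_minute` function, and the example posts are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical symptom vocabulary; the real term lists are not given in this post.
SYMPTOM_TERMS = {"cough", "sore throat", "runny nose"}

def mentions_per_minute(posts):
    """Count posts that mention any symptom term, bucketed by minute.

    `posts` is an iterable of (timestamp, text) pairs.
    """
    counts = Counter()
    for timestamp, text in posts:
        lowered = text.lower()
        if any(term in lowered for term in SYMPTOM_TERMS):
            # Truncate the timestamp to minute resolution.
            counts[timestamp.replace(second=0, microsecond=0)] += 1
    return counts

# Made-up example posts, for illustration only.
posts = [
    (datetime(2014, 9, 1, 8, 15, 12), "My cough is killing me"),
    (datetime(2014, 9, 1, 8, 15, 48), "Sore throat again..."),
    (datetime(2014, 9, 1, 8, 16, 5), "Lovely weather today"),
    (datetime(2014, 9, 1, 8, 17, 30), "Runny nose and a cough"),
]
per_minute = mentions_per_minute(posts)
```

Plain substring matching is the simplest possible choice here; it counts a post once per minute bucket regardless of how many symptom terms it contains.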

We believe listening to what everyone says is important. In our case, we use our technology to read what everyone writes. In many cases, this is a viable proxy for the voice of the population, not least when it comes to syndromic surveillance.

Gavagai on the 33-list

We are pleased to announce that Gavagai made it to the prestigious 33-list, Sweden’s top list of innovative high-tech companies. The list is compiled yearly by Ny Teknik and Affärsvärlden, Sweden’s leading technical and business magazines respectively. The honor was awarded by the editor in chief of Affärsvärlden, Jon Åsberg.

Gavagai co-founder Jussi Karlgren (center) receiving the 33-list award. Photo by Annika Rudemo.

“There is no shortcut to understanding the wealth of information found in human language. This requires specialized technology, which is what we build at Gavagai. Our goal is to build tools that allow every creative developer to tap into this knowledge,” says Dr. Jussi Karlgren, co-founder of Gavagai.

Miserable Monday and the Effect of Vacation in Swedish Social Media

Recently, we found out that Miserable Monday might be nothing but a myth. As avid fans of the idea of banishing Mondays completely, we will need more than a couple of news articles to be convinced. Luckily, Ethersource is more than ready to clear up any doubts.

For some time, we have been monitoring the Swedish domain of social media, and how people are feeling when talking about themselves. The curves have been steadily working their ups and downs. However, these past few months we have been noticing a very curious occurrence. First, let’s take a look at this graph.

What we are seeing is a curve representing the general happiness of people when speaking of themselves, for a period of time around March earlier this year, measured using an index we call the Positivity Propensity Index (PPI). It’s not a particularly exciting graph, other than affirming what has already been stated: people do seem to speak more fondly of themselves when weekends are upon them. But other than that, there doesn’t seem to be any particular weekday that stands out among the others. Our previous hard stance against the impartiality of research might have started to soften a bit.

Now, let’s continue on to the peculiarities.

This is a graph from the beginning of May until today. For this graph’s sudden change to make any sense, you might need some background on Swedish culture, and especially on a holiday called Midsommar – a day full of culinary deliciousness and drinking. This is the peak you see on June 23, and what happens thereafter seems to indicate that Swedes are no longer slaves of time. Suddenly, Tuesday no longer differs from Saturday, people are generally happier, and the regularities we could clearly see earlier in spring start to become more clouded. Vacation has arrived.

Swedish social media has yet to return to its normal, moody self. But surely, it seems inevitable.

Winter is indeed coming.

Measuring the popularity of the contestants in the Eurovision Song Contest using Twitter

In this post, we confirm that Loreen is well placed to win the popular vote in the Eurovision Song Contest final 2012.

We have previously shown in this blog that Ethersource monitoring of on-line sentiment can predict the popular vote in certain high-profile media events, such as the national Eurovision Song Contest. In this post, we report on some observations on using Ethersource to measure the popularity of the contestants in the international Eurovision Song Contest, based on analysis of expressions of popularity on Twitter. The following image shows the relative popularity scores of the participating countries.

Popularity of each country

It should be obvious to anyone following the pre-contest speculations about who will win the ESC 2012 that the proportions of popularity in this image do not correlate with current betting odds for the ESC final (the current odds can be found at any betting site). The image shows Ireland and the UK as the most popular contributions in the ESC final (they are ranked 11th and 5th in the current betting odds). One reason for this discrepancy may be that popularity and betting odds do not refer to the same type of measurement; popularity refers to population-wide opinion, while betting odds are estimates of who will win the actual contest (which is determined by both popular and jury votes). Another reason lies in the issues identified in commentaries on other recent attempts to predict election votes based on sentiment analysis of the Tweet stream:

  • Twitter users (and users of other social media) do not constitute a perfect sample of the population, which means that measurements based on Twitter may not be representative for the population as a whole.
  • Twitter is a perfect medium for marketers and campaigns, which makes the analysis sensitive to ad-bots and automated Twitter campaigns.

These concerns are of course valid also for the present scenario. However, even more important when comparing measurements based on Twitter analysis across different countries are the following issues:

  • There is a huge difference in population size between the European countries: Russia has a European population of more than 100 million, while Iceland has a population of a mere 300 000 inhabitants.
  • The Twitter penetration (i.e. proportion of the population that use Twitter) is very different for different countries. In the present scenario, where we measure expressions of popularity on Twitter, it means that some countries may get high popularity scores merely because a comparatively large proportion of the population in that country uses Twitter (people tend to promote their own country’s entry in the ESC).

It is somewhat difficult to find recent and reliable estimates of the Twitter penetration per country, but not so recent studies show that the Netherlands, Turkey, UK, and Ireland top the list for Twitter penetration in Europe. Perhaps this explains the results we see in the image above? Scaling the popularity scores for each country by the estimated number of Twitter users in that country produces the following image:

Popularity of each country, scaled by the estimated number of Twitter users
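
The scaling step can be sketched as follows. This is a minimal sketch, assuming that "scaling" means dividing each country's raw popularity score by its estimated number of Twitter users and then renormalizing so the scores sum to one; the function name `scale_by_twitter_users`, the country labels, and all numbers are invented for illustration and are not our actual data.

```python
def scale_by_twitter_users(raw_scores, twitter_users):
    """Divide each raw score by the country's estimated Twitter user count,
    then rescale so that the normalized scores sum to 1."""
    per_user = {c: raw_scores[c] / twitter_users[c] for c in raw_scores}
    total = sum(per_user.values())
    return {c: value / total for c, value in per_user.items()}

# Made-up numbers, for illustration only.
raw_scores = {"Country A": 900.0, "Country B": 300.0}
twitter_users = {"Country A": 9_000_000, "Country B": 1_000_000}

scaled = scale_by_twitter_users(raw_scores, twitter_users)
# Country A has the higher raw score, but Country B has the higher
# score per Twitter user, so it ranks first after scaling.
```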

When scaling with Twitter penetration, Sweden gets the highest relative popularity score. This is in line with current betting odds, which unanimously rank Sweden as the most likely winner. However, the other countries that receive high normalized popularity scores do not correlate with odds rankings: Greece has the second highest popularity score (ranked 14th place in the odds rankings), followed by Denmark (ranked 8th place), Ireland (11th), and Iceland (7th). These discrepancies may be due to the issues with non-representativeness and Twitter penetration discussed above. We may also add the following issues:

  • The activity level of the Twitter population in some countries may not correspond with the Twitter penetration; Twitter users may be more active in some countries than others.
  • The interest for the ESC may be higher in certain countries than others, thus leading to more Tweets about the contestants from that country.

We conclude this post with the observation that Loreen seems to be the likely winner of the popular vote in the ESC final 2012. We also conclude that attempting to model population-wide opinions based on Twitter analysis is a non-trivial task that requires more than merely counting word frequencies.

Monitoring on-line social media for as-it-happens customer churn related to mobile network operators in the US

Churn is a measure of customers leaving a subscription-based service over time. In this post, we use Ethersource to

  • demonstrate real-time monitoring of churn-propensity related to telecom services;
  • characterize customer churn by means of annoyance, uncertainty, change, and negativity;
  • identify and extract, in real time, the source documents provoking the churn for a service (in this particular case, a rumour surrounding Sprint Nextel’s service).

A challenging question in subscription-based industry segments is: As a service provider, how do I detect that a churn-provoking event is taking place, in a timely manner that permits me to act on that information and short-circuit the situation (as opposed to finding out at the end of the fiscal quarter)?

Consider as a backdrop to this post the age-old adage “A Lost Customer is Not a Potential Customer”, and its relative “A Bird in the Hand”, which reflects that gaining a new customer costs more than retaining an existing one. As a service provider operating in a competitive landscape, you are concerned with those of your customers who are on the verge of terminating their subscription with you in favor of one of your competitors. Hence the question that started off this post. What’s even more pressing is that, since churn and retention are (approximately) a zero-sum game, you need to be aware of churn information related to your competitors when deciding on, and executing, your contingency plans.

We use Ethersource to take a look at the five largest mobile network operators in the United States with respect to the relative manifestation in English social media of the churn components, introduced below, during August and September 2011. The operators are (rank and numbers from Wikipedia): Verizon Wireless (107.7M subscribers; 35% of the subscribers among the five operators compared), AT&T Mobility (100.7M; 32%), Sprint Nextel (51.1M; 16%), T-Mobile USA (33.73M; 11%), and TracFone Wireless (17.75M; 6%).
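
The percentages above follow directly from the subscriber counts. The short sketch below reproduces them and adds a small helper for the comparison made later in the post, where an operator's share of on-line chatter is set against its share of subscribers. The helper `buzz_exceeds_share` is a hypothetical name of our own, and no chatter figures are included here since the post only reports them as an image.

```python
# Subscriber counts in millions, as listed in the post.
subscribers_millions = {
    "Verizon Wireless": 107.7,
    "AT&T Mobility": 100.7,
    "Sprint Nextel": 51.1,
    "T-Mobile USA": 33.73,
    "TracFone Wireless": 17.75,
}

total = sum(subscribers_millions.values())
# Share of subscribers among the five operators, in whole percent.
subscriber_share = {
    operator: round(100 * count / total)
    for operator, count in subscribers_millions.items()
}

def buzz_exceeds_share(chatter_share):
    """Return operators whose share of chatter (in percent) is larger than
    their share of subscribers. `chatter_share` maps operator -> percent."""
    return [
        operator
        for operator, percent in chatter_share.items()
        if percent > subscriber_share[operator]
    ]
```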

Ethersource facilitates the detection of peaks in churn component signals, and allows its user to identify, and thus directly engage, the individuals airing the concerns that underlie such peaks. As an operator, you can use this opportunity to raise your visibility and make yourself available, allowing users to contact you at will; to approach individual users directly; or to launch and follow up campaigns targeted at a select group of users.

We characterize customer churn in terms of a number of core components, all related to how people express themselves in relation to a service provider with respect to annoyance, uncertainty, change, and negativity. Increasing or fluctuating signal levels for any of these components, or combinations thereof, may constitute a cause for concern.

First off, let’s see how the number of subscribers per operator relates to the on-line chatter for the given time period. Image 1 shows the relative amount of chatter for the operators in September. The only operator generating an on-line buzz larger than its proportion in terms of subscribers is Sprint Nextel.

Image 1: The relative amount of on-line chatter for the five mobile network operators in September 2011.

Given the disproportionate attention it receives, Sprint is what we’ll focus on. We use the time series, as they are produced by Ethersource with respect to the churn components and the companies outlined above, in an as-it-happens manner to identify a situation in September in which Sprint, but not its competitors, may see an increase in customer churn. Note that this approach allows for continuous monitoring of events as they take place; there is no need to wait until after the fact to carry out a proper analysis.

The images below show expressions of uncertainty (Image 2), annoyance (Image 3), change (Image 4), and negativity (Image 5) towards the five mobile network operators. We are monitoring all the time series depicted below simultaneously, looking for occasions when the values for Sprint Nextel are higher than those of its competitors. A high value for a combination of the churn components, including as many components as possible, warrants a closer inspection. Looking at the images below, there is one date in particular that is interesting: September 16, 2011. It is the only date on which all churn components exhibit higher values for Sprint than they do for any of its competitors. (Note that the graphs are timed to Stockholm time, and so the start of the event is really on September 15 in the time zones hosting Sprint.)
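
The selection rule described above, finding the dates on which one operator has a higher value than every competitor for every churn component, can be sketched as a set intersection. This is a minimal sketch under our own assumptions about the data layout (component, then operator, then date); the function name, the operator labels, and the values are invented for illustration and are not taken from Ethersource.

```python
def dates_where_target_leads(series, target):
    """Return the dates on which `target` has a strictly higher value than
    every competitor for every component.

    `series` maps component -> operator -> date -> value.
    """
    leading_dates = None
    for by_operator in series.values():
        dates = {
            day
            for day, value in by_operator[target].items()
            if all(
                value > other_values.get(day, float("-inf"))
                for operator, other_values in by_operator.items()
                if operator != target
            )
        }
        # Keep only the dates that qualify for all components seen so far.
        leading_dates = dates if leading_dates is None else leading_dates & dates
    return sorted(leading_dates or [])

# Made-up values, for illustration only.
series = {
    "uncertainty": {
        "target": {"day 1": 0.2, "day 2": 0.9},
        "rival": {"day 1": 0.3, "day 2": 0.4},
    },
    "annoyance": {
        "target": {"day 1": 0.5, "day 2": 0.8},
        "rival": {"day 1": 0.1, "day 2": 0.6},
    },
}
leading = dates_where_target_leads(series, "target")
```

In this toy example only "day 2" qualifies, since on "day 1" the rival has the higher uncertainty value.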

Image 2: Expressions of uncertainty. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 3: Expressions of annoyance. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 4: Expressions of change. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 5: Expressions of negativity. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

What happened to Sprint on September 16?

Rumors of iPhone 5 happened to Sprint. The rumors had it that Sprint would be the exclusive reseller of the new Apple handset.

By using Ethersource, we can inspect the expressions in the social media underlying the signals, and thus make sure we understand exactly what is going on. It turns out that people express themselves in relation to

  • uncertainty about whether the Sprint network can handle the traffic generated by the new iPhone: “Can Sprint handle iPhone traffic?”
  • annoyance about rumored alterations to contracts as a result of the introduction of the new handset: “Sprint Readies To Remove Even More Customer Incentives.”
  • change to the abovementioned contracts: “Sprint puts an end to the Premier loyalty program”
  • negativity about the effect the new handset might have on Sprint’s services: “What Sprint users should find alarming is his acknowledgement that the iPhone could potentially hurt the company in the near term because of the higher subsidies involved…”

The quotes above are taken verbatim from top-ranked sources in Ethersource. Image 6 below shows a partial screenshot of Ethersource with a number of sources ranked according to their uncertainty value, early on September 16.

Image 6: A partial screen shot of Ethersource, containing the top five sources (cloaked) contributing to the uncertainty score at the point in time labelled "Z" in the red oval.

So, the rumors of a possible advantage for Sprint, namely the new iPhone 5 handset from Apple, turn out not to be all that positively received.

We’re sure Sprint was all over this particular event; it is used here merely as a case study to show some of the capabilities of Ethersource, when working with a polarity set-up beyond that of the ordinary positive-negative dichotomy. In fact, it would’ve been very hard to use negativity only to identify the rumor of the iPhone 5 set out on September 16 as something extraordinary for Sprint.

Gavagai analyses the Greeks’ attitudes toward the cancelled referendum and the Eurozone

Greece has now officially scrapped plans for a referendum on the Euro bailout plan. Our research shows that a small majority (53%) of Greeks did not want the referendum, the exact subject matter and formulation of which remain unspecified and unclear. In our view, a majority (79%) of Greeks want to remain in the Eurozone. We note, however, that willingness to keep the Euro falls dramatically when the issue is raised in the context of austerity measures, which leads us to believe that if a referendum on whether to remain in the Eurozone subject to harsh austerity conditions were held today, almost every second Greek (46%) might opt for leaving the Eurozone. Similarly to patterns of violent demonstrations, the trend in pro-exit attitude is extremely volatile and directly linked to the topics dominating each day, and to austerity and fear in particular. Between November 4th and 6th, there was a significant increase in pro-exit attitude, as illustrated by the image below.

This analysis is based on large volumes of Greek-language open sources including social media.

Greek pro-exit attitudes toward the Eurozone as measured in Greek online media, November 4 to 6, 2011.

Analysis of buzz, hatred, and associations during the unraveling of Håkan Juholt’s accommodation reimbursements affair

In this post, we analyze the mention frequency, the strong negative sentiment (or hatred), and the terminology associated with the leader of the Swedish Social Democrats, Håkan Juholt, during the unraveling of his accommodation reimbursements affair. The analysis covers the Swedish blogosphere between October 1 and 19, 2011, and uses Ethersource technology and our proprietary associations engine.

The short version of the story is: Juholt went into the accommodation affair in early October with a fairly low buzz in the blogosphere, and a reasonable level of strong negative sentiment or hatred expressed towards him, considering he is a leading politician in an opposition party. Towards the middle of the month, the buzz settles once again, but the hatred has reached high levels! The terms associated with him suggest that the affair will not wear off easily.

Image 1 and Image 2, below, illustrate several things. The blue curve denotes the mention frequency, that is, the number of mentions of Juholt in the Swedish blogosphere. The red curve denotes the hatred expressed in relation to Juholt. The words pinned to each day are the new prominent terms associated with Juholt with respect to the terms for the previous day.
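
The idea of pinning only the new terms to each day can be sketched as a set difference between one day's associated terms and those of the day before. This is a minimal sketch of that bookkeeping only, not of our associations engine; the function name `new_terms_per_day`, the day labels, and the terms are invented for illustration.

```python
def new_terms_per_day(terms_by_day):
    """For each day after the first, return the terms associated with the
    target on that day that were not associated with it the day before.

    `terms_by_day` maps a sortable day label to a list of terms.
    """
    days = sorted(terms_by_day)
    return {
        today: sorted(set(terms_by_day[today]) - set(terms_by_day[yesterday]))
        for yesterday, today in zip(days, days[1:])
    }

# Made-up terms, for illustration only.
terms_by_day = {
    "d1": ["budget", "party"],
    "d2": ["budget", "allowance", "residence"],
    "d3": ["allowance", "press conference"],
}
new_terms = new_terms_per_day(terms_by_day)
```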

The time period covered by Image 1 ranges from October 6 to 10, 2011. On October 6, the terms associated with Juholt are mainly concerned with the shadow budget proposed by the Social Democrats the day before. The mention frequency is quite low, and the hatred expressed in relation to Juholt is not exceptional. October 7 is the day the Swedish newspaper Aftonbladet published an article claiming that Juholt had requested too much allowance for his residence. The mention frequency increases markedly, while the hatred is similar to the day before. Although still influenced by online discussion pertaining to the shadow budget, the terms associated with Juholt clearly show evidence of an affair in the making; the article by Aftonbladet has gained traction in the blogosphere. Saturday, October 8, shows a further increase in mention frequency, which also tends to vary with the time of day. The graph also shows that the hatred is on the rise; bloggers are picking up on the reimbursements affair. This is also evident in the associated terms, where the affair is compared to a financial mishap by the former Social Democratic leader Mona Sahlin in 1995, known as tobleroneaffären. The terms also reflect that the affair is about Juholt’s apartment, and that he will hold a press conference. Moving on to October 9, we see that the mention frequency, now clearly varying with the time of day, levels out, as does the hatred. The associated terms concern Juholt’s solidarity and the decline in trust in him, and also refer to the Swedish Prosecution Authority. Finally, Image 1 shows that, for October 10, the mention frequency is still high, and the hatred is rising markedly; the aversion vented in relation to Juholt is reaching high levels! The associated terms are related to crime (preliminary investigation, fraud, prosecution) and to politics (voters, party, resign, party leaders, citizens).

Image 1: The period covering the on-set of the affair, October 6 - 10, 2011. The terms for a given day in the image are the new terms associated with Håkan Juholt for that particular day.

Moving on to a later part of the affair, Image 2 illustrates the period of October 15 – 19. The general trend regarding mention frequency for the period is that it is declining. The divergence between frequency and hatred is interesting, and the fact that the hatred rises as the mention frequency declines suggests that while fewer people are talking about Juholt, those who do are still very upset. Let’s look at the associated terms day by day. On October 15, the blogosphere is mainly about the media hunt for Juholt. October 16 concerns the “obvious” rules for accommodation reimbursements (calling editor-in-chief Jan Helin). October 17, again, concerns the rules, the intent of Juholt, and Sundbyberg, where Juholt held a meeting with his fellow party members. October 18 was about cheating and the form Juholt filled out when requesting the allowance. Finally, Image 2 ends with October 19, mentioning netroots and politometern, both being portals for political blogs, as well as the Swedish Radio.

Image 2: The period covering a later part of the affair, October 15 - 19, 2011.

The complete list of salient terms associated with Håkan Juholt for the period October 1 – 19 is available at Legend to the list: a blue term means it is new on the list, a red term means that the association between it and Juholt is weaker than it was before. Analogously, a green term means that its association with Juholt is stronger than it was before.

New words in New Text

New Text is what we like to call the sort of spontaneous non-edited material we spend much of our time processing. We contrast this primarily with traditional text from editorial sources. There are interesting differences between new text and traditional text, and this has been the subject of much debate in philological, sociological, and to some extent even computational circles. Much of what has been said is interesting, much is pure piffle, and we have made our own pronouncements about what sort of changes we believe are ahead (this one pronouncement in Swedish). We expect we will have reason to return to this discussion.

Two of the most obviously noticeable things about new text as compared to traditional are its lexical creativity and its proof-reading sloppiness. From the point of view of a text analysis processor they amount to much the same thing: a constant influx of new tokens, never seen before. Newspeak, lolspeak, l33tspeak, new topics, puns, lesser used sociolects and idiolects, misspellings, mistypings, and teenage angst all contribute to a vast and vastly growing symbol table.

Image 1: Number of unique words, as a function of time, in Tweets and newsprint

We are perfectly happy to cope with that sort of growth! This is one of the underlying principles of our design – not to be fazed by new and unexpected usage. An example: in a conversation, if your counterpart mispronounces a central term (or speaks an unexpected dialect or variety of the language) or uses a synonym which you didn’t know before, it might throw you the first time. Next time around, you cope with it. It might annoy you a few times but eventually – after only a handful of observations – you will be habituated to the pronunciation or the synonym. You will not be retranslating the new observation back to your previous knowledge of the world – some perceptual and lexical process in your language analysis system simply notes that this concept appears to be subject to some variance. This happens without retraining or recompilation of your lexicon: you don’t stop conversation to figure it all out. (Or, well, you shouldn’t. If you do so often, you will eventually lose friends.) This is the way we believe things should be done!

Here are some numbers we got out of our text collections (see Image 1 above). Take two years of newsprint from reputable newspapers (in this case, the first year from a major US daily, and the second year from a major UK daily). That’s about 100 Mwords in total, or a bit over 100 kwords per day. The first few days, most words are new, but it settles pretty rapidly into a steady 100-200 new tokens per day. (The switch across the Atlantic probably contributes slightly more of those, but not too noticeably.) In itself, an argument for a learning system!

Now, as comparison, take two months of tweets on various topics in English. (Well, mostly in English. Mixed-language tweets are in our test material. We don’t want to take them out – that’s the way the world is.) That’s more than 1 Gwords, working out to about 20 Mwords a day. And about 200 000 new words. Per day. Try to keep up with that, manually! We firmly believe the architecture of the system to handle this needs to view this sort of data variance as normal, not something to meet by quick hacks or filtering. Ethersource is built with this in mind.
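
For readers who want to try this kind of count on their own collections, the sketch below tallies the number of previously unseen tokens per day. It is a minimal sketch and not how Ethersource handles its symbol table: it assumes lowercased whitespace tokenization, the function name `new_tokens_per_day` is our own, and the three "days" of text are made up.

```python
def new_tokens_per_day(daily_texts):
    """Yield (day_index, number_of_previously_unseen_tokens) for each day.

    `daily_texts` is a sequence with one string of text per day.
    Tokenization is naive: lowercase, split on whitespace.
    """
    seen = set()
    for day, text in enumerate(daily_texts):
        tokens = set(text.lower().split())
        unseen = tokens - seen
        seen |= unseen
        yield day, len(unseen)

# Made-up example: a misspelling ("teh") counts as a brand new token.
daily_texts = ["the cat sat", "the cat ran lol", "teh cat ran"]
growth = list(new_tokens_per_day(daily_texts))
```

Note that the set of seen tokens only ever grows, which is exactly why the symbol table for new text keeps expanding.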

And the future is near: what do you think will happen when language analytics moves on to processing speech data as well as text? Do you believe the number of tokens in the data stream will converge to a smaller number?

The killing of Mashaal Tammo through the eyes of Arabic social media

In this post, we show that Ethersource can be used to:

  1. monitor Arabic social media,
  2. detect violent on-line chatter, and
  3. identify the real-world events underlying the resulting signal.

On the evening of Friday, October 7, 2011, Kurdish opposition politician and founder of the Kurdish Future Movement Party Mashaal Tammo was shot dead by masked men in his home in northeastern Syria. His killing was soon attributed to the Syrian regime. The next day, Saturday, October 8, the funeral procession for Tammo, with 50,000 to 100,000 attendees, turned into the largest gathering of protesters since the start of the uprising seven months prior. Syrian security forces intercepted the crowd and shot at least 5 people dead, injuring numerous others.

We’ve used Ethersource to monitor Syria in Arabic social media for quite some time. Image 1, below, illustrates the on-line violent chatter pertaining to Syria between Thursday, October 6, and Sunday, October 9, 2011; the weekend when Tammo was killed. Image 1 is annotated with the time of the killing of Tammo, as he was reportedly attacked in the evening of the 7th (Syria being in a time zone one hour ahead of the time scale of the graph), and the approximate time of his funeral. What is striking is the surge in chatter after the demise of Tammo, caused by reactions from people active in social media. At this point in time, the steep rise and the high level of violent expressions indicate that a physical manifestation related to the killing of Tammo is likely. We call it a crowd-induced event: a process sparked by a real-world event, and then fueled by an on-line crowd in a way that increases the possibility of a physical reaction to the initial event. After the attack on the funeral procession, the levels of violent chatter increased even more.

Image 1: Violent chatter in Arabic social media with respect to Syria for the weekend when Kurdish opposition politician Tammo was gunned down.
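
To make the notion of a "steep rise" concrete, the sketch below flags points in a time series where the value exceeds a multiple of the average of the preceding points. This is only an illustrative rule of our own choosing, not the way Ethersource identifies such events; the function name `flag_surges`, the `window` and `factor` parameters, and the values are all hypothetical.

```python
def flag_surges(values, window=3, factor=2.0):
    """Return the indices where a value exceeds `factor` times the mean of
    the `window` values immediately before it."""
    flagged = []
    for i in range(window, len(values)):
        baseline = sum(values[i - window:i]) / window
        if baseline > 0 and values[i] > factor * baseline:
            flagged.append(i)
    return flagged

# Made-up signal levels, for illustration only.
chatter = [10, 12, 11, 13, 40, 55, 60]
surges = flag_surges(chatter)
```

With these made-up values, indices 4 and 5 are flagged; index 6 is not, because by then the trailing baseline has caught up with the new, higher level.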

Ethersource facilitates the verification of a signal by allowing the operator to inspect the individual documents contributing to it. Image 2, below, shows three screenshots representing some of the sources underlying the signal on Friday, October 7, and Saturday, October 8. The translations from Arabic to English were made using Google Translate.

Image 2: Screenshots of some of the sources contributing to the on-line violent expressions. The translations were made with Google Translate.

To sum up: By using Ethersource, we are able to aggregate the attitudes expressed in on-line media, as they are emitted, with respect to a given entity, in a given language, thus constructing a view of attitudes over time. The view facilitates the identification of time periods in which on-line activity warrants our attention. Ethersource, then, provides access to the documents contributing to the aggregated attitudes in the time period under scrutiny.

So, with Ethersource, we can follow any target with respect to any attitude in any language. On top of that, Ethersource continuously learns from the language it is exposed to.