Gavagai will be giving a keynote talk at tomorrow’s Internet Discovery Day, a part of Internetdagarna in Stockholm. Come listen!
Monthly Archives: November 2011
ECIR Industry Day
We are pleased to announce that our paper “Usefulness of Sentiment Analysis” has been accepted for presentation at the Industry Day of the 34th European Conference on Information Retrieval in Barcelona on April 5, 2012. It will bring up some of the issues previously discussed in this blog and elaborate on them.
Monitoring on-line social media for as-it-happens customer churn related to mobile network operators in the US
Churn is a measure of customers leaving a subscription-based service over time. In this post, we use Ethersource to
- demonstrate real-time monitoring of churn-propensity related to telecom services;
- characterize customer churn by means of annoyance, uncertainty, change, and negativity;
- identify and extract, in real time, the source documents provoking the churn for a service (in this particular case, a rumour surrounding Sprint Nextel’s service).
A challenging question in subscription-based industry segments is: As a service provider, how do I detect that a churn-provoking event is taking place early enough to act on that information and short-circuit the situation (as opposed to finding out at the end of the fiscal quarter)?
Consider as a backdrop to this post the age-old adage “A Lost Customer is Not a Potential Customer”, and its relative, “A Bird in the Hand”, which reflects the fact that gaining a new customer costs more than retaining an existing one. As a service provider operating in a competitive landscape, you are concerned with those of your customers who are on the verge of terminating their subscription with you in favor of one of your competitors. Hence the question that opens this post. What’s even more pressing is that, since churn and retention are (approximately) a zero-sum game, you need to be aware of churn information related to your competitors when deciding on and executing your contingency plans.
We use Ethersource to take a look at the five largest mobile network operators in the United States with respect to the relative manifestation in English social media of the churn components, introduced below, during August and September 2011. The operators are (rank and numbers from Wikipedia): Verizon Wireless (107.7M subscribers; 35% of the subscribers among the five operators compared), AT&T Mobility (100.7M; 32%), Sprint Nextel (51.1M; 16%), T-Mobile USA (33.73M; 11%), and TracFone Wireless (17.75M; 6%).
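As a quick check on the percentages quoted above, the shares can be recomputed from the subscriber counts. This is a minimal sketch; the function and variable names are ours and purely illustrative.

```python
# Subscriber counts in millions, as quoted in the post (taken from Wikipedia).
subscribers_millions = {
    "Verizon Wireless": 107.7,
    "AT&T Mobility": 100.7,
    "Sprint Nextel": 51.1,
    "T-Mobile USA": 33.73,
    "TracFone Wireless": 17.75,
}


def subscriber_shares(counts):
    """Return each operator's share of the combined total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares = subscriber_shares(subscribers_millions)
for name, pct in shares.items():
    # Rounded to whole percent, as in the post.
    print(f"{name}: {pct:.0f}%")
```

The same shares serve as the baseline against which an operator's proportion of on-line chatter can be compared.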
Ethersource facilitates the detection of peaks in churn component signals, and allows its user to identify, and thus directly engage, the individuals airing the concerns that underlie such peaks. As an operator, you can use this opportunity to raise your visibility and make yourself available, so that users can contact you at will; to approach individual users directly; or to launch and follow up campaigns targeted at a select group of users.
We characterize customer churn in terms of a number of core components, all related to how people express themselves in relation to a service provider with respect to annoyance, uncertainty, change, and negativity. Increasing or fluctuating signal levels for any of these components, or combinations thereof, may constitute a cause for concern.
First off, let’s see how the number of subscribers per operator relates to the on-line chatter for the given time period. Image 1 shows the relative amount of chatter for the operators in September. The only operator generating an on-line buzz larger than its proportion in terms of subscribers is Sprint Nextel.

Image 1: The relative amount of on-line chatter for the five mobile network operators in September 2011.
Given the disproportionate attention awarded to it, Sprint is what we’ll focus on. We use the time series, as they are produced by Ethersource with respect to the churn components and the companies outlined above, in an as-it-happens manner to identify a situation in September in which Sprint, but not its competitors, may see an increase in customer churn. Note that this approach allows for continuous monitoring of events as they take place; there is no need to wait until after the fact to carry out a proper analysis.
The images below show expressions of uncertainty (Image 2), annoyance (Image 3), change (Image 4), and negativity (Image 5) towards the five mobile network operators. We are monitoring all the time series depicted below simultaneously, looking for occasions when the values for Sprint Nextel are higher than those of its competitors. A high value for a combination of the churn components, including as many components as possible, warrants a closer inspection. Looking at the images below, there is one date in particular that is interesting: September 16, 2011. It is the only date on which all churn components exhibit higher values for Sprint than they do for any of its competitors. (Note that the graphs are timed to Stockholm time, and so the start of the event is really on September 15 in the timezones hosting Sprint.)

Image 2: Expressions of uncertainty. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 3: Expressions of annoyance. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 4: Expressions of change. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 5: Expressions of negativity. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.
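The rule used to read the graphs above (flag the dates on which one operator exceeds all of its competitors on every churn component) can be sketched in a few lines. This is only an illustration of the rule, not Ethersource code: the data layout, the function name, and all numbers below are made up for the example.

```python
from datetime import date


def dates_where_target_leads(series, target):
    """Return the sorted dates on which `target` is strictly higher than every
    other operator for every component.

    `series` maps component -> operator -> {date: value}; all operators are
    assumed to have values for the same dates.
    """
    leading = None
    for per_operator in series.values():
        target_values = per_operator[target]
        days = {
            day
            for day, value in target_values.items()
            if all(
                value > other[day]
                for name, other in per_operator.items()
                if name != target
            )
        }
        # Keep only the dates that qualify for every component seen so far.
        leading = days if leading is None else leading & days
    return sorted(leading or [])


# Made-up illustrative values for two components and three operators.
d15, d16, d17 = date(2011, 9, 15), date(2011, 9, 16), date(2011, 9, 17)
example = {
    "uncertainty": {
        "Sprint": {d15: 0.2, d16: 0.9, d17: 0.6},
        "Verizon": {d15: 0.3, d16: 0.4, d17: 0.5},
        "AT&T": {d15: 0.1, d16: 0.5, d17: 0.7},
    },
    "annoyance": {
        "Sprint": {d15: 0.5, d16: 0.8, d17: 0.3},
        "Verizon": {d15: 0.4, d16: 0.3, d17: 0.4},
        "AT&T": {d15: 0.2, d16: 0.6, d17: 0.2},
    },
}

flagged = dates_where_target_leads(example, "Sprint")
print(flagged)
```

A softer variant could count the number of components on which the target leads, and flag a date once that count passes a threshold, which matches the idea of "as many components as possible".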
What happened to Sprint on September 16?
Rumors of iPhone 5 happened to Sprint. The rumors had it that Sprint would be the exclusive reseller of the new Apple handset.
By using Ethersource, we can inspect the expressions in the social media underlying the signals, and thus make sure we understand exactly what is going on. It turns out that people express themselves in relation to
- uncertainty about whether the Sprint network can handle the traffic generated by the new iPhone: “Can Sprint handle iPhone traffic?”
- annoyance about rumored alterations to contracts as a result of the introduction of the new handset: “Sprint Readies To Remove Even More Customer Incentives.”
- change of the abovementioned contracts: “Sprint puts an end to the Premier loyalty program”
- negativity about the effect that the new handset might have on Sprint’s services: “What Sprint users should find alarming is his acknowledgement that the iPhone could potentially hurt the company in the near term because of the higher subsidies involved…”.
The quotes above are taken verbatim from top-ranked sources in Ethersource. Image 6 below shows a partial screenshot of Ethersource with a number of sources ranked according to their uncertainty value, early on September 16.

Image 6: A partial screen shot of Ethersource, containing the top five sources (cloaked) contributing to the uncertainty score at the point in time labelled "Z" in the red oval.
So, the rumors of a possible advantage for Sprint, that is, the new iPhone 5 handset from Apple, turn out not to be all that positively received.
We’re sure Sprint was all over this particular event; it is used here merely as a case study to show some of the capabilities of Ethersource, when working with a polarity set-up beyond that of the ordinary positive-negative dichotomy. In fact, it would’ve been very hard to use negativity only to identify the rumor of the iPhone 5 set out on September 16 as something extraordinary for Sprint.
The Perry Oops and the impact on Perry Worry
US presidential primaries are a great spectator sport for those of us who do not need to actually cast our vote. The preceding week was no exception – and of course a presidential candidate fumbling the ball completely on camera was a rare sight. But how immediate was the damage, really?
(For those of you who missed it, Gov. Rick Perry of Texas, one of the front runners, was about to name the three government agencies he would close on assuming office. This is apparently a standard talking point in most of his campaign speeches. On this occasion his memory blanked midway through the point list and he stopped in mid-sentence trying to recall the third agency to be closed – a quite embarrassing moment for a public speaker, but frequent enough for most of us to have experienced something similar.)
In Image 1 you will find a graph of an aggregate of various negative sentiments with respect to the four leading GOP presidential hopefuls for the past week. Nothing too surprising there. Herman Cain is battling allegations of sexual misconduct and impropriety, which generates a fair amount of negative text. Rick Perry is discussed in non-flattering terms after his debate freeze gaffe. Michele Bachmann is a distant fourth to the other three. And while Rick Perry appears to have heavy going ahead of him, the challenge for Herman Cain remains much higher.
Looking closer at Rick Perry, however, we find something quite interesting. In Image 2 you will find a graph of worry and concern with respect to Rick Perry. (Other non-flattering sentiments and attitudes follow much the same pattern.) The debate situation went down on Wednesday evening (20:00 / 8 pm EST). The worry graph only starts rising on Thursday, after lunch. This is, allowing for the time difference (our graph is timed to Stockholm time), early morning on the Eastern seaboard of the US. The negative sentiment is not an immediate reaction to his performance, but a reaction to what the media report and possibly to his attempts at regaining his footing. One wonders what the effect of a momentary blankout might have been without the amplification given by chattering commentators!
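The time-difference reasoning can be made concrete with a small conversion. This is a sketch under stated assumptions: the calendar dates (Wednesday, November 9 and Thursday, November 10, 2011) and the 13:00 reading for "after lunch" are our assumptions for illustration, and fixed UTC offsets are used because daylight saving time is not in effect in either zone in mid-November.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, valid for mid-November (no daylight saving time in effect).
CET = timezone(timedelta(hours=1), "CET")   # Stockholm
EST = timezone(timedelta(hours=-5), "EST")  # US East Coast

# Debate moment: Wednesday 20:00 EST (the calendar date is an assumption).
debate_start = datetime(2011, 11, 9, 20, 0, tzinfo=EST)

# Worry starts rising "Thursday, after lunch" on a graph in Stockholm time;
# 13:00 is an assumed reading of "after lunch".
worry_rise = datetime(2011, 11, 10, 13, 0, tzinfo=CET)

debate_in_stockholm = debate_start.astimezone(CET)
rise_on_east_coast = worry_rise.astimezone(EST)
lag = worry_rise - debate_start

print("Debate, Stockholm time:", debate_in_stockholm.strftime("%a %H:%M %Z"))
print("Worry rise, East Coast time:", rise_on_east_coast.strftime("%a %H:%M %Z"))
print("Lag between debate and rise:", lag)
```

Under these assumptions the rise falls at 07:00 on the East Coast, roughly eleven hours after the debate, which is consistent with a reaction to the morning's media coverage rather than to the live event.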
We will be returning with more reports on the presidential campaign during the next few months!
Ethersource as a tool for data-driven journalism: the case of the fading interest in Julian Assange
The world has taken a keen interest in Wikileaks founder Julian Assange for quite some time now. A recent article in a leading Swedish newspaper, Svenska Dagbladet, claims that the general interest in Assange and Wikileaks has faded, an assertion based on frequency counts of search queries for Assange obtained from Google Trends.
By consulting Ethersource, we can confirm the trend at large. Now, there’s only so much to be said based on frequency counts of search queries. The fact is that people are still showing a clear interest in Assange. However valid the claim that the general public’s interest in Assange is fading might be, the odds for it are low, and as such it makes the story less interesting than it deserves to be.
Data-driven journalism can arguably dig deeper!
The image below illustrates a different take on Assange. During October, we see two significant occasions on which people active in social media have expressed sentiments that cannot be distinguished by looking at frequency alone: benevolence and worry (the blue and red curves, respectively). An interesting story may lurk in either of these.
The benevolence expressed during October 13 to 19 is related to Assange’s engagement in the Occupy Wall Street movement. What might be more interesting is the fact that the benevolence falls while, at the same time, the worry rises sharply from November 1st onward. The event underlying the changing characteristics of the graph at that time is the decision to extradite Assange from the UK to Sweden, where he is to be questioned over rape allegations. Surely, there must be a story to be told here, based on the views of the inhabitants of on-line social media.
Tracking Swedish political sympathies in social media
This Fall has been more eventful than many in Swedish politics, in spite of no real political events taking place. The recently elected leader of the Social Democratic Workers’ Party has become entangled in a mess of possibly excessive and in any case overcompensated housing allowance remuneration.
Our take:
- Centerpartiet is fading out of view in spite of a newly elected and positively received leader.
- The Social Democrats have become the focus for negative sentiment in the blogosphere, even to the point of distracting from the established negative sentiment vis-à-vis Sverigedemokraterna.
- A leadership change in the Social Democratic party is imminent.
Our charts of political sentiment over the period of September and October show how the Swedish political scene is expressed in text. The story was published by Aftonbladet, a left-leaning Stockholm tabloid, on October 7, and the volume of text written about the Social Democrats explodes on that day. The timeline in Image 1 illustrates this, and the pie chart in Image 2 shows how much the attention given to the parties differs from their polled approval ratings, most notably with the Social Democrats and the xenophobic Sverigedemokraterna taking the stage to an extent out of proportion to their following. The bar chart in Image 3 shows the frequency of mention in October (red bars) and how it has changed (blue bars) compared to the previous month. (Also notable is how Centerpartiet is fading out of view in spite of electing a new leader on September 23, and that the yearly congress of the liberal party Folkpartiet on October 14-16 hardly made any impression on the frequency of mention.)

Image 1: Frequency of occurrences of Swedish political parties in Swedish social media during September and October 2011.

Image 2: The proportion of attention awarded the different political parties in Swedish social media in October 2011.

Image 3: The attention awarded to the political parties in October 2011 (red bars), along with the change from September (blue bars).
The bar chart in Image 4 shows how much violent and aversive sentiment is expressed vis-à-vis the various parties as a proportion of all written text in the month of October (red bars) and the change from September (blue bars). This shows us how the constant pressure from the liberally oriented blogosphere in Sweden on the xenophobic Sverigedemokraterna has, to an unprecedented extent, been replaced with aversion towards the Social Democrats. The timeline in Image 5 shows how the Social Democrats differ from Sverigedemokraterna: the latter party is a violent fringe party and is frequently mentioned in bursts of violent rhetoric, often on their own initiative, generating peaks in the graph. The Social Democrats, by contrast, have now risen to a new, higher, and steady level of violent and aversive mention.

Image 4: The violent attitudes expressed toward the political parties in October 2011 (red bars), along with the difference from September (blue bars).

Image 5: Violent and aversive attitude expressed in Swedish social media toward the political parties over time.

Image 6: The proportion of violent and aversive expressions in Swedish social media related to the political parties in October 2011.
Finally, the timeline in Image 7 shows how uncertainty as expressed in the Swedish social media has peaked with respect to the Social Democrats.

Image 7: Uncertainty related to the political parties expressed in Swedish social media during September and October 2011.
In conclusion, this analysis shows that the Social Democrats are in a different place than they were only a few weeks ago. The level of strongly negative rhetoric tinged by violence is at a steady level matched only by the habitually violent Sverigedemokraterna, and the uncertain sentiment expressed in relation to the Social Democrats adds to a confusing image of the party. This pattern of mention and the now established scarring of the image of the party will be impossible to break without a considerable change in discourse. We cannot see how the current leader of the Social Democrats will be able to achieve this – our prediction is that he will not stay in office for long, and that after a grace period, the length of which is determined more by decency than by political expediency, he and the current leadership will be very rapidly replaced.
Gavagai analyses the Greeks’ attitudes toward the cancelled referendum and the Eurozone
Greece has now officially scrapped plans for a referendum on the Euro bailout plan. Our research shows that a small majority (53%) of Greeks did not want the referendum, the exact subject matter and formulation of which remain unspecified and unclear. In our view, a majority (79%) of Greeks wants to remain in the Eurozone. We note, however, that willingness to keep the Euro falls dramatically when the issue is raised in the context of austerity measures, which leads us to believe that if a referendum – on whether to remain in the Eurozone subject to harsh austerity conditions – were held today, almost every second Greek (46%) might opt for leaving the Eurozone. As with patterns of violent demonstrations, the trend in pro-exit attitude is extremely volatile and directly linked to the current (daily) dominating topics, and to austerity and fear in particular. Between November 4th and 6th, there was a significant increase in pro-exit attitude, as illustrated by the image below.
This analysis is based on large volumes of Greek-language open sources including social media.
Greece: Politics, Economics, and Rowdy Behaviour separated
Many of us are trying to keep track of what is going on in Greece these days. We will have reason to give you more news flashes of how events in Greece are being narrated and reported – here is a first peek at results from our ongoing monitor. This monitor is based on English-language reports, chatter, tweets, blog and forum posts on the Greek crisis.
Notably, there is a clear spike in our VPI signal in mid-October (blue ring). This signal – the violence propensity index – indicates violent sentiment. The news reports we have collected from those days do not belie this. Negativity in general gives no spike here – these texts report on violence but do so dispassionately. By contrast, over the last two days, in the wake of rising concern over the impending referendum and political uncertainty in Greece, we find a rising trend (red ring) of negative signal, in both editorial reporting and comments – but no violence.
This, of course, is vastly preferable! (And from our point of view it is pleasing to be able to separate political and economic browbeating, however heated, from tear gas and brick throwing.)
We don’t do training, we do learning
We have already expressed in this blog how very pleased we are with the design of Ethersource, the technology we have developed. For Ethersource, the memory model and the processing model are the same thing: the memory model we have built Ethersource with has a built-in processing model. New data is projected into our memory model without confounding previous knowledge and without resizing the memory model. (We will return to technical details here in the near future.) Ethersource delivers salient term-term relations in real time, on-line, without recomputation or postprocessing of the aggregated data.
We are frequently asked how much data we need to train the model and how long it takes to train it.
We never quite know how to respond. Our design renders that question moot.
As new texts arrive, so do new words, terms, and turns of phrase. The inventiveness and creativity of human authorship is inexhaustible (as is our occasional unwillingness to conform to norm). Any system built to handle realistic amounts of new text must be built – by design! – to handle anomalies, divergence from norms, and changes in norm. This poses a challenge to systems: knowledge-based systems will need to maintain and update their knowledge resources, and statistics-based systems need to update and realign their parameters and rebuild their training data sets. This is because they are designed with a disconnect between their knowledge and their processing: statistical regularities or symbolic rules are extracted from training data and then used on future incoming data. The process of acquiring new knowledge can be variously demanding in terms of data, human editorial effort, and processing requirements.
By contrast, the Ethersource semantic model is available from the start. We don’t do training. Ethersource is built to learn, not to be trained. From the very first token processed, it is in service. Learning is done on the fly. Of course, there is a learning curve: at first, Ethersource knows little, after a while it knows more, eventually it is quite erudite. But at all times, the semantic relationships between the units most recently encoded by Ethersource are only a query away.
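To make the idea of learning without a separate training phase more tangible, here is a minimal sketch. This post does not spell out the technique behind Ethersource, so the code below is not its implementation. It illustrates one published family of methods, random indexing, which shares the properties described above: a memory of fixed dimensionality, updates folded in token by token, and term-term relations that can be queried at any time. The class name, method names, and parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


class RandomIndexingMemory:
    """Toy incremental word-space memory (a random indexing sketch)."""

    def __init__(self, dim=512, nonzeros=8, window=2, seed=0):
        self.dim = dim            # fixed size of every vector; never grows
        self.nonzeros = nonzeros  # number of +1/-1 entries per index vector
        self.window = window      # neighbours considered on each side
        self._rng = random.Random(seed)
        self._index = {}          # term -> sparse index vector {position: sign}
        self._context = defaultdict(lambda: [0] * dim)  # term -> context vector

    def _index_vector(self, term):
        # A new term simply receives a new sparse random vector.
        if term not in self._index:
            positions = self._rng.sample(range(self.dim), self.nonzeros)
            self._index[term] = {p: self._rng.choice((-1, 1)) for p in positions}
        return self._index[term]

    def observe(self, tokens):
        """Fold one tokenized text into the memory; no retraining is needed."""
        for i, term in enumerate(tokens):
            lo = max(0, i - self.window)
            hi = min(len(tokens), i + self.window + 1)
            vec = self._context[term]
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in self._index_vector(tokens[j]).items():
                        vec[pos] += sign

    def similarity(self, a, b):
        """Cosine similarity between the context vectors of two terms."""
        if a not in self._context or b not in self._context:
            return 0.0
        va, vb = self._context[a], self._context[b]
        dot = sum(x * y for x, y in zip(va, vb))
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
        return dot / norm if norm else 0.0


memory = RandomIndexingMemory()
memory.observe("sprint drops the loyalty program".split())
memory.observe("verizon drops the loyalty program".split())
memory.observe("the weather in stockholm is cold".split())

print(memory.similarity("sprint", "verizon"))
print(memory.similarity("sprint", "weather"))
```

In this toy example "sprint" and "verizon" occur in identical contexts and so come out as more similar to each other than "sprint" is to "weather". The memory is usable after the first call to `observe`, and adding further text changes neither the vector size nor the need for any recomputation before the next query.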





