Monitoring on-line social media for as-it-happens customer churn related to mobile network operators in the US

Churn is a measure of customers leaving a subscription-based service over time. In this post, we use Ethersource to

  • demonstrate real-time monitoring of churn propensity related to telecom services;
  • characterize customer churn by means of annoyance, uncertainty, change, and negativity;
  • identify and extract, in real time, the source documents provoking the churn for a service (in this particular case, a rumor surrounding Sprint Nextel’s service).

A challenging question in subscription-based industries is this: as a service provider, how do I detect that a churn-provoking event is taking place early enough to act on it and short-circuit the situation, rather than finding out at the end of the fiscal quarter?

Consider as a backdrop to this post the age-old adage “A Lost Customer is Not a Potential Customer”, and its relative “A Bird in the Hand”, which reflects the fact that gaining a new customer costs more than retaining an existing one. As a service provider operating in a competitive landscape, you are concerned with those of your customers who are on the verge of terminating their subscription with you in favor of one of your competitors. Hence the question that opens this post. What’s even more pressing is that since churn and retention are (approximately) a zero-sum game, you need to be aware of churn information related to your competitors when deciding on and executing your contingency plans.

We use Ethersource to take a look at the five largest mobile network operators in the United States with respect to the relative manifestation in English social media of the churn components, introduced below, during August and September 2011. The operators are (rank and numbers from Wikipedia): Verizon Wireless (107.7M subscribers; 35% of the subscribers among the five operators compared), AT&T Mobility (100.7M; 32%), Sprint Nextel (51.1M; 16%), T-Mobile USA (33.73M; 11%), and TracFone Wireless (17.75M; 6%).
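The subscriber shares in parentheses follow from dividing each operator’s count by the combined total of about 311M. A minimal Python check of that arithmetic, using only the Wikipedia figures quoted above (the dictionary and function names are ours, for illustration):

```python
from typing import Dict

# Subscriber counts in millions, as quoted above (from Wikipedia, 2011).
SUBSCRIBERS_M: Dict[str, float] = {
    "Verizon Wireless": 107.7,
    "AT&T Mobility": 100.7,
    "Sprint Nextel": 51.1,
    "T-Mobile USA": 33.73,
    "TracFone Wireless": 17.75,
}


def shares(counts: Dict[str, float]) -> Dict[str, float]:
    """Return each operator's fraction of the combined total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


if __name__ == "__main__":
    for name, share in shares(SUBSCRIBERS_M).items():
        print(f"{name:18s} {share:.0%}")
```

Run as a script, it prints 35%, 32%, 16%, 11%, and 6%, matching the figures in the text.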

Ethersource facilitates the detection of peaks in churn component signals, and allows its user to identify, and thus directly engage, the individuals airing the concerns that underlie such peaks. As an operator, you can use this opportunity to raise your visibility and make yourself available so that users can contact you at will, to approach individual users directly, or to launch and follow up campaigns targeted at a select group of users.
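How Ethersource itself detects peaks is not described here. As a hedged illustration of the general idea only, the sketch below flags a point in a churn-component series as a peak when it lies more than `threshold` standard deviations above the mean of the preceding `window` points, a trailing z-score test. The names `detect_peaks`, `window`, and `threshold`, and their defaults, are hypothetical and not Ethersource parameters.

```python
from statistics import mean, stdev
from typing import List


def detect_peaks(series: List[float], window: int = 24, threshold: float = 3.0) -> List[int]:
    """Return indices whose value lies more than `threshold` sample standard
    deviations above the mean of the preceding `window` values.

    Hypothetical trailing z-score detector, not Ethersource's own method.
    """
    if window < 2:
        raise ValueError("window must hold at least two points")
    peaks = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any rise above the baseline counts as a peak.
            is_peak = series[i] > mu
        else:
            is_peak = (series[i] - mu) / sigma > threshold
        if is_peak:
            peaks.append(i)
    return peaks
```

With hourly values, `window=24` compares each hour with the previous day; a lower `threshold` catches more, and noisier, peaks.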

We characterize customer churn in terms of a number of core components, all related to how people express annoyance, uncertainty, change, and negativity in relation to a service provider. Increasing or fluctuating signal levels for any of these components, or combinations thereof, may constitute a cause for concern.

First off, let’s see how the number of subscribers per operator relates to the amount of on-line chatter for the given time period. Image 1 shows the relative amount of chatter for the operators in September. The only operator generating an on-line buzz larger than its share of subscribers is Sprint Nextel.
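“Larger than its proportion” can be made precise as a ratio: an operator’s share of the chatter divided by its share of subscribers, where values above 1.0 mean over-representation. A small sketch under that assumption; the chatter counts are whatever mention totals you extract for the period (we do not reproduce the numbers behind Image 1 here), and the function names are hypothetical.

```python
from typing import Dict, List


def buzz_ratio(chatter: Dict[str, float], subscribers: Dict[str, float]) -> Dict[str, float]:
    """Share of chatter divided by share of subscribers, per operator.

    A value above 1.0 means the operator draws more on-line buzz than its
    subscriber base alone would suggest.
    """
    total_chatter = sum(chatter.values())
    total_subscribers = sum(subscribers.values())
    return {
        name: (chatter[name] / total_chatter) / (subscribers[name] / total_subscribers)
        for name in subscribers
    }


def overrepresented(chatter: Dict[str, float], subscribers: Dict[str, float]) -> List[str]:
    """Operators whose chatter share exceeds their subscriber share, sorted by name."""
    return sorted(name for name, r in buzz_ratio(chatter, subscribers).items() if r > 1.0)
```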

Image 1: The relative amount of on-line chatter for the five mobile network operators in September 2011.

Given the disproportionate attention it receives, Sprint is what we’ll focus on. We use the time series that Ethersource produces as it happens, for each of the churn components and operators outlined above, to identify a situation in September in which Sprint, but not its competitors, may see an increase in customer churn. Note that this approach allows for continuous monitoring of events as they take place; there is no need to wait and carry out an after-the-fact analysis.

The images below show expressions of uncertainty (Image 2), annoyance (Image 3), change (Image 4), and negativity (Image 5) towards the five mobile network operators. We monitor all the time series depicted below simultaneously, looking for occasions when the values for Sprint Nextel are higher than those of its competitors. A high value for a combination of the churn components, involving as many components as possible, warrants a closer inspection. Looking at the images below, one date in particular stands out: September 16, 2011. It is the only date on which all churn components exhibit higher values for Sprint than for any of its competitors. (Note that the graphs are timestamped in Stockholm time, so the event actually starts on September 15 in the US time zones where Sprint operates.)
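The selection rule described above (flag a date when Sprint’s value beats every competitor’s, ideally on every churn component) is easy to state in code. The sketch assumes a hypothetical nested mapping `scores[component][operator][day]` of daily values; Ethersource’s own data model is not described in this post. Setting `min_components` below the number of components relaxes the rule to “as many components as possible”.

```python
from datetime import date
from typing import Dict, List, Optional, Set

# Hypothetical layout: scores[component][operator][day] -> daily signal value.
Scores = Dict[str, Dict[str, Dict[date, float]]]


def components_led(scores: Scores, target: str, day: date) -> Set[str]:
    """Components on which `target` scores strictly higher than every competitor on `day`.

    A competitor with no value for that day blocks the lead (a conservative choice).
    """
    led = set()
    for component, by_operator in scores.items():
        own = by_operator[target].get(day)
        if own is None:
            continue
        rivals = [series.get(day) for op, series in by_operator.items() if op != target]
        if all(r is not None and own > r for r in rivals):
            led.add(component)
    return led


def alert_days(scores: Scores, target: str, min_components: Optional[int] = None) -> List[date]:
    """Days on which `target` leads on at least `min_components` components (default: all)."""
    needed = len(scores) if min_components is None else min_components
    days = sorted({d for by_operator in scores.values() for d in by_operator[target]})
    return [d for d in days if len(components_led(scores, target, d)) >= needed]
```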

Image 2: Expressions of uncertainty. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 3: Expressions of annoyance. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 4: Expressions of change. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 5: Expressions of negativity. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

What happened to Sprint on September 16?

Rumors of the iPhone 5 happened to Sprint. The rumors had it that Sprint would be the exclusive reseller of the new Apple handset.

By using Ethersource, we can inspect the social media expressions underlying the signals, and thus make sure we understand exactly what is going on. It turns out that people express themselves in relation to

  • uncertainty about whether the Sprint network can handle the traffic generated by the new iPhone: “Can Sprint handle iPhone traffic?”
  • annoyance about rumored alterations to contracts as a result of the introduction of the new handset: “Sprint Readies To Remove Even More Customer Incentives.”
  • change to the above-mentioned contracts: “Sprint puts an end to the Premier loyalty program”
  • negativity about the effect the new handset might have on Sprint’s services: “What Sprint users should find alarming is his acknowledgement that the iPhone could potentially hurt the company in the near term because of the higher subsidies involved…”.

The quotes above are taken verbatim from top-ranked sources in Ethersource. Image 6 below shows a partial screenshot of Ethersource with a number of sources ranked according to their uncertainty value, early on September 16.

Image 6: A partial screenshot of Ethersource, containing the top five sources (cloaked) contributing to the uncertainty score at the point in time labelled "Z" in the red oval.
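Image 6 lists sources ranked by their contribution to the uncertainty score. Assuming each document carries a per-document contribution (a hypothetical `uncertainty` field; how Ethersource computes it is not covered here), picking the top five is a partial sort:

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    url: str
    text: str
    uncertainty: float  # hypothetical per-document contribution to the uncertainty score


def top_sources(sources: List[Source], k: int = 5) -> List[Source]:
    """Return the k sources contributing most to the uncertainty score, highest first."""
    return heapq.nlargest(k, sources, key=lambda s: s.uncertainty)
```

`heapq.nlargest` avoids sorting the whole collection, which helps when a time slot holds many documents; for small lists, `sorted(sources, key=..., reverse=True)[:k]` gives the same result.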

So the rumored advantage for Sprint, namely the new iPhone 5 handset from Apple, turns out not to have been all that positively received.

We’re sure Sprint was all over this particular event; it is used here merely as a case study to show some of the capabilities of Ethersource when working with a set of polarities beyond the ordinary positive-negative dichotomy. In fact, it would have been very hard to identify the iPhone 5 rumor that emerged on September 16 as something extraordinary for Sprint using negativity alone.

Gavagai analyses the Greeks’ attitudes toward the cancelled referendum and the Eurozone

Greece has now officially scrapped plans for a referendum on the Euro bailout plan. Our research shows that a small majority (53%) of Greeks did not want the referendum, the exact subject matter and formulation of which remain unspecified and unclear. In our view, a majority (79%) of Greeks want to remain in the Eurozone. We note, however, that willingness to keep the Euro falls dramatically when the issue is raised in the context of austerity measures. This leads us to believe that if a referendum on whether to remain in the Eurozone subject to harsh austerity conditions were held today, almost one in two Greeks (46%) might opt for leaving the Eurozone. As with patterns of violent demonstrations, the trend in pro-exit attitude is extremely volatile and directly linked to the current (daily) dominating topics, and to austerity and fear in particular. Between November 4 and 6, there was a significant increase in pro-exit attitude, as illustrated by the image below.

This analysis is based on large volumes of Greek-language open sources including social media.

Greek pro-exit attitudes toward the Eurozone as measured in Greek online media, November 4 to 6, 2011.

Analysis of buzz, hatred, and associations during the unraveling of Håkan Juholt’s accommodation reimbursements affair

In this post, we analyze the mention frequency, the strong negative sentiment or hatred, and the terminology associated with the leader of the Swedish Social Democrats, Håkan Juholt, during the unraveling of his accommodation reimbursements affair. The analysis covers the Swedish blogosphere between October 1 and 19, 2011, and uses Ethersource technology and our proprietary associations engine.

The short version of the story is this: Juholt went into the accommodation affair in early October with fairly low buzz in the blogosphere and, considering that he is a leading politician in an opposition party, a reasonable level of strong negative sentiment or hatred expressed towards him. Later in the month, the buzz settles once again, but the hatred has reached high levels! The terms associated with him suggest that the affair will not wear off easily.

Image 1 and Image 2, below, illustrate several things. The blue curve denotes the mention frequency, that is, the number of mentions of Juholt in the Swedish blogosphere. The red curve denotes the hatred expressed in relation to Juholt. The words pinned to each day are the prominent terms newly associated with Juholt compared with the previous day.
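The new terms pinned to each day amount to a set difference against the previous day’s associations. A minimal sketch, assuming a hypothetical mapping from ISO-formatted dates to that day’s associated terms (the associations engine itself is proprietary and not reproduced here):

```python
from typing import Dict, List, Set


def new_terms(daily_terms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each day, the associated terms that were absent the day before.

    Keys are ISO dates ("2011-10-06"), so sorting them gives chronological order.
    The first day has no predecessor, so all of its terms count as new.
    """
    result: Dict[str, List[str]] = {}
    previous: Set[str] = set()
    for day in sorted(daily_terms):
        today = daily_terms[day]
        result[day] = [term for term in today if term not in previous]
        previous = set(today)
    return result
```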

The time period covered by Image 1 is October 6 to 10, 2011. On October 6, the terms associated with Juholt are mainly concerned with the shadow budget proposed by the Social Democrats the day before. The mention frequency is quite low, and the hatred expressed in relation to Juholt is not exceptional. October 7 is the day the Swedish newspaper Aftonbladet published an article claiming that Juholt had requested too large an allowance for his residence. The mention frequency increases markedly, while the hatred stays at the level of the day before. Although still influenced by on-line discussion of the shadow budget, the terms associated with Juholt clearly show evidence of an affair in the making; the Aftonbladet article has gained traction in the blogosphere. Saturday, October 8, shows a further increase in mention frequency, which also tends to vary with the time of day. The graph also shows that the hatred is on the rise; bloggers are picking up on the reimbursements affair. This is also evident in the associated terms, where Juholt is compared to a financial mishap by the former Social Democratic leader Mona Sahlin in 1995, known as tobleroneaffären (the Toblerone affair). The terms also reflect that the affair is about Juholt’s apartment, and that he will hold a press conference. Moving on to October 9, we see that the mention frequency, now clearly varying with the time of day, levels out, as does the hatred. The associated terms concern solidarity with Juholt and declining trust in him, and also refer to the Swedish Prosecution Authority. Finally, Image 1 shows that on October 10 the mention frequency is still high and the hatred is rising markedly; the aversion vented in relation to Juholt is reaching high levels! The associated terms are related to crime (preliminary investigation, fraud, prosecution) and to politics (voters, party, resign, party leaders, citizens).

Image 1: The period covering the onset of the affair, October 6 - 10, 2011. The terms for a given day in the image are the new terms associated with Håkan Juholt for that particular day.

Moving on to a later part of the affair, Image 2 illustrates the period October 15 to 19. The general trend in mention frequency for the period is downward. The divergence between frequency and hatred is interesting: the hatred rises as the mention frequency declines, which suggests that while people are talking less about Juholt, those who do are still very upset. Let’s look at the associated terms day by day. On October 15, blog discussion is mainly about the media hunt for Juholt. October 16 concerns the “obvious” rules for accommodation reimbursements (also mentioning editor-in-chief Jan Helin). October 17, again, concerns the rules, as well as Juholt’s intent and Sundbyberg, where Juholt held a meeting with his fellow party members. October 18 is about cheating and the form Juholt filled out when requesting the allowance. Finally, Image 2 ends with October 19, with mentions of netroots and politometern, both portals for political blogs, as well as Swedish Radio.
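One simple way to make “hatred rises as the mention frequency declines” testable is to compare the signs of least-squares trend lines fitted to the two series over the period. This is our illustration, not how the curves in Image 2 were produced, and it assumes evenly spaced samples.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of `values` against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def diverging(mentions: Sequence[float], hatred: Sequence[float]) -> bool:
    """True when the mention frequency trends down while hatred trends up."""
    return slope(mentions) < 0 < slope(hatred)
```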

Image 2: The period covering a later part of the affair, October 15 - 19, 2011.

The complete list of salient terms associated with Håkan Juholt for the period October 1 to 19 is available at http://www.gavagai.se/reports/juholt-october-2011/ and uses the following legend: a blue term is new on the list; a red term is more weakly associated with Juholt than it was before; a green term is more strongly associated with Juholt than it was before.
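The color legend compares each term’s association strength with its value in the previous snapshot. A sketch of that rule, assuming association strengths are available as hypothetical term-to-score mappings for the previous and current snapshots:

```python
from typing import Dict, Optional


def legend_color(term: str, previous: Dict[str, float], current: Dict[str, float]) -> Optional[str]:
    """Color a term in `current` according to the legend.

    blue:  new on the list (absent from the previous snapshot)
    red:   weaker association with Juholt than before
    green: stronger association than before
    None:  association unchanged
    """
    if term not in previous:
        return "blue"
    if current[term] < previous[term]:
        return "red"
    if current[term] > previous[term]:
        return "green"
    return None
```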

New words in New Text

New Text is what we like to call the sort of spontaneous, non-edited material we spend much of our time processing. We contrast this primarily with traditional text from editorial sources. There are interesting differences between new text and traditional text, and these have been the subject of much debate in philological, sociological, and to some extent even computational circles. Much of what has been said is interesting, much is pure piffle, and we have made our own pronouncements about what sort of changes we believe are ahead (this one in Swedish). We expect we will have reason to return to this discussion.

Two of the most noticeable things about new text as compared to traditional text are its lexical creativity and its sloppy proofreading. From the point of view of a text analysis processor, they amount to much the same thing: a constant influx of new tokens, never seen before. Newspeak, lolspeak, l33tspeak, new topics, puns, lesser used sociolects and idiolects, misspellings, mistypings, and teenage angst all contribute to a vast and vastly growing symbol table.

Image 1: Number of unique words, as a function of time, in Tweets and newsprint

We are perfectly happy to cope with that sort of growth! This is one of the underlying principles of our design – not to be fazed by new and unexpected usage. An example: in a conversation, if your counterpart mispronounces a central term (or speaks an unexpected dialect or variety of the language) or uses a synonym which you didn’t know before, it might throw you the first time. Next time around, you cope with it. It might annoy you a few times but eventually – after only a handful of observations – you will be habituated to the pronunciation or the synonym. You will not be retranslating the new observation back to your previous knowledge of the world – some perceptual and lexical process in your language analysis system simply notes that this concept appears to be subject to some variance. This happens without retraining or recompilation of your lexicon: you don’t stop conversation to figure it all out. (Or, well, you shouldn’t. If you do so often, you will eventually lose friends.) This is the way we believe things should be done!
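The habituation described above can be caricatured in a few lines: count sightings of each token online and treat it as known once it has been seen a handful of times, with no retraining step. This toy `HabituatingLexicon` (a hypothetical name, with an arbitrary threshold of three) illustrates the principle only; it is not how Ethersource represents words.

```python
from collections import Counter


class HabituatingLexicon:
    """Hypothetical toy: a token becomes 'known' after `threshold` sightings,
    updated online with no retraining or recompilation step."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.counts: "Counter[str]" = Counter()

    def observe(self, token: str) -> bool:
        """Record one sighting; return True if the token is now known."""
        self.counts[token] += 1
        return self.counts[token] >= self.threshold

    def is_known(self, token: str) -> bool:
        """Check without recording a sighting (Counter returns 0 for unseen tokens)."""
        return self.counts[token] >= self.threshold
```

Each call to `observe` is a constant-time update, so the lexicon keeps pace with the stream instead of being rebuilt in batches.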

Here are some numbers we got out of our text collections (see Image 1 above). Take two years of newsprint from reputable newspapers (in this case, the first year from a major US daily and the second year from a major UK daily). That’s about 100 Mwords, or about 100 kwords per day. During the first few days, most words are new, but the count settles rapidly into a steady 100 to 200 new tokens per day. (The switch across the Atlantic probably contributes slightly more of those, but not noticeably.) That in itself is an argument for a learning system!
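The new-token rates quoted above can be computed by counting, for each day, the distinct tokens not seen on any earlier day. A minimal version of that count, assuming each day’s text has already been split into tokens (tokenization details are not given in the post):

```python
from typing import Iterable, List, Set


def new_tokens_per_day(days: Iterable[Iterable[str]]) -> List[int]:
    """For each day's token stream, count distinct tokens never seen on an earlier day."""
    seen: Set[str] = set()
    counts: List[int] = []
    for tokens in days:
        fresh = set(tokens) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts
```

A running sum of these counts (for instance with `itertools.accumulate`) gives the vocabulary size over time.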

Now, for comparison, take two months of tweets on various topics in English. (Well, mostly in English. Mixed-language tweets are in our test material. We don’t want to take them out; that’s the way the world is.) That’s more than 1 Gwords, working out to about 20 Mwords a day. And about 200 000 new words. Per day. Try to keep up with that manually! We firmly believe that a system built to handle this must treat such data variance as normal, not as something to be met with quick hacks or filtering. Ethersource is built with this in mind.
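Putting the two sets of figures on the same footing, as the fraction of each day’s tokens that are new, uses only the approximate numbers quoted above:

```python
# Approximate daily figures as quoted in the text.
NEWS_TOKENS_PER_DAY = 100_000
NEWS_NEW_PER_DAY = (100, 200)  # low and high end of the steady state
TWEET_TOKENS_PER_DAY = 20_000_000
TWEET_NEW_PER_DAY = 200_000

news_low, news_high = (n / NEWS_TOKENS_PER_DAY for n in NEWS_NEW_PER_DAY)
tweet_rate = TWEET_NEW_PER_DAY / TWEET_TOKENS_PER_DAY

print(f"newsprint: {news_low:.1%} to {news_high:.1%} of daily tokens are new")
print(f"tweets:    {tweet_rate:.1%} of daily tokens are new")
print(f"tweets add {TWEET_NEW_PER_DAY // NEWS_NEW_PER_DAY[1]} to "
      f"{TWEET_NEW_PER_DAY // NEWS_NEW_PER_DAY[0]} times as many new words per day")
```

This prints 0.1% to 0.2% for newsprint, 1.0% for tweets, and a factor of 1000 to 2000 in absolute new words: per token, the tweet stream introduces new words five to ten times as often, on top of being some 200 times larger.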

And the future is near: what do you think will happen when language analytics moves on to processing speech data as well as text? Do you believe the number of tokens in the data stream will converge to a smaller number?

The killing of Mashaal Tammo through the eyes of Arabic social media

In this post, we show three things:

  1. that Ethersource can be used to monitor Arabic social media,
  2. that it can detect violent on-line chatter, and
  3. that it can identify the real-world events underlying the resulting signal.

On the evening of Friday, October 7, 2011, Kurdish opposition politician Mashaal Tammo, founder of the Kurdish Future Movement Party, was shot dead by masked men in his home in northeastern Syria. His killing was soon attributed to the Syrian regime. The next day, Saturday, October 8, Tammo’s funeral, with 50,000 to 100,000 attendees, turned into the largest gathering of protesters since the start of the uprising seven months earlier. Syrian security forces intercepted the crowd, shooting at least five people dead and injuring numerous others.

We’ve used Ethersource to monitor Syria in Arabic social media for quite some time. Image 1, below, illustrates the violent on-line chatter pertaining to Syria between Thursday, October 6, and Sunday, October 9, 2011, the weekend when Tammo was killed. Image 1 is annotated with the time of Tammo’s killing, based on reports that he was attacked on the evening of the 7th (Syria is in a time zone one hour ahead of the time scale of the graph), and with the approximate time of his funeral. What is striking is the surge in chatter after Tammo’s death, caused by the reactions of people active in social media. At this point, the steep rise and the high level of violent expressions indicate that a physical manifestation related to the killing of Tammo is likely. We call this a crowd-induced event: a process sparked by a real-world event and then fueled by an on-line crowd in a way that increases the likelihood of a physical reaction to the initial event. After the attack on the funeral party, the level of violent chatter increased even more.
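Placing a reported local time on the graph means shifting it by the offset between the two time zones. The sketch below assumes fixed offsets for early October 2011, with the graph’s Stockholm time scale on summer time (UTC+2) and Syria at UTC+3, matching the one-hour gap stated above; the constant and function names are ours.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for early October 2011, matching the one-hour gap noted above:
# the graph's time scale (Stockholm, CEST) at UTC+2 and Syria at UTC+3.
GRAPH_TZ = timezone(timedelta(hours=2), "Stockholm (CEST)")
SYRIA_TZ = timezone(timedelta(hours=3), "Syria")


def syria_to_graph(local: datetime) -> datetime:
    """Map a naive Syrian local time onto the graph's time scale."""
    return local.replace(tzinfo=SYRIA_TZ).astimezone(GRAPH_TZ)


def graph_to_syria(graph_time: datetime) -> datetime:
    """Map a naive timestamp read off the graph onto Syrian local time."""
    return graph_time.replace(tzinfo=GRAPH_TZ).astimezone(SYRIA_TZ)
```

For example, an event at a hypothetical 20:00 local time on October 7 belongs at 19:00 on the graph: `syria_to_graph(datetime(2011, 10, 7, 20, 0))`. The same kind of conversion, with US offsets, is why the Sprint event dated September 16 on the Stockholm-timed graphs began on September 15 in the US.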

Image 1: Violent chatter in Arabic social media with respect to Syria for the weekend when Kurdish opposition politician Tammo was gunned down.

Ethersource facilitates the verification of a signal by allowing the operator to inspect the individual documents contributing to it. Image 2, below, shows three screenshots representing some of the sources underlying the signal on Friday, October 7, and Saturday, October 8. The translations from Arabic to English were made using Google Translate.

Image 2: Screenshots of some of the sources contributing to the on-line violent expressions. The translations were made with Google Translate.

To sum up: by using Ethersource, we are able to aggregate the attitudes expressed in on-line media, as they are emitted, with respect to a given entity in a given language, thus constructing a view of attitudes over time. The view facilitates the identification of time periods in which on-line activity warrants our attention. Ethersource, then, provides access to the documents contributing to the aggregated attitudes in the time period under scrutiny.

So, with Ethersource, we can follow any target with respect to any attitude in any language. On top of that, Ethersource continuously learns from the language it is exposed to.
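The aggregate-then-inspect workflow summarized above can be sketched as hourly bucketing that keeps a back-reference from each bucket to its contributing documents. The `Document` record and its per-document `score` are hypothetical stand-ins; the post does not describe how Ethersource scores individual documents.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass
class Document:
    timestamp: datetime
    target: str    # entity the document refers to, e.g. "Syria"
    attitude: str  # e.g. "violence"
    score: float   # hypothetical per-document attitude score


def hour_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def attitude_series(
    docs: Iterable[Document], target: str, attitude: str
) -> Tuple[Dict[datetime, float], Dict[datetime, List[Document]]]:
    """Sum scores per hour for one target and attitude, and keep each hour's
    contributing documents so a peak can be traced back to its sources."""
    totals: Dict[datetime, float] = defaultdict(float)
    contributors: Dict[datetime, List[Document]] = defaultdict(list)
    for doc in docs:
        if doc.target == target and doc.attitude == attitude:
            hour = hour_of(doc.timestamp)
            totals[hour] += doc.score
            contributors[hour].append(doc)
    return dict(totals), dict(contributors)
```

Inspecting a peak is then a lookup: the documents in `contributors[hour]`, sorted by `score`, are the ones behind it.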