Hyperdimensionality, semantic singularity, and concentration of distances

  • This post digs a bit deeper into Ethersource.
  • We discuss the problems of distance concentration and semantic singularity.
  • We argue that Ethersource is not susceptible to these problems.

As we have previously discussed in this blog, the number of unique words in social media grows at a rate that far exceeds what we are normally used to when working with collections of more traditional texts. To recapitulate, the lexical variation and growth in New Text is simply astounding; there is a constant and continuous influx of new tokens. We have also previously discussed how Ethersource is designed to handle such growth. The memory/processing model (we don’t make a distinction between these) of Ethersource does not explode in size as we add (lots and lots of) new data.

To repeat the message: if your data is highly dynamic, you’d better have a model that can handle variation.

Ethersource is based on hyperdimensional computing, which means that all operations in Ethersource are performed in fixed-dimensional spaces of very high dimensionality. Such representations have a number of very attractive features (see Kanerva’s paper in the references below for more details). One of the most useful properties of hyperdimensional representations is that the dimensionality is unaffected by the size of the data. This is why Ethersource can seamlessly handle rapidly growing vocabularies such as those encountered in social media (and in other kinds of streaming data sources).
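The claim that dimensionality is unaffected by data size can be illustrated with a minimal sketch. It assumes dense random bipolar (+1/-1) vectors of a hypothetical fixed width `DIM = 10_000`, in the spirit of Kanerva (2009). It is not Ethersource's actual representation. Every new word gets a fresh random vector of the same width. Because random high-dimensional vectors are nearly orthogonal to each other, there is room for new items without resizing anything.

```python
import numpy as np

DIM = 10_000  # hypothetical fixed width; the post does not state Ethersource's

rng = np.random.default_rng(42)
memory: dict[str, np.ndarray] = {}  # grows in entries, never in width


def vector_for(word: str) -> np.ndarray:
    """Return the word's random bipolar (+1/-1) vector, creating it on first sight."""
    if word not in memory:
        memory[word] = rng.choice(np.array([-1, 1]), size=DIM)
    return memory[word]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# A stream of ever-new tokens: the vocabulary grows, the dimensionality does not.
for token in ["tebow", "teebow", "teeeebow", "broncos", "denver", "quarterback"]:
    vector_for(token)

print(len(memory), {v.shape[0] for v in memory.values()})  # → 6 {10000}
print(round(cosine(memory["tebow"], memory["broncos"]), 3))  # close to 0
```

With d = 10,000, the cosine between two independent bipolar vectors has a standard deviation of about 1/sqrt(d) = 0.01. This is why unrelated items stay essentially orthogonal.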

Of central importance in Ethersource (and in other data mining systems) is the notion of similarity. Applications like social media monitoring/sentiment analysis, association analysis, etc., all boil down to questions of the type “how similar is this data point to that?” Association analysis in particular is an example of nearest neighbor search, in which the task is to find the data points that are most similar to a given query data point. Nearest neighbor search is a core functionality in many data mining applications, such as semantic search, pattern recognition and recommendation systems. All these applications (and many more) depend on nearest neighbor searches in high-dimensional spaces.

Enter the phenomenon of distance concentration and the perils of the semantic singularity.

Imagine what the impact would be for systems that rely on the notion of similarity if this notion itself became meaningless. Clearly, not good. But is this really something we need to worry about? Could it ever happen?

Science fiction-like as it may sound, this is exactly what the phenomenon of distance concentration refers to: a situation in which the distance from a query data point to its nearest neighbor approaches the distance to its farthest neighbor. In such a situation, the notion of similarity becomes useless, because all distances are essentially the same. Several recent papers (see the references below) have pointed out that this situation can actually occur in certain cases as the dimensionality of the data increases.
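A quick way to see distance concentration is to measure the relative contrast (D_max - D_min) / D_min between one query and a set of data points as the dimensionality grows. The sketch below assumes i.i.d. uniformly distributed data. That is the setting where the effect described by Beyer et al. shows up most cleanly, and it is not a model of real text data.

```python
import numpy as np


def relative_contrast(dim: int, n_points: int = 1000, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for one random query against uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return float((dists.max() - dists.min()) / dists.min())


for dim in (2, 10, 100, 1000):
    # The contrast shrinks as dim grows: nearest and farthest become alike.
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the nearest point is many times closer than the farthest one. In a thousand dimensions all distances fall within a narrow band.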

Remember the observation about the vocabulary growth of social media? This is a hallmark example of data with continuously increasing dimensionality. Thus, not only do you need to worry about the processing cost when dealing with such data, but you also need to worry about your representation collapsing into semantic singularity. And to make matters even worse, it has been shown that certain types of dimensionality reduction and approximate nearest neighbor search techniques can further aggravate the problem of distance concentration.

If we operate in high dimensions with vast and vastly growing data sets streaming in, we should take this problem seriously.

In the case of Ethersource, we use hyperdimensional computing to ensure that the representation remains unaffected by the size of the data. This means that Ethersource is not at risk of distance concentration due to increasing dimensionality of the representation per se. However, as the attentive reader is no doubt wondering, what about the growth of the intrinsic dimensionality? Is there no risk of a hyperdimensional representation getting “saturated”? That is, how can we be sure that there will always be enough room, locally, in the fixed-size hyperdimensional representation when there is a continuous inflow of data?

This would be a tangible problem if we were faced with data of high intrinsic dimensionalities. In such cases, the local neighbourhood of a data point can become saturated with new neighbours, thus rendering the notion of vicinity meaningless, and thereby collapsing into semantic singularity. However, Ethersource operates on a very special type of data, which has comparatively low intrinsic dimensionality (Karlgren et al. 2008).

Thus, exit the problem of distance concentration in Ethersource.

And anyway, as someone so wisely said, “forgetting is the key to a healthy mind”, and we certainly want Ethersource to stay healthy.

To end this rather technical post, we include an illustrative example of how similarities behave when adding more data in Ethersource. The following graph shows how the pairwise similarities between semantically related and semantically unrelated words remain stable as we add more data (in this case, up to some 2 billion words).

This is exactly how we want the model to behave; related words stay related, while unrelated words stay unrelated. It would definitely not be a good thing if we saw an increase in similarity between the unrelated words as we add more data, merely as an effect of adding more data. What could happen though is that two previously unrelated words suddenly become similar as an effect of new language use. This, however, is perfectly in order, since we want the similarities to reflect actual usage patterns rather than presumed ones. The fluctuations in the graph correspond to such fluctuations in language use.

References

Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan and Uri Shaft (1999) When Is “Nearest Neighbor” Meaningful? Proceedings of the 7th International Conference on Database Theory.

Ata Kabán (2011) On the distance concentration awareness of certain data reduction techniques. Pattern Recognition, 44(2): 265-277.

Pentti Kanerva (2009) Hyperdimensional Computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2): 139-159.

Jussi Karlgren, Anders Holst and Magnus Sahlgren (2008) Filaments of Meaning in Word Space. Proceedings of the 30th European Conference on Information Retrieval.

Tebow, Tebowed, Tebowing: Spelling Variants and Associations

The Wall Street Journal recently ran a piece on the countless ways to spell Tebow. The article reports on spelling variants such as “Teebow”, “Teeeebow”, and “Teeebowww”, all of which are easily recognized using regular expressions. Nevertheless, this is a nice example of how the productivity of the language use of Internet users may pose challenges for keyword-based systems.
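The WSJ variants all stretch letters of the original word. A single regular expression catches them, as in this minimal sketch. The pattern is our own illustration, not taken from the article. The same pattern misses letter substitutions such as “Twbow” or “Tibow”.

```python
import re

# Illustrative pattern: one or more of each letter of T-e-b-o-w, any case.
TEBOW_PATTERN = re.compile(r"t+e+b+o+w+", re.IGNORECASE)


def is_stretched_tebow(token: str) -> bool:
    """True if the whole token is "Tebow" with some letters repeated."""
    return TEBOW_PATTERN.fullmatch(token) is not None


candidates = ["Teebow", "Teeeebow", "Teeebowww", "Twbow", "Tibow", "Teabow"]
print([t for t in candidates if is_stretched_tebow(t)])
# → ['Teebow', 'Teeeebow', 'Teeebowww']
```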

Ethersource does not use regular expressions to handle this type of variation. On the contrary, it learns terminological variation continuously by observing language use. This means that Ethersource will not only find the type of variants reported in the WSJ article, but also more unpredictable variants, such as:

  • Twbow
  • Tibow
  • Tebox
  • Teboq
  • Tewbow
  • Teobow
  • Teabow
  • Teblow
  • Tebowm

In addition to finding the spelling variants of a given term, Ethersource can also find associated terms that help frame its meaning. That is, they help answer the question “What is a Tebow?”.

According to our ever-changing, live data, the top terms associated with Tebow include:

  • Broncos
  • Tim
  • Denver
  • quarterback
  • Tebowing
  • Tebowed

From this, we (manually) infer that Tebow is a person whose first name is Tim, that he is a quarterback, and that he plays for the Denver Broncos. The final two terms in the list puzzled us a bit. This is what we learned: Tebowing refers to the act of getting down on one knee and starting to pray, even if everyone around you is doing something completely different. Tebowed, on the other hand, has little to do with spirituality, as it denotes being run over while playing American football. Thus, we add spirituality and toughness to our notion of Tebow.

Positiveness Correlates with Holidays, Headache Correlates with New Year’s Day

We’ve previously seen that the aggregated overall positiveness of Swedes is cyclical on a weekly basis. Swedes love their days off. We’re now happy to confirm what we’ve all suspected for a long time: during Christmas and New Year we all excel in positive thinking!

Additionally, the image below reveals that, for some reason, Swedes appear to be very concerned with headaches on the day after the New Year festivities.

Positiveness correlates with holidays (red circles, Christmas and New Year), and headache correlates with New Year's Day.

Iowa and social media sentiment

We must confess we were a bit wary of extending social media-based prediction into the minds of Iowans gathering in caucus halls around their state to select their favourite candidate for the presidential nomination. Iowan politics is famously local; our measurements are global.

As it turns out, we were fairly good at picking out what matters. The results gave Mitt Romney, Ron Paul and Rick Santorum more or less equal votes, while the others (Newt Gingrich, Michele Bachmann, Rick Perry and Jon Huntsman) trailed far behind.

Our measurements of social media in the last few days showed that the three most talked-about candidates were Santorum, Romney and Paul. The six most mentioned candidates received about the same amount of appreciation. But when we compared the amount of appreciation with the amount of aversive sentiment each candidate generated, we found that Romney had the best differential, and that Gingrich and Bachmann showed a strongly negative differential.
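The post does not spell out how the differential is computed. One simple reading is the candidate's share of positive mentions minus their share of aversive mentions, and the sketch below assumes that. The function names and the share-based inputs are our own illustration.

```python
def differential(positive: dict[str, float], aversive: dict[str, float]) -> dict[str, float]:
    """Assumed metric: positive share minus aversive share, per candidate."""
    return {name: positive[name] - aversive.get(name, 0.0) for name in positive}


def rank_by_differential(positive: dict[str, float], aversive: dict[str, float]) -> list[str]:
    """Candidates ordered from best (most positive) to worst differential."""
    diffs = differential(positive, aversive)
    return sorted(diffs, key=lambda name: diffs[name], reverse=True)
```

Candidates with similar appreciation can still end up far apart once aversive sentiment is subtracted. That is the pattern described above.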

We will not be quite as shy in four years’ time!

Proportion of all mentions in social media for the day before the Iowa caucuses

Proportion of positive mentions in social media for the day before the Iowa caucuses

Proportion of negative mentions in social media for the day before the Iowa caucuses

GOP Hopefuls in Social Media

The blogsite amerikanskpolitik.se has published some measurements we made on the relative stature in social media for the main Republican party presidential candidates. Their blog post is in Swedish but the main observations are:

  1. Ron Paul has gained a massive boost in mentions lately and is now the most talked-about candidate. (This is likely to be partly an effect of the general libertarian and counterestablishmentarian bias of the blogosphere.)
  2. Michele Bachmann is now the candidate viewed with the most skepticism. (This is likely to be an effect of her recently expressed views on vaccination, which run counter to many health professionals’ views.)
  3. Newt Gingrich is the candidate most associated with aversive affect.
  4. Mitt Romney and Newt Gingrich are the candidates most associated with positive affect.

Aversive mentions during the week of December 22-28, 2011.

Proportion of mentions during the week of December 22-28, 2011.

Skeptical mentions during the week of December 22-28, 2011.

Positive mentions during the week of December 22-28, 2011.

The Advantage of Ethersource on the TOEFL Synonym Test Compared to Other Methods

  • This post compares the performance of various semantic algorithms
  • Ethersource solves a synonym test with 62% correct answers, while the best runner-up only reaches 52%
  • The results demonstrate the advantage of Ethersource over other relevant methods

As part of our internal system performance monitoring, we continuously evaluate Ethersource using a number of standardized benchmark tests. One such test is the synonym part of the TOEFL (Test of English as a Foreign Language). This multiple-choice vocabulary test measures the ability of the subject (in our case, Ethersource) to identify which of four alternatives is the correct synonym of a given target word.
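A word-space model usually answers a TOEFL item by picking the alternative whose vector is most similar (by cosine) to the target's vector. The sketch below assumes that common approach; the post does not detail Ethersource's own procedure. The toy two-dimensional vectors are made up purely for illustration.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def answer_item(vectors, target, alternatives):
    """Choose the alternative most similar to the target word."""
    return max(alternatives, key=lambda alt: cosine(vectors[target], vectors[alt]))


def score(vectors, items):
    """items: (target, four alternatives, correct answer); returns percent correct."""
    correct = sum(answer_item(vectors, t, alts) == gold for t, alts, gold in items)
    return 100.0 * correct / len(items)


# Toy 2-d vectors, invented for illustration only.
toy = {w: np.array(v) for w, v in {
    "big": [1.0, 0.1], "large": [0.9, 0.2], "small": [-1.0, 0.0],
    "blue": [0.0, 1.0], "tree": [0.1, -1.0],
}.items()}
items = [
    ("big", ["large", "small", "blue", "tree"], "large"),  # answered correctly
    ("blue", ["tree", "big", "small", "large"], "tree"),   # answered incorrectly
]
print(score(toy, items))  # → 50.0
```

With four alternatives per item, random guessing lands at about 25%, which is the baseline quoted below.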

We use the synonym part of the TOEFL as a performance benchmark for several reasons. The first is that a synonym test is a relevant test for a system that claims to know about meaning. At Gavagai, we believe in putting our money where our mouth is; if you claim that your system extracts meaning from text, you should be able to demonstrate this in a scientific test that measures meaning (such as a standardized synonym test). Furthermore, the synonym part of the TOEFL has been used extensively in the scientific literature, so there is an abundance of published results to compare with. Lastly, the TOEFL test is normally administered to human test subjects, so you can actually compare the performance of your system to that of humans (which is nice, if you aim at intelligence).

Since Ethersource learns from the data it sees (in technical terms, we call it an unsupervised system), we benchmark its performance in relation to other unsupervised techniques. In this post, we include results for RI (Random Indexing), LSA (Latent Semantic Analysis), HAL (Hyperspace Analogue to Language), and LDA (Latent Dirichlet Allocation), since these are the standard algorithms for state-of-the-art unsupervised semantic analysis (see below for more details about the various algorithms).

In order to facilitate comparison and replicability, we apply all algorithms to the same freely available data set: the Open American National Corpus. We apply a minimum of preprocessing (non-alphabetic and non-numeric characters are replaced with white space, all characters are down-cased, and text within <p></p> is treated as a document for LSA and LDA), and run all algorithms with default parameters (unless otherwise stated).

Below are the results. As a comparison, random guessing would generate approximately 25% correct answers, while foreign applicants to U.S. colleges average around 64% (reported by Landauer and Dumais, 1997; see reference below).

Method                               Correct answers
Ethersource (generation 1)           62.25%
LDA (300 topics)                     52.50%
LSA (200 dimensions)                 52.50%
RI-permutations (2000 dimensions)    48.75%
RI (2000 dimensions)                 46.25%
HAL (300 dimensions)                 43.75%

As these results show, Ethersource clearly outperforms the other unsupervised techniques included in this comparison. It should be noted that tweaking the parameters of the algorithms (and applying more careful preprocessing of the data, such as stemming and removal of high-frequency words) will typically lead to improved results for all algorithms. It should also be noted that the OANC data is comparatively small (~11M tokens), which explains why the results presented in this post fall below the state of the art for algorithmic solutions to the synonym part of TOEFL.

We use the OANC in this comparison first of all to facilitate replicability, but also to be able to include results for algorithms that do not scale very well. Furthermore, the point of this exercise is not to beat the state of the art, but to compare the performance of a number of different algorithms on the same test using the same data. To be honest, beating the state of the art on the TOEFL synonym test with unsupervised algorithms of the type we focus on here is mainly a matter of using sufficiently large, and sufficiently relevant, data to build the models; the results listed on the ACL Wiki are thus not very good indicators of relative performance.

To conclude, below is a short summary of the algorithms included in the comparison:

LDA

An example of a topic model, which interprets word occurrences as a result of the activation of a small set of latent topics. Words in this model become similar to the extent that they are generated by the same topics.
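The idea that words become similar when the same topics generate them can be sketched with collapsed Gibbs sampling. This is a standard inference method for LDA, substituted here for illustration. It is not the PLDA code used in the comparison. Each word is represented by its column of the topic-word matrix phi, and similarity is the cosine between columns. The hyperparameter defaults are our own assumptions.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word matrix phi (K x V)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # topic counts per document
    nkw = np.zeros((n_topics, vocab_size))  # word counts per topic
    nk = np.zeros(n_topics)                 # token counts per topic
    z = []                                  # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = rng.integers(n_topics, size=len(doc))
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | everything else), up to a constant.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    return (nkw + beta) / (nk[:, None] + vocab_size * beta)


def word_similarity(phi, w1, w2):
    """Cosine between the topic profiles (columns of phi) of two word ids."""
    a, b = phi[:, w1], phi[:, w2]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy corpus: word ids 0-2 and 3-5 never appear in the same document.
docs = [[0, 1, 2] * 5 for _ in range(10)] + [[3, 4, 5] * 5 for _ in range(10)]
phi = lda_gibbs(docs, n_topics=2, vocab_size=6, iters=100)
print(round(word_similarity(phi, 0, 1), 2), round(word_similarity(phi, 0, 3), 2))
```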

LDA reference: D. Blei, A. Ng and M. Jordan (2003) Latent Dirichlet allocation. Journal of Machine Learning Research 3 (4–5): pp. 993–1022.

This comparison uses the PLDA implementation (Z. Liu, Y. Zhang, E. Chang and M. Sun (2011) PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing. ACM Transactions on Intelligent Systems and Technology, special issue on Large Scale Machine Learning.).

LSA

A words-by-documents matrix is collected by noting occurrences of words in documents. The matrix is then transformed using truncated Singular Value Decomposition. Words in this model become similar to the extent that they co-occur in the same documents, and also (which is an effect of the truncated SVD) to the extent that they co-occur with the same other words.
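The effect of the truncated SVD can be seen in a minimal sketch. It uses raw counts, without the term weighting that LSA implementations usually apply. Below, “car” and “automobile” never share a document, so their raw vectors are orthogonal. Both co-occur with “engine”, though, and after truncation to k = 2 dimensions they end up pointing in the same direction.

```python
import numpy as np


def lsa_word_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Word vectors as rows of U_k * S_k from a truncated SVD of words x documents."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


words = ["car", "engine", "automobile", "banana", "fruit", "apple"]
# Columns are documents: d0 = {car, engine}, d1 = {automobile, engine},
# d2 = {banana, fruit}, d3 = {apple, fruit}.
counts = np.array([
    [1, 0, 0, 0],  # car
    [1, 1, 0, 0],  # engine
    [0, 1, 0, 0],  # automobile
    [0, 0, 1, 0],  # banana
    [0, 0, 1, 1],  # fruit
    [0, 0, 0, 1],  # apple
], dtype=float)

vecs = lsa_word_vectors(counts, k=2)
print(cosine(counts[0], counts[2]))         # → 0.0 (no shared document)
print(round(cosine(vecs[0], vecs[2]), 3))   # close to 1 after truncation
```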

LSA reference: T. Landauer and S. Dumais (1997) A solution to Plato’s problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

This comparison uses the S-Space Package LSA implementation (D. Jurgens and K. Stevens (2010) The S-Space Package: An Open Source Package for Word Space Models. System Papers of the Association for Computational Linguistics).

RI

A framework for incremental and scalable word space modeling. The standard RI model computes semantic word vectors in a fixed-dimensional space by noting co-occurrences within a sliding window spanning two preceding and two succeeding words. Words in this model become similar to the extent that they occur in similar contexts. The RI-permutations variation distinguishes preceding from succeeding co-occurrences.
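A minimal sketch of Random Indexing, assuming sparse ternary index vectors. The width of 2000 matches the table above. The choice of 10 non-zero entries is our own assumption. Each word's context vector is the sum of the index vectors of its neighbours within the 2+2 window. For the permutation variant, we assume neighbours' index vectors are rotated one step left (preceding) or right (succeeding) with `np.roll`, so that the two directions stay distinguishable. Note that the dimensionality never changes as new words arrive.

```python
from collections import defaultdict

import numpy as np


def make_index_vector(rng, dim, nonzeros):
    """Sparse ternary random index vector: a few +1/-1 entries, the rest zeros."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice(np.array([-1.0, 1.0]), size=nonzeros)
    return vec


def random_indexing(tokens, dim=2000, nonzeros=10, window=2, permute=False, seed=0):
    """Accumulate context vectors over a sliding window of `window` words each side."""
    rng = np.random.default_rng(seed)
    index = {}
    for word in tokens:  # one random index vector per word type
        if word not in index:
            index[word] = make_index_vector(rng, dim, nonzeros)
    context = defaultdict(lambda: np.zeros(dim))
    for i, word in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset == 0 or j < 0 or j >= len(tokens):
                continue
            neighbour = index[tokens[j]]
            if permute:  # rotate by direction: -1 for preceding, +1 for succeeding
                neighbour = np.roll(neighbour, 1 if offset > 0 else -1)
            context[word] += neighbour
    return dict(context)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


tokens = "the cat sat on the mat . the dog sat on the mat . a car drove down a road".split()
for permute in (False, True):
    ctx = random_indexing(tokens, permute=permute)
    print(permute, round(cosine(ctx["cat"], ctx["dog"]), 2), round(cosine(ctx["cat"], ctx["car"]), 2))
```

“cat” and “dog” share most of their contexts and come out clearly similar. “cat” and “car” share none and stay close to orthogonal.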

RI reference: J. Karlgren and M. Sahlgren (2001) From Words to Understanding. In Uesaka, Y., Kanerva, P. & Asoh, H. (Eds.): Foundations of Real-World Intelligence, pp. 294-308, Stanford: CSLI Publications.

RI-permutations reference: M. Sahlgren, A. Holst and P. Kanerva (2008) Permutations as a Means to Encode Order in Word Space. Proceedings of the 30th Annual Meeting of the Cognitive Science Society (CogSci’08), July 23-26, Washington D.C., USA.

This comparison uses the original SICS Random Indexing implementation.

HAL

A words-by-words matrix is collected by noting co-occurrences within a sliding window spanning ten word tokens. Semantic word vectors are produced by concatenating the row and the column for each word, and (if needed for computational reasons) dropping dimensions that are less informative. Words in this model become similar to the extent that they share contexts.
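A minimal sketch of the HAL construction. We assume the commonly described linear ramp weighting, where a context word at distance 1 gets weight 10 and one at distance 10 gets weight 1. Entry M[t, c] accumulates occurrences of c *before* target t. A word's vector concatenates its row (preceding contexts) with its column (succeeding contexts). The dimension-dropping step is left out.

```python
import numpy as np


def hal_vectors(tokens, window=10):
    """HAL-style word vectors: each word's row concatenated with its column."""
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for i, target in enumerate(tokens):
        for dist in range(1, window + 1):
            j = i - dist  # look only at preceding words
            if j < 0:
                break
            m[idx[target], idx[tokens[j]]] += window - dist + 1  # assumed ramp weight
    # Row = words preceding w; column = words that w precedes (i.e. following w).
    vectors = {w: np.concatenate([m[idx[w]], m[:, idx[w]]]) for w in vocab}
    return vocab, vectors


vocab, vecs = hal_vectors(["a", "b", "c"])
print(vocab, vecs["a"])
```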

HAL reference: K. Lund and C. Burgess (1996) Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, and Computers, 28, 203-208.

This comparison uses the S-Space Package HAL implementation (D. Jurgens and K. Stevens (2010) The S-Space Package: An Open Source Package for Word Space Models. System Papers of the Association for Computational Linguistics).

Real-time Syndromic Surveillance of Social Media for Disease Symptoms related to Seasonal Influenza

  • We do real-time monitoring of social media for disease symptoms
  • There is still no evidence of an outbreak of the seasonal flu in Sweden
  • We observe, however, an increasing trend in the intensity of symptoms

The inevitable influenza season will soon come knocking on our doors. How do we know when it has started, and how do we know just how severe it is? To this end, there are on-line tools for syndromic surveillance, which help individual medical practitioners and national disease control centers alike to combat the spread of influenza. Internationally, perhaps the most well-known monitoring service is Google Flu Trends. Nationally, Influensakoll keeps track of the current state of flu-related illness in Sweden. Along the same lines, research carried out at the Swedish Institute for Infectious Disease Control (SMI) shows the feasibility of using search queries submitted to the medical web site Vårdguiden for outbreak detection and monitoring. SMI also publishes weekly influenza reports based on input from labs and sentinels.

In addition, there is a growing effort in the research community to mine on-line social media for early warning and outbreak detection, to be used by health authorities when planning and conducting targeted counter-measures against epidemic diseases. This work mostly targets Twitter in English and relies on keywords only. Another interesting approach is that taken by the Iowa Electronic Health Market, which is a prediction market for syndromic surveillance.

While the above-mentioned services and research rely either on active participation by users or on keyword matching in social media feeds to find patterns, we’ve taken a different route to finding out the state of illness in Sweden. We’ve enhanced the barometer introduced earlier with concepts (not keywords) corresponding to a range of disease symptoms, such as migraine, fever, expectoration, headache, nausea, sore throat, and head cold. This facilitates the triangulation of more complex illnesses without having to wait for the bloggers, tweeters, forum participants, and facebookers out there to become so ill that they either actively seek answers related to their health condition, or start communicating using the actual name of the disease.

Our approach attempts to catch signs of illness early on, as they are expressed while participants in social media do what they usually do, that is, communicate with their peers. By focusing on the symptoms, we believe it is possible to get an early warning of the seasonal flu, before anyone realizes that this is what they are actually talking about. The image below illustrates the discrepancy between the score for the concept of influenza (the green, nearly flat line at the bottom of the graph) and the scores for some of the symptoms of influenza: expectoration (blue line), headache (red line), and fever (yellow line). Clearly, people have not yet experienced the flu strongly enough to talk about it, although they talk loudly about some of its symptoms. Note that the graph reveals an increasing trend in the intensity of the symptoms! The Ethersource-based barometer thus serves as a complement to other surveillance tools in that it picks up on trends of (combinations of) symptoms earlier.

Expressions of the concepts expectoration, headache, fever, and influenza in Swedish social media, early December 2011. Note that while the influenza score is constantly low, the other three symptoms vary with the time-of-day, taking precedence over each other in various ways.

Gavagai’s Ethersource technology allows for the kind of syndromic surveillance of disease symptoms described in this blog post to be carried out in real-time, in any language.

Monitoring on-line social media for as-it-happens customer churn related to mobile network operators in the US

Churn is a measure of customers leaving a subscription-based service over time. In this post, we use Ethersource to

  • demonstrate real-time monitoring of churn-propensity related to telecom services;
  • characterize customer churn by means of annoyance, uncertainty, change, and negativity;
  • identify and extract, in real time, the source documents provoking the churn for a service (in this particular case, a rumour surrounding Sprint Nextel’s service).

A challenging question in subscription-based industry segments is: As a service provider, how do I detect that a churn-provoking event is taking place early enough to act on that information and short-circuit the situation (as opposed to finding out at the end of the fiscal quarter)?

Consider as a backdrop to this post the age-old adage “A Lost Customer is Not a Potential Customer”, and its relative, “A Bird in the Hand”, which pertains to the cost of gaining a new customer being higher than the cost of retaining an existing one. As a service provider operating in a competitive landscape, you are concerned with those of your customers who are on the verge of terminating their subscription with you in favor of one of your competitors; hence the question that opens this post. What’s even more pressing is that since churn and retention are (approximately) a zero-sum game, you need to be aware of churn information related to your competitors when deciding on and executing your contingency plans.

We use Ethersource to take a look at the five largest mobile network operators in the United States with respect to the relative manifestation in English social media of the churn components, introduced below, during August and September 2011. The operators are (rank and numbers from Wikipedia): Verizon Wireless (107.7M subscribers; 35% of the subscribers among the five operators compared), AT&T Mobility (100.7M; 32%), Sprint Nextel (51.1M; 16%), T-Mobile USA (33.73M; 11%), and TracFone Wireless (17.75M; 6%).
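The subscriber shares above follow from dividing each operator's count by the five-operator total of 310.98M. The small helper below reproduces them. It also sketches the comparison made later, flagging operators whose share of on-line chatter exceeds their share of subscribers. The helper names are our own, and no actual chatter figures are assumed here.

```python
subscribers_m = {  # millions of subscribers, as listed above
    "Verizon Wireless": 107.7,
    "AT&T Mobility": 100.7,
    "Sprint Nextel": 51.1,
    "T-Mobile USA": 33.73,
    "TracFone Wireless": 17.75,
}


def shares(counts: dict[str, float]) -> dict[str, float]:
    """Each entry as a percentage of the total."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


def over_represented(chatter: dict[str, float], subs: dict[str, float]) -> list[str]:
    """Operators whose share of chatter exceeds their share of subscribers."""
    chat_share, sub_share = shares(chatter), shares(subs)
    return [name for name in subs if chat_share.get(name, 0.0) > sub_share[name]]


print({name: round(s) for name, s in shares(subscribers_m).items()})
# → {'Verizon Wireless': 35, 'AT&T Mobility': 32, 'Sprint Nextel': 16, 'T-Mobile USA': 11, 'TracFone Wireless': 6}
```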

Ethersource facilitates the detection of peaks in churn component signals, and allows its user to identify, and thus directly engage, the individuals airing the concerns that underlie such peaks. As an operator, you can use this opportunity to raise your visibility and make yourself available, so that users can contact you at will; to approach individual users directly; or to launch and follow up campaigns targeted at a select group of users.

We characterize customer churn in terms of a number of core components, all related to how people express themselves in relation to a service provider with respect to annoyance, uncertainty, change, and negativity. Increasing or fluctuating signal levels for any of these components, or combinations thereof, may constitute a cause for concern.

First off, let’s see how the number of subscribers per operator relates to the on-line chatter for the given time period. Image 1 shows the relative amount of chatter for the operators in September. The only operator generating an on-line buzz larger than its proportion in terms of subscribers is Sprint Nextel.

Image 1: The relative amount of on-line chatter for the five mobile network operators in September 2011.

Given the disproportionate attention paid to Sprint, that is the operator we focus on. We use the time series that Ethersource produces for the churn components and companies outlined above, as they are produced, to identify a situation in September in which Sprint, but not its competitors, may see an increase in customer churn. Note that this approach allows for continuous monitoring of events as they take place; there is no need to wait until after the fact to carry out a proper analysis.

The images below show expressions of uncertainty (Image 2), annoyance (Image 3), change (Image 4), and negativity (Image 5) towards the five mobile network operators. We are monitoring all the time series depicted below simultaneously, looking for occasions when the values for Sprint Nextel are higher than those of its competitors. A high value for a combination of the churn components, including as many components as possible, warrants a closer inspection. Looking at the images below, there is one date in particular that is interesting: September 16, 2011. It is the only date on which all churn components exhibit higher values for Sprint than they do for any of its competitors. (Note that the graphs are timed to Stockholm time, so the event actually started on September 15 in the time zones hosting Sprint.)
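The selection rule above is to keep the dates on which Sprint scores higher than every competitor on every churn component. It can be sketched as a small filter over the monitored time series. The nested-dictionary layout `{component: {date: {operator: score}}}` is a hypothetical one chosen for illustration. It is not Ethersource's export format.

```python
def leading_dates(series, focus="Sprint Nextel"):
    """Dates on which `focus` beats every competitor on every churn component.

    series (hypothetical layout): {component: {date: {operator: score}}}
    """
    common_dates = set.intersection(*(set(by_date) for by_date in series.values()))
    hits = []
    for date in sorted(common_dates):
        if all(
            scores[date][focus] > max(v for op, v in scores[date].items() if op != focus)
            for scores in series.values()
        ):
            hits.append(date)
    return hits
```

Requiring every component at once is a strict rule. Relaxing it to, say, a majority of components would flag more dates, at the cost of more false alarms.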

Image 2: Expressions of uncertainty. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 3: Expressions of annoyance. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 4: Expressions of change. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

Image 5: Expressions of negativity. Red circles mark dates in September when values for Sprint Nextel are higher than for its competitors.

What happened to Sprint on September 16?

Rumors of the iPhone 5 happened to Sprint. The rumor had it that Sprint would be the exclusive reseller of the new Apple handset.

By using Ethersource, we can inspect the expressions in the social media underlying the signals, and thus make sure we understand exactly what is going on. It turns out that people express themselves in relation to

  • uncertainty about whether the Sprint network can handle the traffic generated by the new iPhone: “Can Sprint handle iPhone traffic?”
  • annoyance about rumored alterations to contracts as a result of the introduction of the new handset: “Sprint Readies To Remove Even More Customer Incentives.”
  • change of the abovementioned contracts: “Sprint puts an end to the Premier loyalty program”
  • negativity about the impact the new handset might have on Sprint’s services: “What Sprint users should find alarming is his acknowledgement that the iPhone could potentially hurt the company in the near term because of the higher subsidies involved…”.

The quotes above are taken verbatim from top-ranked sources in Ethersource. Image 6 below shows a partial screenshot of Ethersource with a number of sources ranked according to their uncertainty value, early on September 16.

Image 6: A partial screen shot of Ethersource, containing the top five sources (cloaked) contributing to the uncertainty score at the point in time labelled "Z" in the red oval.

So the rumored advantage for Sprint, being the exclusive reseller of Apple’s new iPhone 5 handset, turned out not to be all that positively received.

We’re sure Sprint was all over this particular event; it is used here merely as a case study to show some of the capabilities of Ethersource when working with a polarity set-up beyond the ordinary positive-negative dichotomy. In fact, using negativity alone, it would have been very hard to identify the iPhone 5 rumor of September 16 as something extraordinary for Sprint.