Tomorrow’s election in the US

Yes: we, like many others, have followed the US elections in social media. There are many measurements of social media mentions out there, some thorough, others little more than simple counting. (The fundamentals of the actual issues, polls, and electoral mechanisms are best summarized by Peter Norvig.)


Ethersource has been reading social media posts on the main US presidential candidates for the past year or so. Based on this reading, our analysis is that

  • Obama will stay in the White House.

… which appears to be in agreement with what most bookies, pundits, and polls predict today.

As we have shown in previous posts on this blog, we have been thinking hard about which measures best capture political attitude in social media, and what sort of attitude best translates into a prediction of results. We already know that people do not usually waste bandwidth on plain endorsements or statements of personal voting intentions, but generally use their space for more or less thoughtful predictions of the candidates’ chances of carrying the election. Aggregating these sentiments and opinions gives us a prediction market of sorts, composed of those members of the electorate who write in social media. We show here our PPI score – an intensity-normalised positivity index for the two main candidates – since mid-August, in a line graph.

Intensity-normalised positivity for the two main candidates since August

As a visualisation experiment, we can show the same data for the two main US presidential candidates since August in a QuickTime clip, with the X-axis showing positive attitude, the Y-axis showing intensity-normalised positive attitude, and the size of the ball showing the frequency of mention of each candidate. (High up in the upper right corner with a large ball: good.)



These data show that the candidates’ mentions appear to track each other well (indicative of a close election) and that the incumbent has the edge. Based on these and our other measurements, we believe Obama will stay in the White House.

Greek Election Tomorrow!

The euro and the European currency union are the major topic of this spring’s second Greek parliamentary election, to be held tomorrow, Sunday, June 17.

Ethersource has been reading Greek-language social media for the past few weeks. Our prediction:

  • ND will lead in number of votes
  • one needs more than tallying frequency of mention or simple assessment of positive vs negative sentiment to use social media for predicting electoral outcomes

We at Gavagai have been following party politics in Greek social media for the past few weeks and have found – as have the major political commentary sites – that the major players are the conservative ND and the socialist coalition Syriza. Judging by frequency of mention in Greek social media alone, one would be likely to conclude that the election is safely in the hands of Syriza. See the graph below. Syriza gains much more attention in the Greek social media sphere than do other parties. (The dramatic spike in attention given to the fascist party XA has to do with one of their representatives resorting to physical violence in a TV debate, punching and slapping a political opponent on camera.)

But that is not the entire story. Mentions alone do not translate to votes. A further analysis gives pause to the first prediction. The pie chart shows what proportion of each party’s mentions is coloured by mistrust and skepticism.

One cannot predict election results by counting mentions alone – the type of mention is important as well. We have previously cut up attitude in many ways, beyond what is done by most. Here we will look at distrust and doubt as an attitude. Skeptical, worried, and doubtful mentions indicate not propensity to vote but concern about the outcome. The tweets, blogs, and forum posts by Greek voters we read are not simply rooting for the author’s favourite party – they are analyses, each in its own way, of the election outcome. By aggregating the sentiment given in each of them we find a clearer picture than we would by simply counting and tabulating mentions.

Our analysis is as follows: Syriza and ND are most frequently mentioned. Syriza mentions carry a considerable amount of concern and mistrust. We assess this to mean that the electorate will gravitate towards ND rather than Syriza at the polling station: the likely leader in votes will be ND.

How ND will be able to put together a governable majority of representatives is another matter!

Greek elections restarted

The Greek political scene is in full swing preparing for new elections on June 17, little more than a month after the previous elections in May failed to provide a useful basis for forming an executive cabinet.

The blogsite Politik i Grekland has published some measurements we made of the relative stature in Greek-language social media of the eleven main parties campaigning for seats in the parliament. Their blog post is in Swedish, but the main observation is that the left-wing party Syriza claims most of the attention – positive, negative, and worried alike – and that the traditional labour party Pasok has gained some ground during the last few days.

There is a moratorium on opinion polls until the election, but our monitors will stay trained on the Greek social media until the polls close. We will publish an update on the electoral sentiment in the next few days!

Intelligent Business, April 19 – Grand Hôtel, Stockholm

On April 19 at the Grand Hôtel in Stockholm, Gavagai’s Jussi Karlgren will give a talk on what sort of information flows and new opportunities Big Data analysis will afford businesses. This will be a presentation to the Intelligent Business Convention, with an audience of CIOs and information professionals. He intends to ask the audience how they are prepared to adjust their business practices to fit the new information flows we can expect.

SWIRL 2012: Strategic Workshop on Information Retrieval in Lorne

Together with forty or so of my most valued and esteemed jet-lagged colleagues and friends, I attended SWIRL 2012, “the occasional talkshop on the future of information retrieval”, hosted by RMIT in Lorne, near Melbourne, earlier this month.

Swirlers at Lorne

The broadly stated topic for this gathering was to formulate the most fruitful reasonably long range research questions for information retrieval as an academic research field. There were keynote addresses and break-out groups followed by other break-out groups followed by collective authoring sessions to compose a comprehensive consensus view of where we are and where we should be headed.

For those already convinced of the necessity of a deeper understanding of users, of the easily predictable challenges inherent in the ubiquitous presence of mobile information technology, of the slow but imminent roll-in of various aspects of ubiquitous computing, and of the value of spoken dialogue as an input modality, the final list of major issues was somewhat underwhelmingly disruptive.

External SWIRL participants

But that’s what happens when a group of well synched people interact in a pleasant environment and the process is vectored towards compromise. In the end, six research questions were highlighted:

  1. People or finding out more about users
  2. Zero-word-queries or ubiquitous information access
  3. Mobile information retrieval
  4. Dialogue, including speech
  5. Information literacy
  6. Structured vs unstructured information

They will be presented in a final report from the event, edited by the organisers. The final report will be made available to interested parties shortly.

An observation made by one of us in session was that information retrieval tends to arrive when the infrastructure and devices are in place – in that sense information retrieval is a reactive branch of computer science. The worry I myself have is that the workshop in lifting some of these questions to top urgency pitches rocks into neighbouring fields, where no-one will be too interested in picking them up for refinement. Information retrieval as an academic field must have a readiness to address questions even before they become urgent, or we risk industrial players hacking them into a sub-standard but de-facto standard solution. Mobile informatics, social informatics, and the ubicomp field already have assumptions in place for how information retrieval should be worked.

Of the six principal issues most discussed at the workshop, the one that most interested me – because of its potentially socially disruptive nature – was that of information literacy: how to best help people to understand de-contextualised information, how to make them better searchers and critical readers, how to build technology to close the gap between the information have-nots and the informocrats, and to do this both momentarily at time of information access and in a prolonged perspective over the lifetime of a user. This issue, which involves issues far beyond technology, will stay with us as more and more of society realises the infrastructural confluence of needs and potential in which information access technology resides.

Underlying many of the discussions were some technology and methodology trends I appreciate and look forward to thinking more about, and while the top questions were less technology oriented and more application oriented, some of these technologies will be brought up in the coming report.

  • Touching on the question of modelling usage and users and how to integrate knowledge of them into the system development cycle is a favourite discussion point of mine – that of use cases as a bridge between benchmarking and validation. This is, not incidentally, a main topic of the PROMISE network of excellence in which I participate, and which is in the process of publishing some reports on the matter.
  • Touching on the question of ubiquitous access to information, on the push-vs-pull information provision question, and on the question of structured vs unstructured information, there will be the enormous and uncharted challenges brought to us by the very real and approaching internet of things – when all our things start communicating, we will need to build them with communication protocols which are flexible, learning, and dynamic. We will not be able to count on web services to use the same knowledge structures from version to version – yet we will expect our preferences to migrate seamlessly from workplace to home to other destinations. This calls for an introduction of search technology, but vectored towards interaction between things.
  • Finally, most relevant to Gavagai and the work we do here: big data changes everything. We must build future-proof processing models, and we must foresee that many of the information needs we will be serving are not about finding gold nuggets or needles in a haystack – they are about modelling the state of the world as it changes and as it is reflected in the information we process. The challenge will not be to winnow out the best sample or item but to – conversely – ingest all of it and track its meaning as a whole.

We will see where this eventually gets us, but the fresh thoughts and ideas this workshop has set in motion – not all from the workshop sessions and not all to be documented in the workshop reports – are evident in the discussions currently active in my mailbox. We need more gatherings of this type – much more productive than many large conferences are! (The availability of a beach in February didn’t hurt either).

Iowa and social media sentiment

We must confess we were a bit wary of extending social media-based prediction into the minds of Iowans gathering in caucus halls around their state to select their favourite presidential candidate. Iowan politics is famously local: our measurements are global.

As it turns out, we were fairly good at picking out what matters. The results gave Mitt Romney, Ron Paul and Rick Santorum more or less equal votes, with the others – Newt Gingrich, Michele Bachmann, Rick Perry, Jon Huntsman – trailing far behind.

Our measurements of social media in the last few days showed that the three most talked about candidates were Santorum, Romney and Paul. The six most mentioned candidates received about the same amount of appreciation. But comparing the amount of appreciation with the amount of aversive sentiment they generated, we find that Romney had the best differential, and that Gingrich and Bachmann showed strongly negative differentials.
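The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the actual Ethersource computation, which this post does not describe. The names `MentionCounts`, `mention_share`, `sentiment_differential`, and `rank_by_differential` are hypothetical, the differential is assumed to be the positive share minus the negative share of a candidate's mentions, and the counts are invented placeholders rather than our measurements.

```python
from dataclasses import dataclass


@dataclass
class MentionCounts:
    """Mention counts for one candidate over some period (hypothetical structure)."""
    total: int      # all mentions of the candidate
    positive: int   # mentions carrying appreciation
    negative: int   # mentions carrying aversive sentiment


def mention_share(table, name):
    """Proportion of all mentions, across candidates, that go to `name`."""
    grand_total = sum(counts.total for counts in table.values())
    return table[name].total / grand_total if grand_total else 0.0


def sentiment_differential(counts):
    """Assumed differential: positive share minus negative share of mentions."""
    if counts.total == 0:
        return 0.0
    return (counts.positive - counts.negative) / counts.total


def rank_by_differential(table):
    """Candidate names ordered from best to worst differential."""
    return sorted(table,
                  key=lambda name: sentiment_differential(table[name]),
                  reverse=True)


# Invented placeholder counts, NOT measured data. Every candidate gets the same
# positive share (30%), so only the negative share separates them.
example = {
    "candidate_a": MentionCounts(total=1000, positive=300, negative=100),
    "candidate_b": MentionCounts(total=800, positive=240, negative=320),
    "candidate_c": MentionCounts(total=200, positive=60, negative=60),
}

ranking = rank_by_differential(example)
for name in ranking:
    print(name,
          round(mention_share(example, name), 2),
          round(sentiment_differential(example[name]), 2))
```

In this toy table the most mentioned candidate and the candidate with the best differential happen to coincide, but the two measures are computed independently, which is why equal appreciation can still produce very different differentials.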

We will not be quite as shy in four years’ time!

Proportion of all mentions in social media for the day before the Iowa caucuses

Proportion of positive mentions in social media for the day before the Iowa caucuses

Proportion of negative mentions in social media for the day before the Iowa caucuses

GOP Hopefuls in Social Media

The blogsite amerikanskpolitik.se has published some measurements we made on the relative stature in social media for the main Republican party presidential candidates. Their blog post is in Swedish but the main observations are:

  1. Ron Paul has gained a massive boost in mentions lately and is now the most talked about candidate. (This is likely to be a partial effect of the general libertarian and counterestablishmentarian bias of the blogosphere).
  2. Michele Bachmann is now the candidate viewed with the most skepticism. (This is likely to be an effect of her recently expressed views on vaccination, which run counter to many health professionals’ views.)
  3. Newt Gingrich is the candidate most associated with aversive affect.
  4. Mitt Romney and Newt Gingrich are the candidates most associated with positive affect.

Aversive mentions during the week of December 22-28, 2012.

Proportion of mentions during the week of December 22-28, 2012.

Skeptical mentions during the week of December 22-28, 2012.

Positive mentions during the week of December 22-28, 2012.