Greek Election Tomorrow!

The euro and the European currency union are the major topic of the second Greek parliamentary election of this spring, to be held tomorrow, Sunday, June 17.

Ethersource has been reading Greek-language social media for the past few weeks. Our prediction:

  • ND will lead in number of votes
  • predicting electoral outcomes from social media takes more than tallying frequency of mention or a simple assessment of positive vs negative sentiment

We at Gavagai have been following party politics in the Greek social media for the past few weeks and have found – as have the major political commentary sites – that the major players are the conservative ND and the socialist coalition Syriza. Judging by frequency of mention in Greek social media alone, one would be likely to conclude that the election is safely in the hands of Syriza. See the graph below. Syriza gains much more attention in the Greek social media sphere than do other parties. (The dramatic spike in attention given to the fascist party XA has to do with one of their representatives resorting to physical violence in a TV debate, punching and slapping a political opponent on camera.)

But that is not the entire story. Mentions alone do not translate to votes. A further analysis gives pause to the first prediction. The pie chart shows what proportion of each party's mentions are coloured by mistrust and skepticism.

One cannot predict election results by counting mentions alone – the type of mention is important as well. We have previously broken attitude down in many ways, beyond what most analyses do. Here we will look at distrust and doubt as an attitude. Skeptical, worried, and doubtful mentions indicate not propensity to vote but concern about the outcome. The tweets, blogs, and forum posts by Greek voters that we read are not simply rooting for the author’s favourite party – they are analyses, each in its own way, of the election outcome. By aggregating the sentiment expressed in each of them we find a clearer picture than we would by simply counting and tabulating mentions.
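The difference between counting mentions and weighing their attitude can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not describe how Ethersource labels attitude, so the `posts` list, its labels, and the helper names `mention_counts` and `skeptical_share` are all hypothetical, and the rows are invented toy data rather than measurements.

```python
from collections import Counter

# Hypothetical labelled mentions: (party, attitude). These rows are invented
# for illustration only; they are not Ethersource output or real measurements.
posts = [
    ("Syriza", "skeptical"),
    ("Syriza", "skeptical"),
    ("Syriza", "positive"),
    ("Syriza", "neutral"),
    ("ND", "positive"),
    ("ND", "neutral"),
    ("ND", "skeptical"),
]

def mention_counts(posts):
    """Count how often each party is mentioned, regardless of attitude."""
    return Counter(party for party, _ in posts)

def skeptical_share(posts):
    """Return, per party, the fraction of its mentions labelled skeptical."""
    totals = mention_counts(posts)
    skeptical = Counter(party for party, attitude in posts
                        if attitude == "skeptical")
    return {party: skeptical[party] / totals[party] for party in totals}

counts = mention_counts(posts)
shares = skeptical_share(posts)
for party in counts:
    print(f"{party}: {counts[party]} mentions, {shares[party]:.0%} skeptical")
```

In this toy data the party with the most mentions also has the largest share of skeptical mentions, which is the kind of pattern a raw mention count would hide.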

Our analysis is as follows: Syriza and ND are most frequently mentioned. Syriza mentions carry a considerable amount of concern and mistrust. We assess this to mean that the electorate will gravitate towards ND rather than Syriza at the polling station: the likely leader in votes will be ND.

How ND will be able to put together a governable majority of representatives is another matter!

Greek elections restarted

The Greek political scene is in full swing preparing for new elections on June 17, little more than a month after the previous elections in May failed to provide a useful basis for forming an executive cabinet.

The blog Politik i Grekland has published some measurements we made of the relative stature in Greek-language social media of the eleven main parties campaigning for seats in the parliament. Their blog post is in Swedish, but the main observation is that the left-wing party Syriza claims most of the attention – positive, negative, and worried alike – and that the traditional labour party Pasok has gained some ground during the last few days.

There is a moratorium on opinion polls until the election, but our monitors will stay trained on the Greek social media until the polls close. We will publish an update on the electoral sentiment in the next few days!

Everyday racism in the Swedish blogosphere

  • We use Ethersource to monitor usage of racist terminology in the Swedish blogosphere.
  • We find that one of the largest demographic groups to use such terminology is young female bloggers.
  • We demonstrate how we are able to cluster and profile users of racist terminology.

One of the many benefits of Ethersource is that it is not limited to the standard positive/neutral/negative sentiment palette: it can be used to analyze and monitor any type of textually manifested phenomenon. Previous examples in this blog include artist popularity, flu trend, aversive language, and positivity vs headache.

In this post, we report on some observations on using Ethersource to monitor racist expressions in the Swedish blogosphere.

The following image shows the frequency of occurrence of racist terminology in the Swedish blogosphere from late March to the end of May 2012. Obviously, racist terminology is a frequent everyday occurrence on Swedish blogs.

Frequency of occurrence of racist terminology in the Swedish blogosphere.

However, merely counting the frequency of occurrence of racist terminology is of limited usefulness for understanding what people say and mean, since there are many ways to use terminology. Some uses may signal ideological or political standpoints, but other uses may not (e.g. discussions about the terminology itself, such as the origin and appropriateness of various terms). Thus, only counting the frequency of occurrence of racist terminology in the blogosphere may lead to premature or misleading conclusions. We therefore also monitor negative or degrading usage of racist terminology, as well as aggressive or hateful usage. And there is a difference between counting frequencies and counting opinionated usage, as we can see in the image below, which shows frequency (in blue), degrading usage (in green), and aggressive usage (in red).

Racist terminology in the Swedish blogosphere. The blue line shows the frequency of racist terminology, the green line shows the frequency of degrading or negative usage of racist terminology, and the red line shows the frequency of aggressive usage of racist terminology.

It is obvious that the total frequency of occurrence of racist terminology is much larger than the frequencies of degrading and aggressive use. As a rough estimate, approximately 10% of the posts containing racist terminology are negative or degrading, while approximately 5% are aggressive or hateful.

The general trends in these graphs are not of lasting value, since the time span is relatively short. What is interesting – and surprising – is the demographic profile of bloggers found in the two bottom lines. Since Ethersource enables an analyst to retrieve individual blog posts which contain a given target (in this case, racist terminology), it is possible to further analyze the material. Looking at the blog posts that use racist terms in degrading ways, we find that roughly 25% are written by young female bloggers who write about their own lives. Perhaps even more surprising, around 10% of blog posts using racist terms in aggressive ways are written by these young female bloggers. This is a surprising discovery, considering that the topical content of these blogs revolves around everyday events, lifestyle, and fashion.


Demographic clustering and stylometric profiling

The noteworthy observation above suggests that it may be interesting to look more closely also at the non-opinionated usage of racist terminology (i.e. the occurrences that are neither aggressive nor degrading). We do so by automatically clustering all the blog posts containing racist terminology during 2012. Always keeping the obvious risk of overgeneralizing in mind, we infer from manual inspection of the material that the four main clusters represent the following groups of bloggers:

For those not familiar with Swedish internet culture, Flashback is an infamous free-speech-oriented, no-holds-barred, frequently offensive and often provocative discussion forum.

Imagine that we for some reason could not inspect the material manually and therefore did not know the demographics of the clusters we found. In such cases, we can use stylometric profiling to characterize the stylistic differences between clusters, and based on these differences we can infer demographic information. As an example, consider the following comparison between the stylometric profile for the cluster containing the young female bloggers, and the stylometric profile for the cluster containing mainly political bloggers.

Stylometric profiles of two groups of bloggers (young females vs political bloggers) that use racist terminology.

The comparison between these two stylometric profiles shows that the main stylistic differences between these two groups of bloggers (let’s call them group F for the young female bloggers and group P for the political bloggers) can be found in the following variables:

  • Self: Group F is more self-oriented, which indicates that this group talks mainly about things that happen to the author, things the author thinks or worries about, or things that the author does.
  • Address: Group F refers directly to the reader more often than does group P.
  • Abstract vocabulary: Group P tends to use more abstract and complex vocabulary than group F.
  • Anchoring: Blog posts from group P contain more explicit temporal and spatial references than do posts from group F.

These differences suggest that authors in group F (the young female bloggers) write mainly from a subjective point of view, while authors in group P (the political bloggers) adopt a more factual perspective. Based on such differences, we may formulate hypotheses about the demographics of the two groups: since one group writes from a more personal and immediate perspective, its authors can be assumed to be younger and more personally engaged in their narration than those of the other group. This characterisation of author style is actually more salient than the objective notion of author age and gender, since writing style and authoring background are more interesting for understanding blog posts than age, gender, or other demographic variables.
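To make the idea of a stylometric profile concrete, here is a minimal sketch that scores two groups of texts on the four variables named above and compares them. It rests on assumptions that should be read as such: the post does not say how Ethersource computes these variables, so the English word lists, the word-length proxy for abstract vocabulary, the function names `profile` and `compare`, and the two example sentences are all hypothetical stand-ins.

```python
import re

# Hypothetical English word lists standing in for the four variables. The real
# feature definitions used by Ethersource are not given in the post.
SELF_WORDS = {"i", "me", "my", "mine", "myself"}
ADDRESS_WORDS = {"you", "your", "yours"}
ANCHOR_WORDS = {"today", "yesterday", "tomorrow", "here", "there", "monday"}
LONG_WORD_LENGTH = 9  # crude proxy: long words count as abstract vocabulary

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def profile(texts):
    """Return the rate of each stylistic variable per token for a group."""
    tokens = [token for text in texts for token in tokenize(text)]
    n = len(tokens) or 1  # avoid division by zero for an empty group
    return {
        "self": sum(t in SELF_WORDS for t in tokens) / n,
        "address": sum(t in ADDRESS_WORDS for t in tokens) / n,
        "abstract": sum(len(t) >= LONG_WORD_LENGTH for t in tokens) / n,
        "anchoring": sum(t in ANCHOR_WORDS for t in tokens) / n,
    }

def compare(profile_a, profile_b):
    """Positive values mean the variable is stronger in the first group."""
    return {name: profile_a[name] - profile_b[name] for name in profile_a}

# Invented example sentences, one per group, purely for illustration.
group_f = ["I told you about my day, you know how I feel."]
group_p = ["Yesterday the government presented legislation here in parliament."]

diff = compare(profile(group_f), profile(group_p))
for variable, delta in diff.items():
    print(f"{variable}: {delta:+.2f}")
```

With these invented sentences, the first group scores higher on self and address, and the second on abstract vocabulary and anchoring. A real profile would need language-appropriate lexicons and far more text per group.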

The analysis and discussion above serve as an illustrative example of how stylometric profiling correlates well with human intuition about demographic clustering, and of how such profiles may serve as explanatory constructs for a demographic clustering solution. We conclude this blog post with the observation that the combination of attitude analysis, clustering, and profiling provides a very powerful framework for analysis of online content.