New Issue Oversubscribed

Stockholm, May 15, 2013 – The new issue in Gavagai AB – a Swedish Language Technology company and a spin-off from SICS (The Swedish Institute of Computer Science) – was over-subscribed with substantial investor interest.

“I have never seen such interest in a company at this stage”, says Dan Andersson, serial entrepreneur and chairman of Gavagai.

“It is an indication of the strength of our technology”, says Dr. Magnus Sahlgren, co-founder of Gavagai AB.

Gavagai is a research-based company with a number of commercial partner projects in the pilot stage across various sectors, such as finance, security, and consumer markets.

Gavagai develops software and services for automatic analysis of written text in any media, format, size, language, and quantity.

ABOUT GAVAGAI
Gavagai AB (“Gavagai”) is a spin-off from the Swedish Institute of Computer Science (SICS). Gavagai has developed Ethersource, a technology based on decades of research effort in computational and computable semantics. At its core, Ethersource reads and “understands” vast amounts of streaming language data. Ethersource is designed to be a base technology for immediate deployment in any information system which relies on the analysis of large streams of text or other symbolic data in application areas such as search, big data analysis, enterprise data warehousing, associative advertising, social media analysis, deep packet inspection, or word-of-mouth marketing. Through its unique design, Ethersource has inherent advantages over traditional approaches, including statistical language models, traditional vector space models, and knowledge-based systems. Ethersource outperforms other technologies on important performance measures. Gavagai believes that Ethersource is uniquely positioned to meet the key industry challenges relating to Big Data:

• Understanding large – and growing – amounts of streaming data;

• Understanding noisy and unstructured language data;

• Understanding data in an increasing number of languages;

• Converting data into actionable intelligence – in real time.

Gavagai recently made the 33-list (Ny Teknik and Affärsvärlden) of innovative companies (April 2013) and the 24-list (Veckans Affärer) of most entrepreneurial companies (April 2013). Gavagai placed 2nd on the 2012 Entrepreneur of the Year List by Internet World and was recently (29 Nov 2012) labeled “one of Sweden’s hottest IT-companies” by the news agency Reuters. The media have covered Gavagai’s technology and accurate high-profile predictions in a number of articles: http://www.gavagai.se/inmedia.php

For more information, please contact CEO Niklas Rudemo, +46 708 141 287, niklas@gavagai.se, or visit www.gavagai.se

Gavagai on the 33-list

We are pleased to announce that Gavagai made it to the prestigious 33-list, Sweden’s top list of innovative high-tech companies. The list is compiled yearly by Ny Teknik and Affärsvärlden, Sweden’s leading technical and business magazines respectively. The honor was awarded by the editor in chief of Affärsvärlden, Jon Åsberg.

Gavagai co-founder Jussi Karlgren (center) receiving the 33-list award. Photo by Annika Rudemo.

“There is no short-cut to understanding the wealth of information found in human language. This requires specialized technology, which is what we build at Gavagai. Our goal is to build tools that allow every creative developer to tap into this knowledge”, says Dr. Jussi Karlgren, co-founder of Gavagai.

PhD position at Gavagai

We are happy to announce one PhD position in Computer Science with specialization in Computational Linguistics at Gavagai in Stockholm, Sweden (with formal affiliation to Linnaeus University, Växjö, Sweden).

Application deadline: 15 March, 2013.

Description

The position entails graduate studies and research in Computer Science with specialization in Computational Linguistics, with a doctoral degree as the goal. The PhD thesis should be completed and defended within the official appointment duration of four years. The position is part of the StaViCTA project on advances in the description and explanation of stance in discourse using visual and computational text analytics (http://cs.lnu.se/stavicta/). The PhD student will be expected to collaborate closely with the other project members in an interdisciplinary research environment. The position is salaried employment with the right to social benefits and paid vacations; the starting salary is about 23,000 SEK before taxes, which are around 30%. The position is located at Gavagai in Stockholm, Sweden, with formal affiliation to Linnaeus University, Växjö, Sweden.

Qualifications

  • Master’s degree in Computer Science, Computational Linguistics, or the equivalent.
  • Excellent knowledge of machine learning/data mining.
  • Excellent knowledge of natural language processing.
  • Excellent programming skills (e.g. Java, Python).
  • Solid training in mathematics and statistics.
  • Experience with deep learning algorithms.
  • Knowledge of linguistics and semantics.
  • Excellent command of English.
  • Teamwork experience.

Application

http://lnu.se/about-lnu/jobs-and-vacancies?l=en

Further information

http://cs.lnu.se/stavicta/index.php/jobs

Tomorrow’s election in the US

Yes: we, like many others, have followed the US elections in social media. There are many measurements of social media mentions out there, some thorough, others little more than simple counting. (The fundamentals of the actual issues, polls, and electoral mechanisms are best summarized by Peter Norvig.)


Ethersource has been reading social media posts on the main US presidential candidates for the past year or so. Based on this reading, our analysis is that

  • Obama will stay in the White House.

… which appears to be in agreement with what most bookies, pundits, and polls predict today.

As we have shown in previous posts on this blog, we have been thinking hard about which measures best capture political attitude in social media, and what sort of attitude best translates into a prediction of results. We already know that people do not usually waste bandwidth on plain endorsements or statements of personal voting intentions, but in general use their space for more or less thoughtful predictions of the candidates’ chances of carrying the election. Aggregating these sentiments and opinions gives us a prediction market of sorts, composed of those representatives of the electorate who write in social media. We show here our PPI score – an intensity-normalised positivity index for the two main candidates – since mid-August, in a line graph.

Intensity-normalised positivity for the two main candidates since August

As a visualisation experiment, we can show the same data in a QuickTime clip for the two main US presidential candidates since August, with the X-axis showing positive attitude, the Y-axis showing intensity-normalised positive attitude, and the size of the ball showing the frequency of mention of each candidate. (A large ball high in the upper right corner is good.)



These data show that the candidates’ mentions appear to track each other well (indicative of a close election) and that the incumbent has the edge. Based on these and our other measurements, we believe Obama will stay in the White House.

What Ethersource has Learned About Al-Qaeda in the Past Few Days

  • This post gives examples of Ethersource’s learning capabilities.
  • It gives examples of automatically learned topics and senses of the use of the term Al-Qaeda in English social media.

Ethersource is continuously exposed to massive text streams. On a given day, it sees millions of blog posts, tweets, and forum posts. And it learns. It gobbles up information much the same way a human picks up new ways of using new language constructs. Ethersource learns how the terms it reads are related to each other. It learns about topicality, and it learns about the different senses of the terms.

As an example, let’s have a look at what Ethersource has learned about Al-Qaeda over the past few days. Topicality-wise, the texts concerning Al-Qaeda are described by Ethersource using the following terms:

  • radicalization
  • LTTE
  • counterterrorism
  • ideology
  • tamils
  • jihad
  • terrorist
  • liberation
  • authoritarian

To us humans, possessing the background knowledge imposed on us by the media over the past decade, these terms come as no surprise. They all make sense as describing Al-Qaeda. Ethersource, however, has learned these topics from scratch, without access to any prior knowledge.

Furthermore, Ethersource has discovered two distinct senses, or meanings, of the term Al-Qaeda, as it has been used in social media during the past couple of days.

  1. The first sense of Al-Qaeda was automatically labelled PKK. In this sense, Al-Qaeda is related to Turkish, terrorists, militants, and fighters.
  2. The second sense of Al-Qaeda was automatically labelled Syria. In this sense, Al-Qaeda is related to Iran, Libya, Turkey, Tunisia, and fighting.

Unsupervised topic detection and sense discovery are both inherent properties of the semantic representation at the core of Ethersource. This makes for a powerful tool for an analyst when forming an understanding of the use of target concepts, be it in brand management, Open Source Intelligence, or sudden swings in World Markets.
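To make the idea of learning term relatedness from raw text concrete, here is a minimal sketch. It is not Ethersource’s method, which this post does not describe. It assumes a simple stand-in technique: count which words occur near each other, then compare terms by the cosine similarity of their context counts. The function names and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each term to a Counter of the terms seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_terms(target, vectors, top_n=3):
    """Rank all other terms by how similar their contexts are to the target's."""
    target_vector = vectors.get(target, Counter())
    scores = [
        (other, cosine(target_vector, vector))
        for other, vector in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy input, invented for illustration only.
toy_sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat eats fish",
    "the dog eats meat",
]
toy_vectors = cooccurrence_vectors(toy_sentences)
print(related_terms("cat", toy_vectors))
```

In this toy corpus "cat" and "dog" never appear together, yet they rank as closely related because they occur in similar contexts. That is the general intuition behind learning relations between terms without prior knowledge.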

We conclude this post with the observation that Ethersource has recently learned a new synonym of “Obama”: Obameat.

A portrait of President Barack Obama made of meat, by the artist Jason Mecier.

Tiny Needle in Big Data

Weak signal emission, detection, retrieval and analysis

We are repeatedly asked about the predictive powers of Ethersource and we need to underline that Ethersource has no “predictive” power per se. The reason Ethersource can estimate – or forecast – the percentages of public votes in a television contest or the outcome of a national election with some accuracy is simply that Ethersource reads and understands massive amounts of data.

This post will focus on something slightly different, namely the ability to find, understand and analyse one or a few tiny pieces of crucial data in massive amounts of data. It is the needle in the haystack dilemma with the only difference that your proverbial haystack is the Internet, and that you have very limited time to detect the relevant blog posts, tweets or chat entries and to analyze them in time to take action. This is what we call weak signal detection.

The Yeonpyeong attack

On November 23, 2010, 1434 local time (0534 GMT), North Korea fired more than 200 artillery shells at the South Korean island of Yeonpyeong, killing at least two soldiers in the heaviest attack since the end of the Korean War in 1953. The attack, which was somewhat of a shock, had a substantial but short-lived market impact. A day after the firings, South Korea’s benchmark KOSPI index opened 2.33 percent lower and the KRW weakened against the USD.

However, the surprise attack was preceded – on November 22 – by an alert signalling a sharp increase in the weak signal violence propensity index (VPI) for Korea as a target, monitored by Ethersource:

The Domodedovo bombing

Two months later, on January 24, 2011, at 1332 local Moscow time (1150 CET), the weak signal detection for Putin showed an extremely sharp increase in the violence propensity index (VPI), triggering an automatic alert:

 

 

Not even three hours later, at 1632 local Moscow time (1432 CET), a suicide bombing at Moscow’s Domodedovo airport killed at least 35 people and injured more than 100. The market impact was short-lived: Russia’s rouble-denominated stock market MICEX fell by nearly two percent following the blast.

Just-in-time weak signal detection and analysis

The above examples are from an Ethersource prototype and quite dated. There have been many unexpected events since and the technology has been refined. Our ongoing analysis confirms that many unexpected events are preceded by leakage of weak signals. Such signals are very difficult – or even impossible – to systematically detect with other technologies. It might appear a big step from anticipating some kind of violent act to actually being able to take counter-measures, as this would require both the availability and the timely detection of more detailed information. Ethersource has addressed this challenge by allowing instantaneous and automated ranking, retrieval and analysis of any Internet post or document contributing to a weak signal alert on any sentiment concept, such as violence propensity, toward any given target concept.
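As an illustration of what an alert on a “sharp increase” in an index could look like, here is a minimal sketch. It does not describe how Ethersource computes the VPI or its alerts. It assumes a generic rule: flag any reading that lies more than a chosen number of standard deviations above the mean of the preceding readings. The function name, the parameters, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def spike_alerts(series, window=7, threshold=3.0, min_sigma=1e-9):
    """Return the indices of readings that spike above their trailing history.

    A reading is flagged when it exceeds the mean of the previous `window`
    readings by more than `threshold` standard deviations. `min_sigma`
    guards against division by zero when the history is flat.
    """
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        z_score = (series[i] - mu) / sigma
        if z_score > threshold:
            alerts.append(i)
    return alerts


# Invented index readings, for illustration only.
readings = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10, 0.95, 0.12]
print(spike_alerts(readings))
```

A trailing window keeps the baseline local, so a slow drift in the index does not trigger alerts while a sudden jump does. The reading just after a spike is compared against a history that now contains the spike, so it is not flagged again.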

Near real-time weak signal detection and analysis holds great promise for security and financial applications, in our view.