
2016-03-08

Business bingo – Is your text analytics system up-to-date with current affairs?

In my role as Chief Data Officer at Gavagai, I meet with lots of leads, clients, and data providers. Many of our conversations are carried out in English, and as a non-native speaker, I sometimes find the choice of wording peculiar, and at times slightly amusing. Touch base, reach out, back-to-back, and help me understand, to name but a few.

In the game of buzzword bingo, players tick off pre-defined buzzwords available on a bingo-like board. But what to enter as buzzwords? How would you recognize such a word? In my view, many of the business terms I’ve encountered would qualify as buzzwords in a game of business bingo.

Gavagai’s semantic memories read millions of online documents per day and learn not only what constitutes a term in a language, but also how terms relate to each other. Many of the terms that have amused or puzzled me recently are present in the Gavagai Living Lexicon, which is the preferred way to peek into the semantic memories. Since each term in the lexicon potentially has semantically similar neighbors, the contents of the memories can be viewed as a graph: one term’s neighbors have their own neighbors, and so on. The Lexicon is accessible via an API, so I figured I’d extract the neighborhoods of some of the business terms I’ve come across and explore them using a graph database, Neo4j.
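To make the "neighbors of neighbors" idea concrete, here is a minimal sketch of a breadth-first walk that collects a term's neighborhood as a set of edges. It does not call the real Lexicon API: `fetch_neighbors` is a hypothetical helper backed by a small hand-written table, and the second-level links in that table are invented for illustration only.

```python
from collections import deque

# Toy stand-in for the Lexicon API. This table is hand-written for
# illustration and is NOT real output from Gavagai's semantic memories.
TOY_NEIGHBORS = {
    "touch base": ["stay in touch", "keep in touch", "get together", "socialise"],
    "stay in touch": ["keep in touch", "touch base"],
    "keep in touch": ["stay in touch", "get together"],
    "get together": ["socialise"],
    "socialise": [],
}


def fetch_neighbors(term):
    """Hypothetical helper: return the semantically similar neighbors of a term."""
    return TOY_NEIGHBORS.get(term, [])


def collect_neighborhood(start, max_depth=2):
    """Breadth-first walk returning the terms and undirected edges
    reachable by expanding terms less than max_depth hops from start."""
    edges = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # terms at the depth limit are kept but not expanded
        for neighbor in fetch_neighbors(term):
            edges.add(frozenset((term, neighbor)))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen, edges


terms, edges = collect_neighborhood("touch base")
print(len(terms), "terms,", len(edges), "edges")
```

Storing each edge as a `frozenset` means that a link seen from both ends is counted once, which fits the undirected "is similar to" reading of the graph.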

Let’s look at touch base (below). From how it has been used in our conversations over the past years, I understand that it has to do with connecting, or re-opening a channel. My way of knowing this is the same as that of our semantic memories: observing terms in their contexts. The graph representation of the neighborhood of touch base looks like this:

[Figure: graph of the neighborhood of touch base]

The closest neighbors to touch base are stay in touch, keep in touch, get together, and socialise. The nodes in the graph are terms (automatically identified by the system), and the edges between nodes denote strong semantic relationships (again, automatically identified by the system – there’s no manual intervention going on). Touch base, far left, is connected to keep in touch by means of a relation labelled * with friends. The label is read as touch base with friends, and keep in touch with friends.
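As a small illustration of how such a labelled edge could be read and stored, the sketch below fills the * placeholder with a term and builds a parameterized Cypher MERGE statement for Neo4j. The node label `Term`, the relationship type `RELATED_TO`, and both function names are assumptions made for this example; the post does not say how the graph was actually modelled.

```python
def read_label(term, label):
    """Fill the '*' placeholder in a relation label with a term."""
    return label.replace("*", term, 1)


def merge_statement(source, target, label):
    """Build a parameterized Cypher MERGE for one labelled edge.

    The node label Term and the relationship type RELATED_TO are
    assumptions for this sketch, not the schema used in the post.
    """
    query = (
        "MERGE (a:Term {name: $source}) "
        "MERGE (b:Term {name: $target}) "
        "MERGE (a)-[:RELATED_TO {label: $label}]->(b)"
    )
    params = {"source": source, "target": target, "label": label}
    return query, params


label = "* with friends"
print(read_label("touch base", label))     # touch base with friends
print(read_label("keep in touch", label))  # keep in touch with friends

query, params = merge_statement("touch base", "keep in touch", label)
```

Using MERGE rather than CREATE keeps a term from being duplicated when it shows up in several neighborhoods, and passing the values as parameters avoids quoting problems with multi-word terms.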

In the graph for touch base, most nodes are connected to most other nodes, which renders the structure less interesting. Let’s look at a more disconnected example. The graph for sales revenue (below) contains several local cluster-like groups, distinguishing between, for instance, retail sales, net earnings, and effective tax rates.

[Figure: graph of the neighborhood of sales revenue]

In this zoomed-in part of the above image, we can see that the system has picked up on, and related, a set of terms having to do with net earnings, which is indirectly connected to the target term, sales revenue. Effective tax rate also has something to do with net earnings, via diluted net income.

[Figure: zoomed-in part of the sales revenue graph]

One may argue that the more connected a graph is, the harder it is to tell the uses of its terms apart; in other words, the language use is vague. The vagueness in the touch base case marks it as a good buzzword for business bingo. Judging from its graph, sales revenue is more concrete and is thus a poor buzzword candidate.
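One simple way to put a number on "how connected" a neighborhood is would be graph density: the share of all possible term pairs that are actually linked. This is a sketch of that idea, not necessarily the measure behind the argument above, and the two graphs are abstract toy examples rather than data from the Lexicon.

```python
def density(nodes, edges):
    """Density of an undirected graph: 2E / (N * (N - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Normalise each edge so (a, b) and (b, a) count once.
    unique_edges = {frozenset(edge) for edge in edges}
    return 2 * len(unique_edges) / (n * (n - 1))


# Toy "vague" neighborhood: every term is linked to every other term.
tight_nodes = ["t", "a", "b", "c"]
tight_edges = [("t", "a"), ("t", "b"), ("t", "c"),
               ("a", "b"), ("a", "c"), ("b", "c")]

# Toy "concrete" neighborhood: two separate clusters joined only via the target.
clustered_nodes = ["t", "a1", "a2", "a3", "b1", "b2", "b3"]
clustered_edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
                   ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
                   ("t", "a1"), ("t", "b1")]

print(density(tight_nodes, tight_edges))                    # 1.0
print(round(density(clustered_nodes, clustered_edges), 3))  # 0.381
```

Under this reading, a score close to 1 would flag a buzzword candidate, while a lower score would suggest a term whose neighbors fall into distinct groups.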

I’ve spent hours poring over the contents of Gavagai’s semantic memories, learned a lot of new terms (it’s especially fun looking at bad language, but since it’s NSFW, I’ll save those examples for another time), and marvelled at the way an unsupervised learner can pick up on language use by simply being exposed to lots and lots of data. In fact, many of the terms picked up by our semantic memories are not terms that a traditional NLP pipeline would easily learn: Gavagai’s system learns common language use, not only the extraordinary cases such as entities, idioms or single-word terms.

Take-home message: If you’re in a business segment where you need up-to-date word knowledge, in multiple languages, accessible as SaaS, you should contact us!

The semantic memories are currently available in 20 languages, via the Gavagai Living Lexicon and an API, and are at the core of our media monitoring application, Gavagai Monitor, as well as our text intelligence platform, Gavagai Explorer.

Is your text analytics solution business bingo-ready?

Category: case studies, Gavagai Lexicon, technicalities