This is our blog. We write about cool experiments, interesting case studies, open talks and technicalities.

2011-10-31

The difference between Ethersource and those other models

We want to make clear how our approach differs from approach X (substitute your favourite text analytics technology for X). In short, Ethersource is a vector space model, and it comes with the processing convenience of a vector space.
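As a rough illustration of what "the processing convenience of a vector space" usually means, here is a minimal sketch: once items are vectors, similarity becomes geometry (the cosine of the angle between two vectors) and combining items becomes arithmetic (vector addition). The toy vectors and the helper names `cosine` and `compose` are hypothetical and chosen for illustration; the post does not describe Ethersource's own operations.

```python
import math

# Toy three-dimensional vectors; the numbers are invented for illustration only.
VECTORS = {
    "car":   [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "poem":  [0.0, 0.3, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)

def compose(*words):
    """Combine several items into a single vector by element-wise addition."""
    return [sum(values) for values in zip(*(VECTORS[w] for w in words))]

print(cosine(VECTORS["car"], VECTORS["truck"]))  # high: similar directions
print(cosine(VECTORS["car"], VECTORS["poem"]))   # low: very different directions
print(cosine(compose("car", "truck"), VECTORS["truck"]))
```

The same two operations work unchanged whatever the vectors represent, which is the convenience being claimed: a single representation supports comparison, ranking and combination.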

But a vector space is only as good as the process used to populate it. We populate our vector space matrix with distributional data, so nearness in the space means similarity of distribution: two words end up close to each other when they occur in similar contexts. We also build the vector space with ease, and it is compact and remains tractable in size.
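One well-known way to populate a vector space with distributional data while keeping it compact is random indexing: every word gets a sparse random "index vector", and a word's context vector accumulates the index vectors of the words around it, so the number of dimensions stays fixed however large the vocabulary grows. The post does not say how Ethersource builds its space, so treat this as an assumed technique; `DIM`, `NONZERO`, `WINDOW`, `index_vector` and `train` are illustrative choices, not Ethersource parameters.

```python
import math
import random
from collections import defaultdict
from functools import lru_cache

DIM = 512      # fixed dimensionality, independent of vocabulary size (assumed value)
NONZERO = 8    # number of +1/-1 entries per sparse index vector (assumed value)
WINDOW = 2     # context words counted on each side of a focus word (assumed value)

@lru_cache(maxsize=None)
def index_vector(word):
    """Sparse ternary random vector, reproducible because the RNG is seeded with the word."""
    rng = random.Random(word)
    positions = rng.sample(range(DIM), NONZERO)
    return tuple((pos, rng.choice((1, -1))) for pos in positions)

def train(sentences):
    """Accumulate each word's context vector from the index vectors of nearby words."""
    context = defaultdict(lambda: [0.0] * DIM)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
                if j != i:
                    for pos, sign in index_vector(tokens[j]):
                        context[word][pos] += sign
    return dict(context)

def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

corpus = [
    "i drink hot coffee every morning".split(),
    "i drink hot tea every morning".split(),
    "she parks the red car outside".split(),
]
vectors = train(corpus)
print(cosine(vectors["coffee"], vectors["tea"]))  # identical contexts: close to 1
print(cosine(vectors["coffee"], vectors["car"]))  # no shared contexts: close to 0
```

Here "coffee" and "tea" never co-occur, yet they end up near each other because they share contexts, which is what "similarity with respect to distribution" amounts to. Adding more text changes the values in the vectors but never their length.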

Here is a brief comparison matrix.

Vast scale
  Statistical: Fine (if sampling is done correctly and samples are true to the data)
  Knowledge-based: Fine (if the processing model can be optimised)
  Ethersource: Fine (inherent in the memory model and in the processing model)

Multilinguality
  Statistical: Fine (if a labeled training collection is available; involves a train-test-update cycle)
  Knowledge-based: Problematic (involves expensive retooling of the knowledge base)
  Ethersource: Fine (inherent in the memory model and in the processing model)

Change
  Statistical: Problematic (new data are not guaranteed to conform to estimates based on previous data)
  Knowledge-based: Problematic (involves expensive retooling of the knowledge base)
  Ethersource: Fine (inherent in the memory model and in the processing model)

Variety
  Statistical: Fine (if a labeled training collection is available; involves a train-test-update cycle)
  Knowledge-based: Problematic (involves expensive retooling of the knowledge base)
  Ethersource: Fine (inherent in the memory model and in the processing model)

Coverage
  Statistical: High recall
  Knowledge-based: High precision
  Ethersource: High recall

Abstraction
  Statistical: Strings
  Knowledge-based: Concepts or logical forms
  Ethersource: Concepts

We view Ethersource as the base technology for any service or information process that relies on human language as input. Any process that today uses grammars, lexica, thesauri, occurrence frequencies or estimates of collocation likelihood will be well served by plugging in Ethersource as a base resource or as a replacement. Many new services whose engineering cost would be prohibitive today will also be painless to design on top of Ethersource.
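To make the "replacement for a thesaurus" idea concrete, here is a hedged sketch of how a thesaurus-style query can be answered by any distributional vector space: rank every other term by cosine similarity to the query term and return the nearest ones. `MODEL`, its invented vectors and the function `related_terms` are hypothetical; this is not Ethersource's API, only the generic nearest-neighbour pattern.

```python
import math

# Invented toy vectors standing in for the output of a distributional model.
MODEL = {
    "happy":    [0.90, 0.80, 0.10],
    "glad":     [0.85, 0.75, 0.20],
    "cheerful": [0.80, 0.90, 0.15],
    "invoice":  [0.05, 0.10, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related_terms(word, model, k=2):
    """Thesaurus-style lookup: the k terms whose vectors lie nearest to the query's."""
    target = model[word]
    scored = [(other, cosine(target, vec)) for other, vec in model.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]

print(related_terms("happy", MODEL))
```

Unlike a hand-built thesaurus, the neighbour list here comes entirely from the vectors, so updating the model updates the "thesaurus" with no separate editing step.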

We have some examples in our service palette today, but we have by no means exhausted the possibilities!

Category: technicalities