
2011-11-01

We don’t do training, we do learning

We have already expressed in this blog how very pleased we are with the design of Ethersource, the technology we have developed. In Ethersource, the memory model and the processing model are the same thing: the memory model has a built-in processing model. New data is projected into the memory model without confounding previous knowledge and without resizing the model. (We will return to the technical details in the near future.) Ethersource delivers salient term-term relations in real time, online, without recomputation or postprocessing of the aggregated data.
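The post does not name the mechanism behind this, so as a purely illustrative sketch (not Ethersource's actual implementation), here is one family of techniques with exactly these properties: each term gets a fixed-width sparse random fingerprint, and a term's context vector is updated in place, token by token, by adding the fingerprints of its neighbours. The memory never grows beyond a fixed dimensionality per term, nothing is retrained, and term-term similarity is available at any moment. All names and parameters below are assumptions for the example.

```python
import random
from collections import defaultdict

DIM = 512      # fixed dimensionality: per-term memory never resizes (assumption)
NONZERO = 8    # number of nonzero entries in each sparse fingerprint (assumption)
WINDOW = 2     # co-occurrence window around each token (assumption)

def index_vector(term, dim=DIM, nonzero=NONZERO):
    """Deterministic sparse random fingerprint for a term: a few +1/-1 entries."""
    rng = random.Random(term)  # seeded by the term itself, so it never changes
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((1, -1)) for p in positions}

class IncrementalSemanticMemory:
    """Context vectors updated on the fly; similarities queryable at any time."""

    def __init__(self):
        # One fixed-size context vector per term, created lazily.
        self.context = defaultdict(lambda: [0.0] * DIM)

    def learn(self, tokens):
        """Fold a stream of tokens into memory without touching anything else."""
        for i, term in enumerate(tokens):
            vec = self.context[term]
            lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                # Add the neighbour's fingerprint into this term's context.
                for pos, sign in index_vector(tokens[j]).items():
                    vec[pos] += sign

    def similarity(self, a, b):
        """Cosine similarity between two terms' context vectors."""
        va, vb = self.context[a], self.context[b]
        dot = sum(x * y for x, y in zip(va, vb))
        na = sum(x * x for x in va) ** 0.5
        nb = sum(x * x for x in vb) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
```

Usage is a matter of calling `learn` on each incoming text and `similarity` whenever a question arises; there is no separate training phase, and new vocabulary simply claims its fixed-size slot on first sight:

```python
memory = IncrementalSemanticMemory()
memory.learn("the cat sat on the mat".split())
memory.learn("the dog sat on the rug".split())
memory.similarity("cat", "dog")  # high: the two terms share contexts
```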

We are frequently asked how much data we need to train the model and how long it takes to train it.

We never quite know how to respond. Our design renders that question moot.

As new texts arrive, so do new words, terms, and turns of phrase. The inventiveness and creativity of human authorship is inexhaustible (as is our occasional unwillingness to conform to norms). Any system built to handle realistic amounts of new text must be built – by design! – to handle anomalies, divergence from norms, and changes in norms. This poses a challenge: knowledge-based systems need to maintain and update their knowledge resources, and statistics-based systems need to realign their parameters and rebuild their training data sets. This is because they are designed with a disconnect between their knowledge and their processing — statistical regularities or symbolic rules are extracted from training data and then applied to future incoming data. The process of acquiring new knowledge can be variously demanding in terms of data, human editorial effort, and processing requirements.

By contrast, the Ethersource semantic model is available from the start. We don’t do training. Ethersource is built to learn, not to be trained. From the very first token processed, it is in service. Learning is done on the fly. Of course, there is a learning curve: at first, Ethersource knows little; after a while it knows more; eventually it is quite erudite. But at all times, the semantic relationships between the units most recently encoded by Ethersource are only a query away.

Category: technicalities