
2016-05-23

What is an efficient way to analyze answers to open-ended survey questions using language technology?

There are three challenges to analyzing free-text answers, listed below. In the following discussion I will assume that the purpose of the analysis is to understand the themes being discussed and their relative strengths, and to get accurate counts and percentages of respondents for each theme.

  1. Knowing about multiword expressions. Important concepts in text often consist of more than one word, for example: “San Francisco”, “no-fly zone”, “give me five”, or “kick the bucket”. An automated tool for analyzing answers to open-ended survey questions needs to understand such multiword expressions, or the analysis will perform poorly. Why is that so? Consider a “word cloud”, a graphical representation of terms in which more common terms get a larger size than less common ones, and all terms are jumbled together. If you build a word cloud from text containing expressions like those mentioned, you will see the words “san”, “me”, “zone” and so on as separate words, all mixed together. And the word “me”, for example, probably appears in other sentence constructions as well, so it might be shown in a large font even though the expression you are really interested in – “give me five” – only appears once. This capability is also a prerequisite for the next point – understanding synonyms. Could you create manual lists of important expressions like these? Yes, but this requires a lot of work, and it’s impossible to include every possible alternative. It’s better if the tool, by understanding language, can do it for you. And the system cannot generate just any multiword expressions; they have to be accurate, relevant, and recognizable, or else they will cause more harm than good.
  2. Understanding synonyms. There are many different ways to say the same thing, and yet you want similar answers (or parts of answers) to be grouped together by the tool you use for the analysis. Compare this to manual analysis: a human being will understand that an answer such as “the room was small” should be placed in the same bucket as “the room was of less than average size”. You need the tool to handle the merging of such variants. This is a lot like handling synonyms, except that the system should understand similarities between multiword expressions as well. The tool should suggest semantically similar expressions and let the user confirm or reject them; by showing suggestions instead of merging them automatically, the system helps the user cluster the content appropriately while keeping full control of what is merged.
  3. The language of the text. Any tool you use should support the language of the text being analyzed – or the text needs to be translated to a language that the tool can handle. Translation can bring its own set of problems, however: it can be prohibitively expensive; you need to be consistent in choosing the terms used when translating; and there could be style differences across translators. A better way to do the analysis is to use a system which understands the language (it should know about multiword expressions and it should be able to generate suggestions for similar expressions and synonyms, as described in points 1 and 2 above). Of course, you will still need someone who knows the language to do the analysis, but the point is: an Italian researcher doing a study of Italian texts should be able to use the system to analyze Italian.
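
The word-cloud problem from point 1 can be illustrated with a minimal Python sketch. It is not how any particular product works: the expression list, the two example answers and the function names are all made up for illustration, and the hand-written list is exactly the kind of manual shortcut that a real tool would have to replace by learning expressions from data.

```python
import re
from collections import Counter

# Hypothetical, hand-picked expressions used only for illustration;
# a real tool would need to discover these from the text itself.
MULTIWORD_EXPRESSIONS = ["san francisco", "no-fly zone", "give me five"]


def tokenize(text):
    """Lowercase the text and split it into words (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def tokenize_with_expressions(text, expressions=MULTIWORD_EXPRESSIONS):
    """Tokenize, but keep each known multiword expression as a single token."""
    tokens = tokenize(text)
    # Try longer expressions first so they win over shorter overlapping ones.
    phrases = sorted(
        (p for p in (tokenize(e) for e in expressions) if p),
        key=len,
        reverse=True,
    )
    merged, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            if tokens[i:i + len(phrase)] == phrase:
                merged.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No expression starts here, so keep the single word.
            merged.append(tokens[i])
            i += 1
    return merged


answers = [
    "Give me five reasons to visit San Francisco",
    "The hotel gave me a map and told me about the city",
]

# Plain word counts: "me" looks important, "san" shows up on its own.
word_counts = Counter(t for a in answers for t in tokenize(a))

# Expression-aware counts: "give me five" and "san francisco" are single terms.
phrase_counts = Counter(t for a in answers for t in tokenize_with_expressions(a))
```

With plain word counts, “me” is counted three times and “san” appears as a term of its own; with the expression-aware counts, “give me five” and “san francisco” each appear once as whole terms and “san” disappears.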

Automating the analysis has further advantages:

  • Faster analysis. You can analyze more answers in a given period of time when you use an automated tool. Consequently you can increase the number of responses – or finish the analysis in a fraction of the time. And since the process is fast you can afford to explore and hypothesize to a greater extent – the tool could help you build your view of the data by finding its main themes before it helps you quantify them.
  • Consistency. An automated system will be consistent; it will produce the same results regardless of who does the analysis or when (this relates to problems of inter-rater reliability). For example: if you run a tracker with an open-ended question and ask the same thing every month, the analysis will not vary from month to month because of who is analyzing the answers or how much time has passed between coding sessions for a certain analyst. In fact, the analysis can be completely automated after the initial theme modeling has been done.
  • Lower price. Shorter time for analysis and less manual work means less expensive analysis.
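
To make the consistency point concrete, here is a small Python sketch of what quantification could look like once a theme model is fixed. Everything in it is an assumption for illustration: the theme names, the variant phrasings (standing in for expressions an analyst has confirmed as similar), and the simple substring matching, which is far cruder than real language understanding. The point is only that the same model applied to the same answers gives the same counts and percentages every time.

```python
# Hypothetical theme model: each theme maps to variant phrasings
# that an analyst is assumed to have confirmed as meaning the same thing.
THEME_MODEL = {
    "small room": ["room was small", "less than average size", "tiny room"],
    "friendly staff": ["friendly staff", "staff was helpful"],
}


def themes_in(answer, model=THEME_MODEL):
    """Return the set of themes whose variants occur in the answer (substring match)."""
    text = answer.lower()
    return {
        theme
        for theme, variants in model.items()
        if any(variant in text for variant in variants)
    }


def quantify(answers, model=THEME_MODEL):
    """Return {theme: (respondent_count, percentage_of_all_respondents)}."""
    counts = {theme: 0 for theme in model}
    for answer in answers:
        # Each respondent counts at most once per theme.
        for theme in themes_in(answer, model):
            counts[theme] += 1
    total = len(answers)
    return {
        theme: (n, round(100 * n / total, 1) if total else 0.0)
        for theme, n in counts.items()
    }


answers = [
    "The room was small but the staff was helpful.",
    "The room was of less than average size.",
    "Friendly staff and a great breakfast.",
    "Nothing to complain about.",
]

report = quantify(answers)
```

Here both themes are mentioned by two of the four respondents, so each gets a count of 2 and a share of 50.0 percent, and running `quantify` again next month with the same model would code new answers by the same rules.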

The potential for using automated analytics in (market) research is changing because of new technological advances. It is now possible to quantify the insights from qualitative material both efficiently and accurately, while decreasing survey fatigue and survey complexity, since you need fewer questions. This could increase the importance of open-ended questions in market research and, consequently, bring a gradual move from quantitatively based to qualitatively based research. Net Promoter and similar methods could also change, since large volumes of answers are no longer a problem (think hundreds of thousands of respondents for customer experience touchpoints), and any Net Promoter score could be accompanied by an open-ended question that helps to explain the rating.

The next frontier in text analysis is to handle open text – i.e. not only to correctly label and structure the individual entities in text, but to understand *what the text is about*. At Gavagai, we develop functions (available in our API) that are necessary for this next step. Our API is built on top of a self-learning semantic base layer, and features functions that can be used to identify and extract concepts in texts, to relate such concepts to each other and group them, and to measure various aspects of them, such as sentiment and stancetaking. We have used these functions to build the Gavagai Explorer, which is a groundbreaking tool for analyzing open-ended survey questions. The same functions can also be used to analyze large amounts of text from the web, enabling the next generation of media monitoring tools.

Please visit our product page at Gavagai Explorer for examples, an explainer video, a tutorial, and a chance to try the system for free.

Category: case studies, Gavagai Explorer