This interview is an extract from the white paper "The Truth About Sentiment & Natural Language Processing" released this week by Synthesio.
An interview with Seth Grimes, an “Analytics visionary”
“Watson,” the IBM computer that won on the game show Jeopardy, created a huge buzz around “his” technology. Why do you think there was so much buzz?
Getting a computer to play Jeopardy was a great stunt. IBM made the technology do something that everyone can understand. It was a “stunt,” however, because the ability to win Jeopardy is not in high demand in business or society. Nonetheless, Watson’s Jeopardy playing helps the non-technologist public understand the potential and the reality of the technology.
Question-answering systems are already out there, automating responses to business questions (for instance, in contact-center support, customer inquiries, and online commerce) without requiring a live person on the line. Right now Watson is focused on extracting factual information, but the technology could be applied to sentiment via a sentiment “annotator.” Then we won’t be limited to asking questions about facts. We’ll be able to ask about opinions and emotions.
(An annotator analyzes text and marks it up with meaning, attaching attributes to features in the text. For example, a named-entity annotator finds geographic locations and “marks them up,” capturing their semantic meaning. Pattern-based entity annotators can find addresses and identify location numbers by looking for patterns, and other annotators can mark up other parts of the text.)
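To make the annotator idea concrete, here is a minimal Python sketch. It is not Watson’s actual annotator framework. It assumes a hypothetical gazetteer (`LOCATIONS`) as a stand-in for a real named-entity annotator, and a hypothetical US-style phone-number regex (`PHONE`) as the pattern-based annotator. Each annotator returns labeled character spans, and `annotate` merges them.

```python
import re
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int   # character offset where the annotated span begins
    end: int     # character offset just past the end of the span
    label: str   # the attribute assigned to the span
    text: str    # the covered text

# Hypothetical gazetteer standing in for a real named-entity annotator.
LOCATIONS = {"Paris", "New York", "Pittsburgh"}

# Hypothetical pattern for a pattern-based annotator (US-style phone numbers).
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def annotate_locations(text):
    """Mark up every whole-word occurrence of a known location name."""
    found = []
    for name in LOCATIONS:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append(Annotation(m.start(), m.end(), "LOCATION", m.group()))
    return found

def annotate_phones(text):
    """Mark up every substring matching the phone-number pattern."""
    return [Annotation(m.start(), m.end(), "PHONE", m.group())
            for m in PHONE.finditer(text)]

def annotate(text, annotators=(annotate_locations, annotate_phones)):
    """Run each annotator over the same text and merge spans in text order."""
    spans = []
    for annotator in annotators:
        spans.extend(annotator(text))
    return sorted(spans, key=lambda a: a.start)
```

For example, `annotate("Call our Paris office at 555-123-4567.")` returns a LOCATION span for “Paris” at offsets 9 to 14 and a PHONE span at offsets 25 to 37. Production named-entity annotators are typically statistical or rule-based and are far more robust than a word list.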
How accurate can this technology be?
Accuracy goals, and the amount of work you put into meeting them, should be decided in light of the business problem. Some problems will be solvable even with low levels of precision (e.g., positive versus negative sentiment classification), while other applications may need higher precision. “Recall,” the ability to identify all applicable cases, is also factored into accuracy measurements.
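The distinction between precision and recall can be shown with a small, hypothetical scoring function. It compares machine labels against human-annotated gold labels for one class of interest. The F1 score (the harmonic mean of precision and recall) is a common way to combine the two. It is included here as one possible summary, not as the speaker’s metric.

```python
def precision_recall_f1(gold, predicted, positive="positive"):
    """Score one label of interest against human-annotated gold labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)  # correct hits
    fp = sum(1 for g, p in pairs if g != positive and p == positive)  # false alarms
    fn = sum(1 for g, p in pairs if g == positive and p != positive)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0  # of what we flagged, how much was right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of what was there, how much we found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With gold labels `["positive", "positive", "negative", "positive", "negative"]` and predictions `["positive", "negative", "positive", "positive", "negative"]`, there are two correct hits, one false alarm, and one miss. All three scores therefore come out to 2/3.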
My impression is that most sentiment tools that extract entities have out-of-the-box accuracy (without training) of something like 40-50% but can be “trained” (by having humans create marked-up samples or language rules, or correct the tool) to reach above the 80% level. I saw one claim of 98% accuracy, which is laughable and ludicrous. The only way to get there is by highly restricting the problem, tailoring the solution, and being more lenient about what counts as accurate.
What matters most is first identifying whether there is sentiment there at all, without yet determining whether it is positive or negative, and then passing the material on for human or machine classification. With machine filtering and human analysis, you can achieve high levels of accuracy for certain problems. If you really want the machine to do everything, you need to do a lot more work, or you will get much lower accuracy overall. But again, decisions should be based on business needs and also on the nature of the source materials.
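The filter-then-classify workflow described above can be sketched as follows. The cue list `SUBJECTIVE_CUES` is a hypothetical stand-in for a real subjectivity detector. Only the messages it flags are passed on for polarity classification, whether by a person or by a model.

```python
# Hypothetical cue list standing in for a real subjectivity detector.
SUBJECTIVE_CUES = {"love", "hate", "great", "terrible", "awful", "disappointed", "happy"}

def has_sentiment(text):
    """Stage 1: does the message appear to carry any sentiment at all?"""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & SUBJECTIVE_CUES)

def triage(messages):
    """Split messages into those to pass on for polarity classification
    (stage 2, by a human or a machine) and those to set aside."""
    for_classification = [m for m in messages if has_sentiment(m)]
    set_aside = [m for m in messages if not has_sentiment(m)]
    return for_classification, set_aside
```

Keeping stage 1 deliberately simple and separate reflects the point above: deciding *whether* sentiment is present is a different, often easier, problem than deciding its polarity.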
Let me add that while tools that analyze only at the message or document level may be accurate, the results they produce will often be far less useful. Think about it. If you’re running, say, a hotel group with 4,200 hotels, it might be helpful to know that (making up numbers) 77% of reviews were positive overall, 17% neutral, and 6% negative. Wouldn’t it be far more helpful to know opinion details, hotel by hotel? You want to know when a reviewer found that room cleanliness and staff friendliness were exemplary but that noise was a problem. The details in a net-positive review are not typically going to be all positive, and only by knowing sentiment at a detailed “feature” level can you reinforce what’s great and correct what’s not.
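Here is a rough sketch of feature-level aggregation, under deliberately simplified assumptions:

- The aspect keywords (`ASPECTS`) and the polarity word lists are hypothetical.
- Reviews are split into clauses at punctuation and at the word “but.”
- Each clause’s net polarity is credited to every aspect that clause mentions.

The output is a score per hotel and per feature, rather than one number per review.

```python
import re
from collections import defaultdict

# Hypothetical aspect keywords and polarity lexicon, for illustration only.
ASPECTS = {
    "cleanliness": {"clean", "dirty", "spotless"},
    "staff": {"staff", "reception", "concierge"},
    "noise": {"noise", "noisy", "loud", "quiet"},
}
POSITIVE = {"clean", "spotless", "friendly", "helpful", "quiet", "great"}
NEGATIVE = {"dirty", "rude", "noisy", "loud", "problem"}

def clause_sentiments(review):
    """Yield (aspect, net score) for each clause that mentions an aspect."""
    for clause in re.split(r"[.;!?]|\bbut\b", review.lower()):
        words = set(re.findall(r"[a-z]+", clause))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        for aspect, keywords in ASPECTS.items():
            if words & keywords and score != 0:
                yield aspect, score

def feature_summary(reviews):
    """reviews: iterable of (hotel_id, text). Returns {hotel: {aspect: net score}}."""
    summary = defaultdict(lambda: defaultdict(int))
    for hotel, text in reviews:
        for aspect, score in clause_sentiments(text):
            summary[hotel][aspect] += score
    return {hotel: dict(aspects) for hotel, aspects in summary.items()}
```

Consider the review “The room was spotless and the staff were friendly, but street noise was a problem.” The sketch credits +2 to cleanliness and to staff, and −1 to noise, for that hotel. A single document-level “positive” label would hide exactly this kind of detail.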
By the way, let’s not overstate the accuracy of human sentiment analysis. The best study of accuracy I’ve seen was done at the University of Pittsburgh in 2005. The researchers found only 82% agreement among human annotators labeling sentiment, but results jumped to over 90% when they removed uncertain cases (cases where annotators said they weren’t sure).
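At their simplest, agreement figures like these are the share of items on which annotators chose the same label. The sketch below computes raw percent agreement between two annotators twice: once over all items, and once after dropping items that either annotator flagged as unsure. The data layout, a list of `(label, is_sure)` pairs, is an assumption, and any example numbers are toy values rather than the study’s data. Studies often also report chance-corrected measures such as Cohen’s kappa, which this sketch does not compute.

```python
def agreement(annotations_a, annotations_b):
    """Each annotation is a (label, is_sure) pair for the same item.
    Returns (agreement over all items, agreement over items both were sure of)."""
    pairs = list(zip(annotations_a, annotations_b))

    def share(ps):
        # Fraction of pairs where both annotators chose the same label.
        return sum(a[0] == b[0] for a, b in ps) / len(ps) if ps else 0.0

    sure_only = [(a, b) for a, b in pairs if a[1] and b[1]]
    return share(pairs), share(sure_only)
```

In a four-item toy example where the two annotators disagree only on the two items one of them was unsure about, agreement rises from 0.5 to 1.0 once those items are removed.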
Are there certain online channels (among forums, blogs, Twitter, etc.) that are easier to analyze using text mining than others?
To really do it well, you have to go to the feature level (the individual item). You need strong natural language processing (NLP) to do that right.
Twitter is interesting because it is very hard to express more than one idea in a given tweet. Most tweets focus on a single idea, which in theory should make them easy to analyze. The problem is that people use a lot of slang and abbreviations, which makes tweets harder to analyze than a blog post or article. Also, a tweet is often part of a conversation. Very few tweets stand on their own; many include an article link or are responses to someone, for example. Others are part of multi-way conversations, and you very often need to understand the whole conversation to get the context. Most of the tools out there don’t do that; they don’t reach “through the tweet” to take into account the threaded nature of Twitter conversations. The more text there is, the easier it is to analyze, but at the same time, the shorter the text, the more focused it’s going to be.
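Reaching “through the tweet” requires reassembling the thread that a reply belongs to. Here is a minimal sketch that walks reply links back to the root of the conversation. The dictionary schema with an `in_reply_to` field is hypothetical and does not reflect any particular Twitter API format.

```python
def conversation_context(tweet_id, tweets):
    """Return the texts of a tweet's thread, from the root down to the tweet itself.

    tweets: dict mapping id -> {"text": str, "in_reply_to": id or None}
    (a hypothetical schema). Stops at missing parents and guards against cycles.
    """
    chain = []
    seen = set()
    current = tweet_id
    while current is not None and current in tweets and current not in seen:
        seen.add(current)
        chain.append(tweets[current]["text"])
        current = tweets[current]["in_reply_to"]
    return list(reversed(chain))
```

A reply such as “Never again.” carries no target on its own. With `conversation_context`, the analyzer also sees the earlier tweets that name what the reply is about.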
But let’s move from ease of analysis to business value delivered.
Applications like Synthesio’s get a lot of visibility because so many people use social media, but customer service is the sentiment-analysis application that has probably delivered the clearest business benefits and the greatest business value. Contact centers and surveys provide important data that is more focused than material out on the web and is associated with actual customers and transactions. You’ll get greater benefit from tying customer feedback to social media data than from spending your funds listening broadly to people who are expressing opinions in a void, without context.
There’s no denying the potential benefit in broad social-media monitoring and engagement, however. People will tell you what they like (or don’t like) about your product and will post things that can be analyzed and shown to be indicators of their intent: to buy, to complain, to cancel their service, and so on. This information can be used to fix problems: the customer-service scenario. Answering a customer in a way that makes that person happy can turn them into a “net promoter,” and the information can be used to improve quality so the problems don’t happen to other people. Posted and analyzed information, including intent signals that go beyond polarity (positive/negative), can also be used by companies to identify and act on opportunities. This is engagement that doesn’t only respond reactively to particular comments about products and services; it proactively creates new and higher-value customers.
What recent advances have you seen in sentiment analysis technology?
The latest advances in analysis do go beyond “polarity” or “valence” (positive, negative, neutral), and I don’t just mean rating sentiment on a scale from −10 to +10 to capture “intensity.” That is an advance, but we can do more. For example, you might look at sentiment about a hotel service in terms of emotional categories such as “angry,” “sad,” or “happy.” I’m sure we can all think of ways that automated understanding of emotional tone can be useful in business contexts.
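One simple way to combine emotion categories with an intensity scale is a lexicon that maps each word to both. The sketch below assumes a tiny hypothetical lexicon (`LEXICON`). It returns the most frequent emotion among matched words, along with their mean intensity on the −10 to +10 scale. Real systems use much larger lexicons or trained models.

```python
from collections import Counter

# Hypothetical lexicon: word -> (emotion category, intensity on a -10..+10 scale).
LEXICON = {
    "furious": ("angry", -9),
    "annoyed": ("angry", -4),
    "disappointed": ("sad", -5),
    "pleased": ("happy", 5),
    "thrilled": ("happy", 9),
}

def emotion_profile(text):
    """Return (dominant emotion or None, mean intensity of matched words)."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    hits = [LEXICON[w] for w in tokens if w in LEXICON]
    if not hits:
        return None, 0.0
    emotions = Counter(emotion for emotion, _ in hits)
    # Every lexicon score lies in [-10, 10], so their mean does too.
    intensity = sum(score for _, score in hits) / len(hits)
    return emotions.most_common(1)[0][0], intensity
```

For instance, “I was annoyed, then furious at the front desk.” yields the category “angry” with a mean intensity of −6.5. Note that polarity alone would only report “negative.”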
Then there are the “intent signals” I was just discussing: sentiment as an indicator of plans or actions.
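Intent signals can be approximated, very crudely, with patterns. The regular expressions in `INTENT_PATTERNS` are hypothetical examples chosen for illustration. They show the shape of the output, a set of intent labels per message, rather than a production-quality detector.

```python
import re

# Hypothetical intent patterns; a real system would learn or curate far richer ones.
INTENT_PATTERNS = {
    "purchase": re.compile(r"\b(going to|want to|plan to|thinking of) (buy|book|order)\b", re.I),
    "cancel": re.compile(r"\b(cancel|cancelling|canceling|switch(ing)? to)\b", re.I),
    "complain": re.compile(r"\b(complain|complaint|refund)\b", re.I),
}

def intent_signals(text):
    """Return the sorted names of all intent patterns that match the text."""
    return sorted(name for name, pattern in INTENT_PATTERNS.items() if pattern.search(text))
```

A message can carry several signals at once. For example, “Thinking of cancelling my plan and asking for a refund” is flagged as both `cancel` and `complain`.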
You’re going to get the most flexibility in creating business-suited categorizations via statistical approaches. That is, the analyst sets up categories that make sense and drags and drops documents into the different categories for “training” purposes. The machine uses statistical similarity measures to discover what the items in the category have in common in order to automate classification.
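The train-by-example workflow can be sketched with a nearest-centroid classifier over bag-of-words vectors. This is one of many possible statistical approaches, and not necessarily what any given product uses. The `examples` dictionary stands in for the documents an analyst has dragged into each category. New text is assigned to the category whose summed term vector is most similar to it by cosine similarity.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Term-count vector for a document."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(examples):
    """examples: {category: [documents the analyst placed in it]}.
    Builds one summed term vector per category (a centroid up to scale)."""
    centroids = {}
    for category, docs in examples.items():
        total = Counter()
        for doc in docs:
            total.update(bag_of_words(doc))
        centroids[category] = total
    return centroids

def classify(text, centroids):
    """Assign text to the most similar category."""
    vector = bag_of_words(text)
    return max(centroids, key=lambda c: cosine(vector, centroids[c]))
```

Cosine similarity ignores vector length, so summing a category’s term counts gives the same ranking as averaging them. Adding more analyst-labeled documents simply sharpens each category’s vector.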
Further, the market is beginning to understand that influence is best measured by the ability to affect business. Certainly influence is correlated with the number of Facebook friends, Twitter followers, and retweets, but what should interest us far more is how those measures translate into inquiries, sales, and monetizable perceptions. A person is truly influential if he or she drives business transactions.
And the market is understanding just how shallow many of the listening tools are. They treat social media as a silo, completely unlinked to enterprise systems and actual business transactions; they use simple keyword lists for sentiment classification; and they apply sentiment analysis only at the message, article, or document level. The market is also understanding that these tools can and should do better, including by joining the abilities of humans, who judge and discern, with the power of machines, which are fast, work 24 hours a day, and can tap huge volumes of social, online, and enterprise information that are beyond human analysis regardless of cost.
© 2012 Created by Laurent Magloire.