Book review by Scott Abel, The Content Wrangler
It’s not often that a structured content evangelist like myself finds himself wondering about the business value of unstructured information. After all, structured content is easy to work with, process, and, thanks to XML, automatically reuse. Unstructured content is messy, inconsistent, and nearly impossible to process in large volumes in any meaningful way. Right?
Wrong. According to researchers Scott Spangler and Jeffrey Kreulen, co-authors of Mining the Talk: Unlocking the Business Value of Unstructured Information (IBM Press), unstructured content, the most common type in the world, can be mined for valuable business knowledge. Spangler and Kreulen define “talk” as unstructured information in free-form text.
“As in normal conversation,” the authors write, talk is inconsistent: “There is no consistency of word choice or sentence structure or grammar or punctuation or spelling.” Unstructured content is also the most valuable type of content, because, the authors write, “Hidden inside the talk are little bits and pieces of important information that, if aggregated and summarized, could communicate actionable intelligence about how any business is running, how its customers and employees perceive it, what is going right and what is going wrong, and possibly solutions to the most pressing problems the business faces.”
So, What Is “Mining the Talk”?
“Mining the Talk” is not just the name of a book. It is a methodology centered on developing taxonomies that capture both the domain knowledge and the business objectives necessary to unlock the business value in all kinds of unstructured information. It’s a particularly important tool for organizations that have large amounts of unstructured information representing interactions with their customers.
One type of interaction that is loaded with potentially valuable unstructured information is the customer service center or technical support desk call. These interactions generate problem tickets (records that call center employees enter into a database each time they interact with customers). Each ticket contains a free-text field in which representatives type in details about the interaction. The value of the information hidden within these data sets is difficult to see for a variety of reasons. One of the main challenges is that customer service and technical support staff do not use consistent language to describe customer problems; there are even dramatic differences in sentence structure, grammar, punctuation, and spelling.
The authors describe in detail a repeatable process in which computer software and human domain experts work together to find and understand the hidden intelligence locked in unstructured information. But don’t be confused: this is not about search.
It’s Not About Search
Searching and mining information are not the same technique. In fact, they are very different. The authors of “Mining the Talk” provide an excellent example that helps to illustrate the differences between the two approaches.
“In ‘Mining the Talk’, you are not looking for a single document that will answer your question–you may not even be looking for a single answer. Instead, you want to find out what you don’t already know. You want enlightenment. You want an experience, something that will widen your view of the world and expand your vision of your company and its place in the universe. This is the principal difference between ‘Mining the Talk’ and search. Search is like what you do when you ask a hotel clerk for directions to the best restaurant in the area. Search assumes you have a clear idea of what you want to find, and a clear way to describe it. Of course, if your idea of best restaurant is very different from the hotel clerk’s idea, then search may still fail to get you what you want. This illustrates the point that problems that seem like search may really be mining problems in disguise.”
According to Spangler and Kreulen, the key criteria for a search solution include:
- A specific question to be answered
- An unambiguous way to communicate the question
If both criteria are met, the authors say, “then search may succeed. If not it will almost certainly fail.”
“Mining the Talk” does not rely on search. Instead, it relies on the development of a map—or taxonomy—of the unstructured content to provide the “lay of the land, giving some direction and an organized way to go about the process of creating insight from information.”
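To make the idea of a taxonomy as a map more concrete, here is a minimal sketch in Python. It is not the authors' method or their software: the category names, the keyword lists, and the sample tickets are all invented for illustration, and simple keyword matching stands in for the richer process, guided by domain experts, that the book describes.

```python
from collections import Counter

# Hypothetical taxonomy: category name -> keywords that suggest it.
# In practice a domain expert would build and refine this map.
TAXONOMY = {
    "password": ["password", "pwd", "login", "log in"],
    "printer": ["printer", "print", "toner"],
    "network": ["network", "wifi", "vpn"],
}


def categorize(text, taxonomy=TAXONOMY):
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in taxonomy.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "uncategorized"


def summarize(tickets):
    """Count how many tickets fall into each category."""
    return Counter(categorize(ticket) for ticket in tickets)


# Invented free-text ticket notes with inconsistent wording and spelling.
tickets = [
    "Cust forgot PWD, reset done",
    "printer jammed again!!",
    "User cannot log in after password change",
    "VPN drops every 10 min",
    "asked about invoice",
]

summary = summarize(tickets)
for category, count in summary.most_common():
    print(category, count)
```

Even this toy version shows why the map matters: the "uncategorized" bucket tells you where the taxonomy has gaps, which is exactly the kind of thing a search query would never reveal.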
Interactions versus Transactions
The authors say it’s important to understand the difference between customer transactions and customer interactions when mining information. Transactions are structured and generally revolve around the exchange of money for goods or services. Software can be used to automatically create transaction reports, in which the meaning and value of the transaction information are easily seen.
Interactions, on the other hand, are typically unstructured and involve an exchange of information of some kind. Usually, the customer talks first, asking a question or requesting information, sharing feedback, or reporting a problem. Unlike transactional data, interactions do not have fixed formats. Therefore, software alone is not able to easily detect the hidden knowledge contained in the unstructured datasets. To see the real value, Spangler and Kreulen say, you’ll need to enlist the help of domain experts.
Finding the Business Drivers for Interaction Analysis
Before you can adopt a new approach like “Mining the Talk”, you’ll need to communicate the business value of performing the analysis to upper management. Common business drivers include:
- Reducing expenses through improved efficiency in customer interactions
- Reducing costs through automated call handling
- Reducing the average length of time for customer interactions
- Reducing labor expenses by reducing the average level of expertise needed by call center staff
The authors say the need for communicating business drivers and incorporating them into the analysis process is a critical success factor that is often overlooked.
What Can We Learn From Mining Customer Interaction Data?
According to Spangler and Kreulen, mining unstructured interaction data can help shine light on many issues impacting our customers and our relationships with them. Some questions that can be answered by mining unstructured customer interaction data might include:
- What are the most common issues our customers have?
- What are the areas of dissatisfaction?
- When the customer interaction process breaks down, at what point does it fail?
- Who is doing a good job interacting with customers, and who is not?
- What are the drivers of cost in a typical customer interaction?
- What are the emerging trends in customer issues?
These questions cannot be answered by search.
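A question such as "What are the emerging trends in customer issues?" becomes answerable once each interaction carries both a structured field (a date) and a category derived from its free text. The following Python sketch assumes that categorization has already happened; the records, the month labels, and the simple "latest month versus earliest month" comparison are illustrative assumptions, not something taken from the book.

```python
from collections import Counter, defaultdict

# Hypothetical records: (month, category assigned to the ticket's free text).
records = [
    ("2008-01", "password"),
    ("2008-01", "printer"),
    ("2008-02", "password"),
    ("2008-02", "network"),
    ("2008-03", "network"),
    ("2008-03", "network"),
    ("2008-03", "password"),
]


def counts_by_month(records):
    """Build a month -> Counter(category) table."""
    table = defaultdict(Counter)
    for month, category in records:
        table[month][category] += 1
    return table


def rising_categories(table):
    """Categories that occur more often in the latest month than in the earliest."""
    months = sorted(table)
    first, last = table[months[0]], table[months[-1]]
    return sorted(c for c in last if last[c] > first[c])


table = counts_by_month(records)
rising = rising_categories(table)
print(rising)
```

No single ticket in this data set says "network problems are growing"; the pattern only appears when the interactions are aggregated, which is the distinction the authors draw between mining and search.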
What Do You Need To Mine Customer Interaction Data?
The authors say the first thing you’ll need is a reasonably complete collection of information describing a significant sample of customer interactions. This information is likely to be found in:
- Call center logs
- Customer email
- Chat and IM transcripts
- Voice transcripts from customer telephone calls
- Customer suggestion line databases
In order for this information to be most useful, the authors say it should have:
- At least one free-text field (unstructured information) containing relevant information
- Coverage of a significant time period (at least a few months, if not more)
- A mixture of structured and unstructured data
- A significant amount of data available (more than what can be processed by a human in a reasonable amount of time)
You’ll also need real world challenges (problem areas to investigate) and software designed to help you uncover the hidden value in unstructured content.
Real World Examples
“Mining the Talk” is loaded with excellent examples designed to help you better understand the concepts outlined by the authors and why they are important. A customer call center example is referenced throughout the book, as call center content was the type of unstructured content IBM tried to mine. Additional examples include:
- Predicting the likely success of a new snack product
- Comparing the patent portfolios of two organizations in order to determine the potential usefulness of each organization’s proprietary technologies
- Determining the impact of a bird flu epidemic on pet food sales
- Evaluating the effectiveness of communication style on business proposal acceptance
The Software: IBM Unstructured Information Modeler
“Mining the Talk” provides detailed instructions (and examples) that will help you take advantage of this approach. The authors even provide access to a free software tool, IBM Unstructured Information Modeler, that will allow you to work on datasets that contain between 1,000 and 10,000 examples, each containing between 1 and 20 lines of unstructured content. Installation and usage instructions are provided, as well as numerous examples.
Conclusion
While this book is certainly a marketing pitch for the IBM Unstructured Information Modeler, it’s not an in-your-face marketing message. In fact, the authors do a great job of exploring the concepts, methods, and reasons behind “Mining the Talk” while staying clear of trying to sell software. Instead, they tell readers that all of the methods described in their book can indeed be undertaken with pencil and paper (or a spreadsheet and a computer), but that analyzing large data sets would be neither practical nor possible with humans alone (at least, not in any reasonable amount of time). So, the software is mentioned once or twice in the book, with the last chapter dedicated to explaining how to use the IBM software.
Overall, “Mining the Talk” may be a challenging read for some structured content pros. Not because the book is difficult to read, but because reading the book will undoubtedly introduce new thinking about the value of unstructured content. It will almost certainly provide a new vision of what’s possible when we are able to see patterns in unstructured content and why marrying structured and unstructured content can help us see patterns and detect problems that were not visible otherwise.
While the authors have worked hard to explain that unlocking the business value in unstructured information is a valuable and worthy effort, they don’t dismiss the need for structured information. In fact, they actually help make the case for structured information, while pointing out that much of the world’s knowledge is locked up in unstructured content. Until that changes, there will be many valuable secrets hiding in unstructured content chunks, just waiting to be uncovered.
The folks at IBM are onto something…something BIG! But don’t take my word for it. Read the book and decide for yourself. If you’re like me, you’ll find the examples provided thought-provoking, and before long, you’ll find yourself thinking of new and exciting ways Spangler and Kreulen’s methods can benefit your organization.