TCW: Jason, the last time we talked was about a year ago.  For our readers who don’t recall that interview, can you tell us a little about yourself and the company you work for?

JH: Sure, Scott. I work as Principal Technologist with Mark Logic, focusing on large-scale XML content manipulation.  Prior to Mark Logic I did a lot of work in Java: I wrote the book Java Servlet Programming (O’Reilly), helped develop Apache Tomcat and Apache Ant, created the JDOM open source library for XML manipulation, and worked as Apache’s representative to the Java Community Process Executive Committee.

TCW: Can you tell us a little about Mark Logic? What products do you make and what types of problems are they designed to solve?

JH: Mark Logic sells an XML Content Server (called MarkLogic Server) that acts as a platform for people creating content applications.  We use XQuery as the language for interacting with the server, with extensions for advanced text search, transactional updates, and other useful features. While most XQuery engines focus on handling XML data (such as purchase orders), we focus on XML content (such as books, articles, references, web pages, and blogs). XML content, unlike data, is more textual, ordered, hierarchically structured, and diverse in its construct.  We’ve enjoyed a lot of success selling in the publishing and government verticals.

TCW: Because many people can process and understand ideas when they can see them in action, can you show us a solution or two that is made possible by your software and explain what problem it’s solving?

JH: Sure, one of my favorite examples is Elsevier’s PathConsult web product.  It’s a differential diagnosis tool built on MarkLogic to help pathologists (like “Dr House” on TV) identify the source of illness. These doctors need to be fast, accurate, and sure. The site runs against Elsevier’s vast library of medical literature.  If you remember my discussion of web trends from our last interview, PathConsult satisfies the three trends: “sweat the content”, “deliver answers not links”, and provide “content in context”.  Next up: RadConsult, a radiology diagnostic reference system.

TCW: Those are great examples of the power of Mark Logic Server. But, we received a recent press release announcing a new service, MarkMail. And, we were totally blown away. Tell us a little about MarkMail. What is it? How does it work? And, why do we need it?

JH: I’m glad you liked it!  MarkMail is a free web site, hosted at http://markmail.org, for interacting with email archives. The site lets you search and analyze emails from hundreds of public mailing lists: online forums where people email each other to discuss some shared interest. You may wonder, what do people talk about on public mailing lists?  Software development is probably the most common topic.  Other topics are as varied as fine wine collecting, large format photography, and techniques for guitar looping. So far we’ve loaded about 500 lists and 4,250,000 individual emails.

Email presents an interesting challenge.  Email archives (both public and private) hold huge amounts of information, but the histories haven’t been well utilized.  We think one reason for that is technical: you need a product like MarkLogic Server before you can take full advantage of email content.

Our plan with MarkMail, which is built on MarkLogic Server, is to actively push the envelope and build a content application targeted at the email challenge.

As you’ll see with the chart on the http://markmail.org home page, one of our goals with the site has been to focus heavily on analytics.  We have lots of graphs and counts.  Each and every query you write gets its own histogram chart.  You can use these to put search results in context, track each list’s historical growth, check on a specific poster’s activity, or inquire whether the “buzz” on a topic is heading up or down.
