

Collecting and Using Customer Data - Part 1

Fastwater Rapids vol. 1.2, 28Aug98

by Bill Zoellick

It is easy to collect data about the visitors and customers who come to your web site.  Lots of it.  The hard part is making use of what you have collected.

In talking with different companies about their data collection we see two issues come up again and again -- issues that suggest why making sense of the data is difficult:

So, let's start our discussion of collecting and using data by making some fundamental distinctions about kinds of data.  Then we will move on to look at how the data connects with business issues.

Starting with the Basics

Here is what one system administrator had to say in an e-mail message about the use of one particular tool for monitoring customer interactions on the sites that he manages for his clients:

  It is one of the higher-end packages, and therefore more expensive. Server logs get imported into a back-end Informix database, allowing users (Windows 95 and UNIX GUI clients) to run fairly sophisticated types of queries, like "click track" reports that show the path(s) that surfers take through your site. ... Most of my customers are interested in the number of visits/visitors to their site over time ("Hits" are basically a worthless metric). Many are also interested in referrer information; if, for instance, someone got to your site via an Alta Vista search, you can see what they entered as search terms. A few are interested in the paths taken through their site.

This writer, who manages web site operations for an educational foundation, makes a number of points that will help us distinguish between different kinds of customer data collection and analysis:
This last point is especially important because, as we will see, the work that is being done is actually pretty basic stuff, far from the "high end" of collecting and analyzing customer information. At the same time, the writer is absolutely correct: what his customers want is about as sophisticated as things get today. A paradox?  No -- we just need a bigger picture.
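
The referrer analysis the administrator describes is a good example of that basic work. The Python sketch below pulls search phrases out of referrer URLs and counts them. It assumes the search engine passes the query in a parameter named `q`; the host-to-parameter mapping in `SEARCH_PARAMS` is an illustrative assumption, not a verified list, and other engines use other hosts and parameter names.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed mapping from search-engine host to the query-string parameter
# that carries the visitor's search terms. Illustrative only.
SEARCH_PARAMS = {
    "www.altavista.com": "q",
}

def search_terms(referrer):
    """Return the search phrase embedded in a referrer URL, or None."""
    parts = urlsplit(referrer)
    param = SEARCH_PARAMS.get(parts.hostname or "")
    if param is None:
        return None  # not a search engine we know how to read
    values = parse_qs(parts.query).get(param)  # parse_qs decodes '+' to space
    return values[0].strip().lower() if values else None

def top_search_terms(referrers, n=10):
    """Count the most common search phrases among a list of referrers."""
    counts = Counter(t for t in map(search_terms, referrers) if t)
    return counts.most_common(n)
```

Normalizing case before counting keeps "Rafting" and "rafting" from being reported as two different searches.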

Breaking Apart the Kinds of Customer Data

We all know that "customer data" comes in a lot of forms, from a lot of different sources. Some is hard to get and some is easy. Some, like a customer name and other identifying information, is a "high-level" description of the customer -- other data, such as the pattern of page views and clicks during a given session on your site, is low-level and often voluminous.

Making sense of these different kinds of customer data is easier if we divide the information into four general categories, or "purposes" for data:

  1. General Site Usage Metrics:  how many people visit the site, where do they come from, when do they visit it, and so on.
  2. Navigation Patterns and Diagnostics:  once they are on the site, what do they do there?  How do they move through the site?  Are there pages that are often stopped before being fully displayed?  What is the user's effective bandwidth?
  3. Data to Drive Content Delivery in Response to the Customer:  at the simplest level, this consists of making decisions about content (ads, articles, lists of related links, and so on) on the basis of the previous pages, purchases, and other information that can be stored as "short term memory."  At the most complex level it includes previous history with the customer, including information about the customer coming from sources other than the web.
  4. Lifecycle Information:  historical data for numerous customers is aggregated to produce insights and metrics with regard to the purchasing patterns of typical kinds of customers over time. We'll talk more about this one later, since it is a very popular notion that can be especially difficult to implement.
We will take a brief look at each of these "purposes." They differ from each other in important ways. Some of the differences have to do with the level of detail. Some have to do with when you do the summing and averaging to produce a statistic or chart. Aggregating results to produce the big picture -- a step that is very important if the data is ever going to be useful for you and your management team -- is a process of throwing away details so that you can see the forest, rather than just lots of trees. One important question is whether you really, literally, throw the details away.

We can make sense of such questions by framing them in terms of our four purposes.

Purpose I: General Site Usage Metrics

This kind of data and reporting is a requirement for doing business on the web, plain and simple. You have to be able to count how many people (as opposed to robots) visit your site, and where they come from. Without this basic information it is impossible to know whether new advertising efforts are effective or not, whether changes in web content or pricing are converting more visitors into buyers, and so on. It's like keeping your books: you've got to do this.

There are dozens of ways to collect this information. At the low end, you can write scripts to aggregate information in logs generated by your HTTP servers. As you pay more for more sophisticated tools, you get heuristics that convert "hits" (calls to the server -- there could be many such interactions involved in displaying a page) to actual "page views." You get fancier reporting tools. You get the ability to monitor over multiple servers in multiple sites.
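
At the low end, such a script can be quite short. The following is a minimal sketch, assuming logs in the Combined Log Format that many HTTP servers write. The rules for what counts as a page view (a successful GET for something other than an image, stylesheet, or script) and what counts as a robot (a user-agent string containing words like "bot" or "spider") are illustrative assumptions, far cruder than the heuristics commercial packages use. Distinct hosts are also only a rough stand-in for people, since proxies and shared machines blur the count.

```python
import re
from collections import defaultdict

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)
# Assumed heuristics: these suffixes are page components, not pages.
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_page_view(m):
    """A successful GET for something that is not an embedded component."""
    path = m.group("path").split("?")[0].lower()
    return (m.group("method") == "GET"
            and m.group("status").startswith("2")
            and not path.endswith(NON_PAGE_SUFFIXES))

def is_robot(m):
    """Crude robot test based on the user agent or a robots.txt fetch."""
    agent = (m.group("agent") or "").lower()
    return any(k in agent for k in ROBOT_MARKERS) or m.group("path") == "/robots.txt"

def daily_summary(lines):
    """Per-day hits, page views, and distinct (non-robot) hosts."""
    hits = defaultdict(int)
    page_views = defaultdict(int)
    hosts = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        day = m.group("day")
        hits[day] += 1  # every server request is a "hit"
        if is_robot(m) or not is_page_view(m):
            continue
        page_views[day] += 1
        hosts[day].add(m.group("host"))
    return {day: {"hits": hits[day], "page_views": page_views[day],
                  "hosts": len(hosts[day])} for day in hits}
```

The gap between the "hits" and "page_views" numbers is exactly why hits make a poor metric: one page with a few graphics generates several hits.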

You would think that, at this level, it would be hard to go wrong -- but it's possible. The reason has to do with this business of aggregating and discarding the details. Throwing away details can be a great thing -- it reduces storage space, and, for sites that generate a lot of activity, can greatly reduce the processing time required to generate useful results. But it can also make it difficult to construct an "audit trail" if the numbers are surprising. In talking with companies we hear reports of aggregated data from high-end packages that simply do not match up with logged data that is being kept in parallel. If all you keep is the sums, throwing away the raw data, how do you check your work if something seems odd?
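
One way to keep an audit trail is to hold on to the raw logs and periodically recount from them. The sketch below assumes the package can export daily totals for the same metric (here, raw hits) as a simple day-to-count mapping; that export format and the 1% tolerance are assumptions for illustration.

```python
import re
from collections import Counter

# Day portion of the "[dd/Mon/yyyy:hh:mm:ss zone]" log timestamp.
DAY_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')

def raw_daily_hits(lines):
    """Recount hits per day directly from raw log lines."""
    counts = Counter()
    for line in lines:
        m = DAY_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

def reconcile(reported, raw, tolerance=0.01):
    """Days where reported and recounted totals differ by more than
    `tolerance` (a fraction of the larger value)."""
    mismatches = {}
    for day in sorted(set(reported) | set(raw)):
        r, a = reported.get(day, 0), raw.get(day, 0)
        if abs(r - a) > tolerance * max(r, a, 1):
            mismatches[day] = (r, a)
    return mismatches
```

An empty result means the summaries can be traced back to the raw data; any entry tells you which day to investigate.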

Our advice: be sure to ask for references before buying a high-end package. Talk to these reference sites and ask specifically whether they have been able to audit their results and match them up with log file information.

Purpose II: Navigation Patterns and Diagnostics

Do most people visiting your site select one of the primary links on the home page? Or do they usually do a search to get to where they want to go? Are they leaving the page before your gorgeous but large graphics download? (A surprisingly common problem -- does everyone that builds sites have a T-1 connection? What gives?) What is the effective bandwidth of your average user? Do you have classes of users (e.g., business partners and customers) who have clearly different bandwidth profiles?

In short, how can you improve the design of your site to make it more effective for your purposes?

In broad outline, this problem is just a more detailed version of the "General Metrics" case -- you are still not really looking at the behaviors of individual users -- you are interested in "aggregate" information. The difference is that you will have more of this information. One tool vendor tells us of busy sites where the detailed usage information runs in the range of 2-5 gigabytes a day.

One other consideration is that, in order to answer questions about matters such as effective bandwidth and stopped downloads, you need to monitor information not just at the web server level, but also at the network level.
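
Once the measurements exist, the arithmetic is simple. Effective bandwidth is bytes delivered divided by transfer time, and a stopped download is one that delivered fewer bytes than the object contains. The hard part is capturing the transfer time and the full object size, which is why network-level monitoring matters. The `Transfer` record and its field names below are hypothetical, standing in for whatever a monitoring tool actually reports.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Transfer:
    # Hypothetical per-request record; field names are assumptions.
    client_class: str   # e.g. "partner" or "customer"
    bytes_sent: int     # bytes actually delivered to the client
    object_size: int    # full size of the requested object
    seconds: float      # elapsed transfer time

def is_stopped(t):
    """Fewer bytes delivered than the object holds: the user gave up."""
    return t.bytes_sent < t.object_size

def effective_bandwidth_bps(t):
    """Delivered bits per second for one transfer."""
    return t.bytes_sent * 8 / t.seconds

def profile(transfers):
    """Median effective bandwidth and stopped-download rate per client class."""
    groups = {}
    for t in transfers:
        groups.setdefault(t.client_class, []).append(t)
    result = {}
    for cls, ts in groups.items():
        rates = [effective_bandwidth_bps(t) for t in ts if t.seconds > 0]
        result[cls] = {
            "median_bps": median(rates) if rates else None,
            "stopped_rate": sum(is_stopped(t) for t in ts) / len(ts),
        }
    return result
```

Using the median rather than the mean keeps a few very fast (or very slow) connections from distorting the profile of a class of users.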

Most vendors of data collection tools have solutions to provide you with navigation details and diagnostics. They approach the problem in slightly different ways: Andromedia reduces and aggregates the information early in the process, resulting in smaller storage requirements; Accrue focuses on their ability to monitor at the network level and to make useful inferences by correlating that information with web server data; net.Genesis distinguishes itself with a "late-binding" approach, saving detailed customer interaction information and providing powerful, heuristic tools to permit different post-hoc analyses of the data.
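
The trade-off between reducing data early and binding it late can be shown with a toy example. The two classes below illustrate the general idea only; they are not descriptions of how any of these vendors' products actually work.

```python
from collections import Counter

class EarlyAggregator:
    """Keeps only running totals; each raw event is discarded on arrival."""
    def __init__(self):
        self.page_counts = Counter()

    def record(self, event):
        self.page_counts[event["page"]] += 1

    def views(self, page):
        return self.page_counts[page]

class LateBindingStore:
    """Keeps every raw event so new questions can be asked after the fact."""
    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(dict(event))

    def views(self, page):
        return sum(1 for e in self.events if e["page"] == page)

    def query(self, predicate):
        """Count events matching any condition decided on later."""
        return sum(1 for e in self.events if predicate(e))
```

Both answer "how many views did the home page get?" equally well. Only the late-binding store can answer a question nobody planned for, such as how many home-page views arrived directly rather than from a search engine, and it pays for that flexibility by storing every event.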

Going back to our comments from the system administrator early in this article, he noted that only "a few" of his clients "are interested in the paths taken through their site." This is consistent with our observations. We would add that when there is interest in paths through the site, it is usually for pretty simple things, such as looking at how many visitors use a site's search facility.
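
Even a simple question like search usage requires grouping individual requests into visits. Here is a minimal sketch. It assumes requests arrive as (visitor, timestamp in seconds, path) tuples and that a 30-minute gap ends a visit, which is a common convention rather than a standard. It also assumes the search facility lives under a hypothetical `/search` path.

```python
from itertools import groupby

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap, in seconds, that ends a visit

def sessions(requests):
    """Split (visitor, timestamp, path) requests into per-visitor sessions."""
    out = []
    ordered = sorted(requests, key=lambda r: (r[0], r[1]))
    for _, reqs in groupby(ordered, key=lambda r: r[0]):
        current, last_ts = [], None
        for _, ts, path in reqs:
            if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
                out.append(current)  # long gap: close the previous visit
                current = []
            current.append(path)
            last_ts = ts
        if current:
            out.append(current)
    return out

def search_share(requests, search_prefix="/search"):
    """Fraction of sessions that touched the site's search facility."""
    all_sessions = sessions(requests)
    if not all_sessions:
        return 0.0
    used = sum(any(p.startswith(search_prefix) for p in s) for s in all_sessions)
    return used / len(all_sessions)
```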

Our general observation here is that few companies are yet successful in connecting this more detailed, diagnostic class of information to real business issues. There are dozens, probably hundreds, of interesting diagnostic questions you could ask about your site, and, given a tool that collects enough fine-grained data, there are many things you could scratch your head and wonder about. But which ones matter? You can't approach the "does it matter?" question without reference to business objectives.

We just recently talked with a company that is trying to decide whether to keep advertising on their site. They are succeeding in using their site to sell a good volume of high margin products, tied to their core business. Their success has them wondering whether the much smaller revenues from click-throughs on advertising on their site are worth bothering with. Screen real estate is dear -- would they make more by devoting the space now dedicated to ads to selling core products? THAT is the kind of question that diagnostic tools might answer -- particularly on a site such as theirs that is database driven -- they could literally run an experiment and find out. (For more information, see the Network Economy Practices case study on this company.)
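
Such an experiment mostly requires two things: showing each visitor one layout consistently, and comparing revenue per visit between layouts. Here is a minimal sketch under those assumptions. The variant names are hypothetical, and revenue is assumed to be recorded in cents per visit: ad click-through income for one layout, product margin for the other.

```python
import hashlib
from collections import defaultdict

VARIANTS = ("ads", "products")  # hypothetical layout names

def assign_variant(visitor_id, variants=VARIANTS):
    """Deterministic split: a returning visitor always sees the same layout."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]

def revenue_per_visit(visits):
    """visits: iterable of (variant, revenue_in_cents) pairs, one per visit."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for variant, cents in visits:
        totals[variant] += cents
        counts[variant] += 1
    return {v: totals[v] / counts[v] for v in counts}
```

Before trusting a difference in these averages, a real test would also need enough traffic and a significance check.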

The more sophisticated data collection and analysis tools seem most valuable when they help a company keep moving their site design forward toward their larger business objectives. Most companies aren't there yet. The connections between their business model and the details of the site design are still just emerging.

Next: Looking at the Individual Customer

So far we have just talked about information that is aggregated across the many individuals that visit your site. You look at the total number of pages viewed per day, regardless of who viewed them. You look at the total number of viewers coming in from a banner ad placed on the Wall Street Journal’s site. If you are collecting more detailed information, you might look at the percentage of visitors using the search engine to move into the site from the home page. Or you might look at the percentage who never let the home page load before moving to another place on the site, or before leaving the site altogether.

But in such instances you are not looking at what I do when I visit your site. That's a very different matter. To do that you have to collect and maintain information that ties together my activities over time. That's not too difficult if you are only watching my activities over a single visit, but if I buy something, then return another time to buy something more, tying those two visits together means having a database and being able to look me up. How much information do you need to store about me? What I bought? How I got there? What process I followed before buying? What other things I looked at?
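
At its simplest, "being able to look me up" means keying every visit and purchase to a customer identifier. The sketch below uses Python's built-in SQLite module. It assumes a stable customer ID is available, from a login or a cookie for example, and the table layout is hypothetical.

```python
import sqlite3

def open_history(path=":memory:"):
    """Create (if needed) a minimal customer-history database."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS visits (
            customer_id TEXT, visit_time TEXT, referrer TEXT);
        CREATE TABLE IF NOT EXISTS purchases (
            customer_id TEXT, visit_time TEXT, item TEXT, amount REAL);
    """)
    return db

def record_visit(db, customer_id, visit_time, referrer, purchases=()):
    """Store one visit plus any (item, amount) purchases made during it."""
    with db:  # commit both inserts together
        db.execute("INSERT INTO visits VALUES (?, ?, ?)",
                   (customer_id, visit_time, referrer))
        db.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)",
                       [(customer_id, visit_time, item, amount)
                        for item, amount in purchases])

def customer_history(db, customer_id):
    """Look up one customer's visits and purchases across sessions."""
    visits = db.execute(
        "SELECT visit_time, referrer FROM visits "
        "WHERE customer_id = ? ORDER BY visit_time", (customer_id,)).fetchall()
    bought = db.execute(
        "SELECT visit_time, item, amount FROM purchases "
        "WHERE customer_id = ? ORDER BY visit_time", (customer_id,)).fetchall()
    return {"visits": visits, "purchases": bought}
```

Every further question ("What process did I follow before buying? What else did I look at?") adds columns or tables, which is where storage and design costs start to grow.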

Even more important, if you have any faith in marketing at all, you're willing to bet that I am an instance of a class of customers. Serving my needs, and increasing the revenues from my purchases, is easier if you understand what other people like me are buying, and how we interact with your site. How do you identify me with the right group? How do you aggregate activities for that group so that you can make useful assumptions about how its members use your products and your website? Are there ways to tie in information gathered from sources other than the website?
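
Aggregating by group looks much like site-wide usage counting, except that the sums are taken per segment. In the sketch below, the segment rules (order-count thresholds) and the customer fields are hypothetical. A real system might cluster customers instead, or draw on data from outside the web site.

```python
from collections import defaultdict

def segment(customer):
    """Hypothetical rule-based segment assignment."""
    if customer["orders"] >= 5:
        return "repeat buyer"
    if customer["orders"] >= 1:
        return "occasional buyer"
    return "browser"

def segment_summary(customers):
    """Size, average spend, and share of search users for each segment."""
    groups = defaultdict(list)
    for c in customers:
        groups[segment(c)].append(c)
    return {
        name: {
            "customers": len(members),
            "avg_spend": sum(m["spend"] for m in members) / len(members),
            "search_users": sum(m["used_search"] for m in members) / len(members),
        }
        for name, members in groups.items()
    }
```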

These are the really "high-level" uses of customer data. In "part 2" of this article we will look at why collecting and using this data is so difficult and at what can be done to make the job more manageable.

