
Fastwater Rapids vol. 1.2, 28Aug98
It is easy to collect data about the visitors and customers who come to your web site. Lots of it. The hard part is making use of what you have collected.
In talking with different companies about their data collection we see two issues come up again and again -- issues that suggest why making sense of the data is difficult:
Making sense of these different kinds of customer data is easier if we divide the information into four general categories, or "purposes" for data:
We can make sense of such questions by framing them in terms of our four purposes.
There are dozens of ways to collect this information. At the low end, you can write scripts to aggregate information in logs generated by your HTTP servers. As you pay more for more sophisticated tools, you get heuristics that convert "hits" (calls to the server -- there could be many such interactions involved in displaying a page) to actual "page views." You get fancier reporting tools. You get the ability to monitor over multiple servers in multiple sites.
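As a rough illustration of the low end, the sketch below aggregates a server log and applies one very simple "hits" to "page views" heuristic. It assumes the server writes Common Log Format and that a page view is a successful GET for anything other than an image, stylesheet, or script. The sample lines, the suffix list, and the function name summarize are all made up for this example; commercial tools use more elaborate rules.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: requests for these file types are page furniture, not pages.
ASSET_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def summarize(lines):
    """Return (total hits, Counter of page views per path)."""
    hits = 0
    page_views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        hits += 1
        path = match.group("path").split("?", 1)[0]
        is_page = (
            match.group("method") == "GET"
            and match.group("status") == "200"
            and not path.lower().endswith(ASSET_SUFFIXES)
        )
        if is_page:
            page_views[path] += 1
    return hits, page_views


# Hypothetical log lines, invented for illustration only.
SAMPLE = [
    '10.0.0.1 - - [28/Aug/1998:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [28/Aug/1998:10:00:01 -0400] "GET /images/logo.gif HTTP/1.0" 200 2048',
    '10.0.0.1 - - [28/Aug/1998:10:00:01 -0400] "GET /images/banner.gif HTTP/1.0" 200 4096',
    '10.0.0.2 - - [28/Aug/1998:10:05:00 -0400] "GET /products.html HTTP/1.0" 200 7300',
]

hits, page_views = summarize(SAMPLE)
print("hits:", hits, "page views:", sum(page_views.values()))
```

Here four hits collapse into two page views, because two of the requests were images belonging to the first page.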
You would think that, at this level, it would be hard to go wrong -- but it's possible. The reason has to do with this business of aggregating and discarding the details. Throwing away details can be a great thing -- it reduces storage space and, for sites that generate a lot of activity, can greatly reduce the processing time required to generate useful results. But it can also make it difficult to construct an "audit trail" if the numbers are surprising. In talking with companies we hear reports of aggregated data from high-end packages that simply do not match up with logged data that is being kept in parallel. If all you keep is the sums, throwing away the raw data, how do you check your work if something seems odd?
Our advice: be sure to ask for references before buying a high-end package. Talk to these reference sites and ask specifically whether they have been able to audit their results and match them up with log file information.
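The kind of audit we have in mind can be very simple if the raw data is still around. The sketch below recomputes daily hit totals from raw log records and lists the days on which a package's reported totals disagree. The record layout, the function names, and the numbers are assumptions made for this example, not the output of any real product.

```python
from collections import Counter


def recompute_daily_hits(raw_records):
    """Count hits per day from raw (date, path) records kept in the logs."""
    return Counter(day for day, _path in raw_records)


def audit(reported_totals, raw_records):
    """Return {day: (reported, recomputed)} for every day that disagrees."""
    recomputed = recompute_daily_hits(raw_records)
    mismatches = {}
    for day in sorted(set(reported_totals) | set(recomputed)):
        reported = reported_totals.get(day, 0)
        actual = recomputed.get(day, 0)
        if reported != actual:
            mismatches[day] = (reported, actual)
    return mismatches


# Hypothetical data: the package reports 3 hits on the 28th, the log shows 2.
raw = [
    ("1998-08-27", "/index.html"),
    ("1998-08-28", "/index.html"),
    ("1998-08-28", "/products.html"),
]
reported = {"1998-08-27": 1, "1998-08-28": 3}

print(audit(reported, raw))
```

An empty result means the sums can be traced back to the log. Anything else is a place to start asking the vendor questions.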
In short, how can you improve the design of your site to make it more effective for your purposes?
In broad outline, this problem is just a more detailed version of the "General Metrics" case -- you are still not really looking at the behaviors of individual users -- you are interested in "aggregate" information. The difference is that you will have more of this information. One tool vendor tells us of busy sites where the detailed usage information runs in the range of 2-5 gigabytes a day.
One other consideration is that, in order to answer questions about matters such as effective bandwidth and stopped downloads, you need to monitor information not just at the web server level, but also at the network level.
Most vendors of data collection tools have solutions to provide you with navigation details and diagnostics. They approach the problem in slightly different ways: Andromedia reduces and aggregates the information early in the process, resulting in smaller storage requirements; Accrue focuses on their ability to monitor at the network level and to make useful inferences by correlating that information with web server data; net.Genesis distinguishes itself with a "late-binding" approach, saving detailed customer interaction information and providing powerful, heuristic tools to permit different post-hoc analyses of the data.
Going back to our comments from the system administrator early in this article, he noted that only "a few" of his clients "are interested in the paths taken through their site." This is consistent with our observations. We would add that when there is interest in paths through the site, it is usually for pretty simple things, such as looking at how many visitors use a site's search facility.
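A simple path question like this one needs very little machinery once page requests have been grouped by visit. The sketch below counts how many visits touched the search facility. It assumes visits have already been assembled into lists of requested paths and that search pages live under a "/search" prefix; both are assumptions for illustration.

```python
def search_usage(visits, search_prefix="/search"):
    """Return (visits that used search, total visits).

    visits maps a visit identifier to the ordered list of paths requested.
    """
    used = sum(
        1
        for paths in visits.values()
        if any(path.startswith(search_prefix) for path in paths)
    )
    return used, len(visits)


# Hypothetical visits, invented for this example.
visits = {
    "visit-1": ["/index.html", "/search?q=widgets", "/products/widget.html"],
    "visit-2": ["/index.html", "/products.html"],
    "visit-3": ["/search?q=gears"],
}

used, total = search_usage(visits)
print(f"{used} of {total} visits used search")
```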
Our general observation here is that few companies are yet successful in connecting this more detailed, diagnostic class of information to real business issues of interest. There are dozens, probably hundreds, of interesting diagnostic questions you could ask about your site, and, given a tool that collects enough fine-grained data, there are many things you could scratch your head and wonder about. But which ones matter? You can't approach the "does it matter?" question without reference to business objectives.
We recently talked with a company that is trying to decide whether to keep advertising on their site. They are succeeding in using their site to sell a good volume of high-margin products, tied to their core business. Their success has them wondering whether the much smaller revenues from click-throughs on advertising on their site are worth bothering with. Screen real estate is dear -- would they make more by devoting the space now dedicated to ads to selling core products? THAT is the kind of question that diagnostic tools might answer. On a database-driven site such as theirs, they could literally run an experiment and find out. (For more information, see the Network Economy Practices case study on this company.)
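One way such an experiment might be set up: show each visitor either the ad layout or a layout that uses the same space for core products, then compare average revenue per visitor. The sketch below is only an outline under those assumptions. The variant names, the hash-based assignment, and the revenue figures are invented for illustration, and a real test would also need enough traffic to rule out chance differences.

```python
import hashlib


def assign_variant(visitor_id):
    """Assign a visitor to a layout, the same way on every visit."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "ads" if digest[0] % 2 == 0 else "products"


def revenue_per_visitor(records):
    """Average revenue (in cents) per visitor for each variant.

    records is an iterable of (variant, revenue_cents), one per visitor.
    """
    totals = {}
    counts = {}
    for variant, revenue_cents in records:
        totals[variant] = totals.get(variant, 0) + revenue_cents
        counts[variant] = counts.get(variant, 0) + 1
    return {variant: totals[variant] / counts[variant] for variant in totals}


# Hypothetical results: ad click-through revenue vs. product sales.
records = [
    ("ads", 10),
    ("ads", 30),
    ("products", 200),
    ("products", 0),
]

print(revenue_per_visitor(records))
```

Hashing the visitor identifier, rather than choosing at random on each request, keeps a returning visitor in the same group for the length of the experiment.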
The more sophisticated data collection and analysis tools seem most valuable when they help a company keep moving their site design forward toward their larger business objectives. Most companies aren't there yet. The connections between their business model and the details of the site design are still just emerging.
But in such instances you are not looking at what I do when I visit your site. That's a very different matter. To do that you have to collect and maintain information that ties together my activities over time. That's not too difficult if you are only watching my activities over a single visit, but if I buy something, then return another time to buy something more, tying those two visits together means having a database and being able to look me up. How much information do you need to store about me? What I bought? How I got there? What process I followed before buying? What other things I looked at?
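To make "having a database and being able to look me up" concrete, here is a minimal sketch using an in-memory SQLite table keyed by a customer identifier. The table layout, column names, and sample events are assumptions for illustration. How the site recognizes a returning customer (a login, a cookie, an account number) is left out entirely.

```python
import sqlite3


def open_store():
    """Create an in-memory store of per-customer events across visits."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visit_events ("
        " customer_id TEXT NOT NULL,"
        " visit_date TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " item TEXT NOT NULL)"
    )
    return conn


def record(conn, customer_id, visit_date, action, item):
    """Store one thing a known customer did during a visit."""
    conn.execute(
        "INSERT INTO visit_events VALUES (?, ?, ?, ?)",
        (customer_id, visit_date, action, item),
    )


def history(conn, customer_id):
    """Everything one customer did, across all visits, oldest first."""
    rows = conn.execute(
        "SELECT visit_date, action, item FROM visit_events"
        " WHERE customer_id = ? ORDER BY visit_date, rowid",
        (customer_id,),
    )
    return rows.fetchall()


# Hypothetical events: one customer, two separate visits.
conn = open_store()
record(conn, "cust-1", "1998-08-01", "viewed", "widget")
record(conn, "cust-1", "1998-08-01", "bought", "widget")
record(conn, "cust-2", "1998-08-10", "viewed", "gear")
record(conn, "cust-1", "1998-08-20", "bought", "gear")

for row in history(conn, "cust-1"):
    print(row)
```

Even this toy version forces the questions raised above: every column is a decision about how much to store about each customer.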
Even more important, if you have any faith in marketing at all you're willing to bet that I am an instance of a class of customers. Serving my needs, and increasing the revenues from my purchases, is easier if you understand what other people like me are buying, and how we interact with your site. How do you identify me with the right group? How do you aggregate activities for that group so that you can make useful assumptions about how its members use your products and your website? Are there ways to tie in information gathered from sources other than the website?
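Once each customer has been assigned to a group, aggregating activity for that group is the easy part. The sketch below totals purchases by segment. The segment names, the customer-to-segment mapping, and the amounts are hypothetical, and deciding which group a customer belongs in, the hard question, is not addressed here.

```python
from collections import defaultdict


def spend_by_segment(segment_of, purchases):
    """Total purchase amounts (in cents) for each customer segment.

    segment_of maps customer_id to a segment name; purchases is an
    iterable of (customer_id, amount_cents).
    """
    totals = defaultdict(int)
    for customer_id, amount_cents in purchases:
        segment = segment_of.get(customer_id, "unassigned")
        totals[segment] += amount_cents
    return dict(totals)


# Hypothetical segments and purchases, invented for this example.
segment_of = {"cust-1": "hobbyist", "cust-2": "contractor"}
purchases = [
    ("cust-1", 1500),
    ("cust-2", 9000),
    ("cust-1", 500),
    ("cust-3", 700),  # not yet assigned to any segment
]

print(spend_by_segment(segment_of, purchases))
```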
These are the really "high-level" uses of customer data. In "part 2" of this article we will look at why collecting and using this data is so difficult and at what can be done to make the job more manageable.