

Collecting and Using Customer Data - Part 2

Fastwater Rapids vol. 1.8, 19Oct98

by Bill Zoellick

There is a tendency to think of website data as if it were just one thing, as when someone asks, "What are we doing with the stuff we collect from the website?" Thinking of web data in this way can make it harder to use. There are distinctly different kinds of data that you can collect from your website; making effective use of the data starts with understanding the differences.

In part one of this article on using visitor and customer data we suggested that you could usefully divide your web site data into four different categories:

In the first part of this article we looked at the issues associated with collecting aggregate site data and site performance diagnostics. In this second part we look more closely at individual visitor and customer data.

The Advantages of Being a Web Business

In the past, getting information about your customers was expensive. It often involved survey research that sampled the customer population. Or you had to set up membership programs or buying clubs to learn more about important, regular customers. This was, for example, the original motivation behind the creation of "frequent flyer" programs by airlines years ago. Even with such programs, additional surveys were required to learn what advertisements were bringing customers to your business, or what other things they looked at in your store besides the ones they purchased.

The web greatly reduces the cost of collecting these kinds of data. These advantages apply not only to consumer businesses, but also to businesses selling to other businesses. Whether you are selling research chemicals, office supplies, electronics components, or some other product to a diverse, heterogeneous set of businesses, your web site can potentially give you a great deal of information about your customers' needs and buying patterns. This week we will take a look at a couple of the things you can do with this kind of information.

Personalization and One to One Selling

One of the most interesting, powerful, and widely proclaimed capabilities of the web is that it gives you the ability to deliver exactly the information or promotion that a visitor to your site is interested in. Enabling this kind of custom delivery requires that you combine the basic elements illustrated in the figure below. You need:
The systems that companies use to do this selection and delivery range from the very simple to the very sophisticated and complex. For example, one online magazine subscription vendor (see our case study) has found that it can use a fairly simple recommendation scheme, based on fixed rules, with good results. If a customer buys Car and Driver, it often turns out to be worthwhile to suggest that he or she might also be interested in Road and Track. In this simple case, the individual data is the fact that the customer has just made a purchase, the group data is the set of observations that the vendor has made over time, and the classify/connect process is a simple rule: "If someone buys Car and Driver, send an offer for Road and Track." The delivery mechanism in this company's case is Vignette's StoryServer.
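
A fixed-rule scheme of this kind can be sketched in a few lines. This is only an illustration, not the vendor's actual system: the rule table, the function name, and the "already owned" check are our assumptions, and only the Car and Driver rule comes from the example above.

```python
# Hypothetical fixed-rule table: purchased title -> titles worth offering.
# Only the Car and Driver -> Road and Track rule comes from the article;
# a real table would be built from the vendor's own observations.
CROSS_SELL_RULES = {
    "Car and Driver": ["Road and Track"],
}


def recommend(purchased_title, already_owned=()):
    """Return offers triggered by a purchase, skipping titles already owned."""
    offers = CROSS_SELL_RULES.get(purchased_title, [])
    return [title for title in offers if title not in already_owned]


# Individual data: the purchase just made. Classify/connect: the rule lookup.
print(recommend("Car and Driver"))
print(recommend("Car and Driver", already_owned=["Road and Track"]))
print(recommend("Some Other Magazine"))
```

The whole classify/connect step is a table lookup, which is why the approach only works when the rules are few enough to write and maintain by hand.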

This magazine system is a very simple instance of the model illustrated in the picture above. The reasons that such a simple system can work are:

Things are often more complicated. For instance, a bookseller like Amazon has to work with millions of book titles, not just a few hundred magazines. The combinations of relationships are just too numerous to guess at and build by hand. Or consider a site selling digital cameras -- if a customer buys an Olympus camera, it is a safe bet that he or she is interested in cameras, but it would not make any sense at all to offer the buyer a Nikon camera as well. But more complicated "cross-sell" possibilities might be interesting, if there were a way to make them. It might turn out, for example, that camera buyers are good bets for wristwatches. How would you know?

Recommendation Engines

There are products, referred to as "Recommendation Engines" or "Collaborative Filters," that enable web businesses to offer one-to-one personalization even when there are many product choices or when the cross-selling opportunities are not intuitively obvious. As a potential buyer or user of such tools, it is useful to have at least a basic idea of what they do and how they work -- in part because it will help you understand what they don't do. A simple example will help.

Imagine that we knew the library classification numbers for the last three books that you have read. We could create a three dimensional graph, with library classification numbers on each axis, and fix a point in the graph space that identified your recent reading choices. It would be a pretty safe bet that other people whose points were close to yours shared similar reading interests. If we knew the titles of the books that they read, it would probably be worthwhile to recommend those titles to you. We would have a more powerful, flexible way of conducting the classify/connect process in the center of the diagram than is provided by a simple rule such as "recommend Road and Track if they buy Car and Driver."
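
The library example can be sketched as a nearest-neighbor lookup. This is a toy under stated assumptions: the reader records, classification numbers, and book titles are invented placeholders, and plain straight-line distance stands in for whatever similarity measure a commercial product actually uses.

```python
import math

# Hypothetical readers: each point holds the classification numbers of the
# last three books read. All numbers and titles are placeholders.
READERS = {
    "reader_a": {"point": (510, 516, 523), "titles": {"Book A", "Book B", "Book C"}},
    "reader_b": {"point": (512, 519, 520), "titles": {"Book B", "Book D", "Book E"}},
    "reader_c": {"point": (810, 813, 822), "titles": {"Book F", "Book G", "Book H"}},
}


def recommend_titles(my_point, my_titles, readers, k=2):
    """Suggest titles read by the k nearest readers that I have not read."""
    nearest = sorted(
        readers.values(),
        key=lambda reader: math.dist(my_point, reader["point"]),
    )[:k]
    suggestions = set()
    for reader in nearest:
        suggestions |= reader["titles"] - my_titles
    return sorted(suggestions)


# Readers a and b sit close to this point; reader c is far away and ignored.
print(recommend_titles((511, 515, 522), {"Book A", "Book B", "Book I"}, READERS))
```

Nothing in the function knows what the books are about. It only measures closeness in the graph space.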

Net Perceptions and LikeMinds sell software that does this sort of thing in a much more sophisticated way than our simple catalog number system, dealing with different kinds of information all at once, and with variables that don't scale as cleanly as our library catalog numbers. They also work in spaces that have many more than just three dimensions. But the general idea is the same: by collecting information about you, they can identify a group of other buyers that probably share at least some of your interests and buying preferences. If the online vendor knows the sort of things that others in the group bought, there is a good chance that you will be interested in those same things. This is one way of knowing that it might be useful to recommend a wristwatch to a camera buyer.

Recommendation engines are attractive for website use for a number of reasons that go beyond their ability to deal with disparate kinds of input data (e.g., purchases, stated preferences, site visit patterns, search terms, non-web data) and different kinds of product recommendations. For one thing, they can operate unobtrusively. The user does not necessarily have to fill out a profile or provide other information about likes and preferences; the engine can just go to work on data collected as the user moves through the site and looks at things. Another attractive feature is that they can begin to make recommendations or target advertising right away, during a prospect's initial visit to a site. Clearly, more history (a more precise location on the graph) results in better recommendations and predictions, but it is possible to do useful things with even a little bit of information. Finally, recommendation engines can work in real time, which is why they are capable of performing the "classify/connect" processing in the center of the diagram pictured above.

The Problem of Really Understanding Your Customers

The computer driving the recommendations on your website doesn't have to "understand" anything to make good recommendations. All it needs to know is that there is a high probability of correlation between your interests and the interests of others who have been there before you.

If you have ever had a statistics course, then you certainly have heard someone explain that correlation doesn't "mean" anything -- and certainly does not imply cause. The classic example, as I remember it, is that if you surveyed hundreds of communities of all sizes you would probably find that the size of the churches in a town correlates with the number of bars. And the meaning is ...?

The same problems of meaning apply to the operations of recommendation engines. The way that the customers and visitors cluster does not necessarily connect with anything that you can use to better understand your business, make better decisions, or make better plans. The ability to make good recommendations, as valuable as it is, is just statistics. It doesn't automatically mean anything, or have any use beyond the recommendation to the individual customer. Having gigabytes of information about your customers and not being able to use it to make business decisions is frustrating. As one vendor of personalization tools put it: "Our clients ask us, 'You serve my customers very well, but how do you serve me?'" Serving the people running the business as well as the customer implies getting the broader understanding identified at the bottom of the picture below, and requires that you add something more to the work of the recommendation engine.

The "something more" is segmentation. You have to be able to turn the groupings and clusterings of customers collected by the recommendation engine into groupings that mean something to you. It's one thing to know that there are a bunch of customers with apparently similar interests who buy a lot of products. It is quite another to discover that those customers happen to live in the Northeast, have incomes of between $100,000 and $150,000 a year, are professionals, and like to sail. When you understand the real dimensions of a customer segment, you are much better able to target advertising, devise new product offers, and anticipate changes in growth. That kind of insight requires moving beyond the fact of the correlation to interpretation and identification of the segment.
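
One way to picture this step is a small profiling routine that takes the cluster labels produced by the recommendation engine and summarizes each cluster in terms you can recognize. This is a minimal sketch: the customer records, attribute names, and cluster numbers are invented for illustration, and it assumes you already have demographic attributes for each customer from registration forms or other sources.

```python
from collections import Counter

# Hypothetical records: a cluster label from the recommendation engine plus
# attributes gathered elsewhere. All values are made up for illustration.
CUSTOMERS = [
    {"cluster": 1, "region": "Northeast", "income": "100-150K", "hobby": "sailing"},
    {"cluster": 1, "region": "Northeast", "income": "100-150K", "hobby": "sailing"},
    {"cluster": 1, "region": "Northeast", "income": "100-150K", "hobby": "golf"},
    {"cluster": 1, "region": "Midwest", "income": "50-100K", "hobby": "sailing"},
    {"cluster": 2, "region": "West", "income": "50-100K", "hobby": "hiking"},
    {"cluster": 2, "region": "West", "income": "under 50K", "hobby": "hiking"},
]


def profile_cluster(customers, cluster, attributes):
    """Return the most common value of each attribute and its share."""
    members = [c for c in customers if c["cluster"] == cluster]
    if not members:
        return {}
    profile = {}
    for attr in attributes:
        value, count = Counter(c[attr] for c in members).most_common(1)[0]
        profile[attr] = (value, count / len(members))
    return profile


for attr, (value, share) in profile_cluster(
    CUSTOMERS, 1, ["region", "income", "hobby"]
).items():
    print(f"{attr}: {value} ({share:.0%} of cluster)")
```

The recommendation engine supplies only the cluster label. The meaning comes from joining that label to attributes a business person can act on.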

There are two basic ways to approach this. In practice, the two ways are often used together. The first approach starts with some a priori notions of what the meaningful divisions might be. You might decide that the gender of the buyer matters, for example, or the age, or the income range. Or you might try to tie the buying to a more sophisticated, psychographic classification scheme. For a good example of how such a system works, you can visit the SRI site and look at the VALS segmentation system, which classifies buyers into categories such as "actualizers," "fulfilleds," "makers," "strivers," and so on. The SRI site even gives you a test you can take to see how you would be classified. These predetermined segments have characteristics developed out of extensive consumer research -- for example, "Fulfilleds" are likely to have a swimming pool in their backyard and to own spreadsheet software. The great advantage of working with such pre-defined categories is that you can connect your product with very different things, such as reading habits or entertainment preferences of the group. In short, because the segmentation is general, you can fit your product into a bigger picture. There are a number of companies (e.g., net.Genesis, Andromedia, and NetPerceptions) that are beginning to offer the capability to connect and compare the visitor data from your site with data from larger populations, such as the broad base of data that Engage has assembled.

The second approach to analysis looks just at the visitors to your web site. It can grow from the data collected to support the recommendations and one-to-one personalization. Let's go back to our example of the three dimensional grid built from the library classifications of recently read books. If you collected and graphed these data for a great many customers, you would have a three dimensional space in which there floated "clouds" of clustered points. The clouds would not all be spherical, but would fall into cigar shapes and other patterns representing the different distributions of reading preferences. To turn the clouds into something with meaning, you could begin to move and rotate the axes of the graph through the space so that they did a better job of running lengthwise through some of the bigger clouds. This would allow you to begin to name the axes, assigning meaning to the dimensions. For example, you might find that the position of an axis expressed the propensity for reading fiction rather than non-fiction, or the tendency to read history books. You could even begin to change the angles between the axes so that they were no longer perpendicular (orthogonal) to each other, in order to better fit the clouds. In doing that, you would be saying that the dimensions were not completely independent of each other -- a situation likely to be true, in fact.
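
The idea of rotating an axis so that it runs lengthwise through a cloud can be sketched for the simplest case: a single cigar-shaped cloud in two dimensions rather than three. The sample points and function names are our own assumptions, and the sketch keeps the axes perpendicular; the non-orthogonal axes mentioned above would need more machinery than this.

```python
import math


def principal_axis_angle(points):
    """Angle (radians) of the direction of greatest spread in a 2-D cloud."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Standard closed form for the major axis of a 2x2 covariance matrix.
    return 0.5 * math.atan2(2 * cov_xy, var_x - var_y)


def rotate(points, angle):
    """Express the points in axes rotated counterclockwise by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical cigar-shaped cloud lying roughly along the diagonal.
CLOUD = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.2)]

angle = principal_axis_angle(CLOUD)
rotated = rotate(CLOUD, angle)
print(f"rotate axes by {math.degrees(angle):.1f} degrees")
print(f"spread along new axis 1: {variance([p[0] for p in rotated]):.3f}")
print(f"spread along new axis 2: {variance([p[1] for p in rotated]):.3f}")
```

After the rotation, nearly all of the spread lies along the first axis. That is the axis you would then try to name, for example as a preference for fiction over non-fiction.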

What we are describing here is an intuitive look at what a market segmentation product such as Personify does. NetPerceptions, one of the leading vendors of recommendation engines, is also moving in the direction of such offerings. The important points for business people are that:

This last point is particularly important. Too often, companies dismiss this kind of careful market segmentation as applicable only to consumer products. Not so. For example, a company selling electronics components to a variety of different companies could very usefully do this kind of segmentation to strengthen cross-sell opportunities and do a better job of targeting advertising to drive new prospects to the site.

Putting the Pieces Together

Let's summarize the points we have covered here; the whole reason for trying to discuss the different uses of customer and visitor data is that the issues tend to overlap and can be confusing. A summary will help pull the threads together.

First, we noted that your website provides you with access to information about visitors, prospects, and customers in more detail, at less cost, than is typically possible in other business environments. One important use of this information is the ability to target ads or product suggestions much more effectively; it is possible to provide visitors with information and offers that they really want to see. Products such as Andromedia's LikeMinds and NetPerceptions, which provide collaborative filtering, are key components for enabling such one-to-one delivery.

But collaborative filtering provides information to your visitors and customers, not to you. To actually use visitor information to make better decisions about advertising, about offers, and to enable cross selling, you need to identify and understand the different groups of customers that you are serving. In other words, you need to develop an effective segmentation for your market. There are two ways to do this; the methods can be used in combination with one another. One method depends on connecting your customers and prospects to larger demographic and psychographic databases. Site measurement companies like Andromedia, net.Genesis, and NetPerceptions are working with data aggregators such as Engage to begin providing these capabilities.

The second method depends on doing a segmentation analysis that works just on the data collected on your own web site. Because this approach focuses specifically on the dimensions in your market, without necessarily referring to larger consumer populations, it can be useful for business to business applications as well as consumer applications. Personify is currently the principal source of this kind of marketing analysis capability for the web; NetPerceptions is intending to offer such products in the future.

Our focus this week has been on the products in the center of the diagram, the ones that classify visitors and make connections. As the diagram suggests, implementing one-to-one delivery involves more than collaborative filtering -- you also need a way to deliver the advertisement, make the cross-sell offer, or suggest the additional product. In a coming issue we will take a look at the techniques and products used on the delivery end of the process.


Previous: Part 1
 
