Xerox Patents User Profiling by Web Traffic Analysis
This time I am not going to discuss the merits of this patent. However, I am going to talk about how utterly cool this project is. New Scientist has an article about a new Xerox patent covering a technique to recover demographic information like your age, sex and perhaps even your income by analyzing the pattern of web pages you browse. Knowing you by knowing how you surf.
Xerox says it can determine demographic information such as your age, sex and perhaps even your income by analysing the pattern of pages you choose to access on the web and comparing them to a database of surfing patterns from other users with a known background. Xerox suggests that the idea could be used by online merchants and advertisers who want to identify the types
Here is the abstract of the patent:
Demographic information of an Internet user is predicted based on an analysis of accessed web pages. Web pages accessed by the Internet user are detected and mapped to a user path vector which is converted to a normalized weighted user path vector. A centroid vector identifies web page access patterns of users with a shared user profile attribute. The user profile attribute is assigned to the Internet user based on a comparison of the vectors. Bias values are also assigned to a set of web pages and a user profile attribute can be predicted for an Internet user based on the bias values of web pages accessed by the user. User attributes can also be predicted based on the results of an expectation maximization process. Demographic information can be predicted based on the combined results of a vector comparison, bias determination, or expectation maximization process.
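To make the vector-comparison part of the abstract concrete, here is a minimal sketch in Python. It is not the method from the patent, only my reading of the abstract: the page names, the group labels, the use of raw visit counts as weights, and cosine similarity as the comparison are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical set of pages we track; not taken from the patent.
VOCABULARY = ["news", "sports", "cooking", "gadgets"]


def path_vector(pages, vocabulary=VOCABULARY):
    """Map a list of visited pages to a unit-length vector of visit counts.

    Assumption: the "weight" of a page is simply how often it was visited.
    """
    counts = Counter(pages)
    vec = [float(counts[p]) for p in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def centroid(vectors):
    """Average the path vectors of users who share a profile attribute."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; assumed here as the vector comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_attribute(pages, centroids, vocabulary=VOCABULARY):
    """Assign the attribute whose centroid is most similar to the user's path."""
    user = path_vector(pages, vocabulary)
    return max(centroids, key=lambda label: cosine(user, centroids[label]))


# Made-up surfing histories for users with a known background.
known_users = {
    "group_a": [["sports", "sports", "gadgets"], ["sports", "news"]],
    "group_b": [["cooking", "news"], ["cooking", "cooking", "gadgets"]],
}
centroids = {
    label: centroid([path_vector(p) for p in paths])
    for label, paths in known_users.items()
}

guess = predict_attribute(["sports", "gadgets", "sports"], centroids)
print(guess)
```

A real system would need far more pages and users, and the abstract also mentions per-page bias values and an expectation maximization step that this sketch leaves out.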
The idea is so simple when you think about it, but it is something I personally find very interesting. There is a lot of interesting science that can be done by data mining web usage traffic. A possible research area.
