Deactivation of unwelcome deep web extraction services through random injection
Abstract
Websites serve content both through Web Services and through user-viewable webpages. While the consumers of Web Services are typically machines, webpages are meant for human users. For reasons of security, revenue, ownership, availability, etc., it is highly desirable for service providers that content destined for further processing be fetched in a prescribed fashion, preferably through a supplied Web Service. Indeed, monetization of partnerships within a services ecosystem normally means that website data translate into valuable revenue. Unfortunately, it is quite commonplace for third-party developers to extract or leverage information from websites without asking for permission or negotiating a revenue-sharing agreement. This can translate into significant lost income for content providers. Even when website owners are happy to share their data, they may want users to adopt dedicated Web Service APIs (and the associated API servers) rather than place a load on their revenue-generating websites. In this paper, we introduce a mechanism that disables automated web-scraping agents, thereby forcing clients to conform to the provided Web Services. © 2009 IEEE.
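The abstract names the mechanism only at a high level. As a minimal illustrative sketch, assuming "random injection" means serving each response with randomized markup (random wrapper classes and hidden decoy nodes) so that scrapers keyed to a fixed DOM structure break while the human-visible rendering is unchanged, a server-side filter might look like the following. All identifiers here (randomize_page, _random_token) are hypothetical and do not reflect the paper's actual implementation.

```python
# Hypothetical sketch of "random injection": rewrite each served page with
# randomized structure so automated scrapers relying on fixed XPath/CSS
# selectors fail, while the page still renders identically for a human.
import random
import re
import string

def _random_token(n: int = 8) -> str:
    """Fresh random identifier for class names, regenerated per request."""
    return "".join(random.choices(string.ascii_lowercase, k=n))

def randomize_page(html: str) -> str:
    """Wrap block elements in wrappers with random class names and insert
    invisible decoy nodes, so the DOM path to any datum differs per fetch."""
    def wrap(match: re.Match) -> str:
        cls = _random_token()
        decoy = (f'<span class="{_random_token()}" style="display:none">'
                 f'{_random_token()}</span>')
        # Randomly place the decoy before or after the real element.
        if random.random() < 0.5:
            return f'<div class="{cls}">{decoy}{match.group(0)}</div>'
        return f'<div class="{cls}">{match.group(0)}{decoy}</div>'
    # For brevity, only <p> blocks are rewritten; a real deployment would
    # cover more element types.
    return re.sub(r"<p>.*?</p>", wrap, html, flags=re.DOTALL)

if __name__ == "__main__":
    page = "<html><body><p>price: 42</p><p>stock: low</p></body></html>"
    # Two fetches of the "same" page now expose different DOM structures.
    print(randomize_page(page))
    print(randomize_page(page))
```

Under these assumptions, a scraper that hard-codes a selector such as html/body/p[1] sees a different structure on every request, while a client using the provider's Web Service API is unaffected.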