A scalable architecture for real-time analysis of microblogging data
As events take place in the real world, e.g., sports games and marketing campaigns, people react and interact on online social networks (OSNs), especially microblog services such as Twitter, generating a large stream of data. Analyzing this data presents an opportunity for researchers and companies to better understand human behavior (both on the network and in real life) during the event's lifespan. Designing automated systems that conduct these analyses within fractions of a minute (or even seconds) poses many challenges: the volume of data is large, the number of posts in future events cannot be predicted, and the system needs to be always available and running smoothly to avoid information loss and delays in delivering the analytics results. In this paper, we present a scalable architecture for real-time analysis of microblogging data that handles large volumes of posts by means of modular parallel workflows. This architecture, which has been implemented on the IBM InfoSphere Streams platform, was tested on a real-world use case, sentiment analysis of Twitter posts during the games of the 2013 Fédération Internationale de Football Association (FIFA) Confederations Cup, and the system successfully coped with the challenges of this task.
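To make the idea of modular parallel workflows concrete, the following is a minimal sketch in Python of a split, analyze, and merge pipeline. It is not the InfoSphere Streams implementation described in the paper: the function names (`split`, `analyze_batch`, `run_pipeline`), the round-robin partitioning, and the toy word lexicon are all assumptions made purely for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy lexicon; a real system would use a trained sentiment model.
POSITIVE = {"goal", "great", "win"}
NEGATIVE = {"foul", "bad", "lose"}


def classify(post: str) -> str:
    """Label one post by counting lexicon hits (illustrative only)."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split(posts, n_workers):
    """Round-robin partition of the incoming posts into n_workers batches."""
    return [posts[i::n_workers] for i in range(n_workers)]


def analyze_batch(batch):
    """One parallel branch: classify every post in its batch."""
    return Counter(classify(p) for p in batch)


def run_pipeline(posts, n_workers=4):
    """Split the posts, analyze the batches in parallel, merge the counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(analyze_batch, split(posts, n_workers)):
            total.update(partial)
    return total


if __name__ == "__main__":
    sample = ["Great goal", "bad foul", "kick off", "what a win"]
    print(run_pipeline(sample, n_workers=2))
```

Because each branch only sees its own batch and the merge step only adds counts, the number of branches can be raised as the post volume grows without changing the analysis module itself, which is the property the architecture relies on.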