Open Access, Regular article

Beating the news using social media: the case study of American Idol

Fabio Ciulla1*, Delia Mocanu1, Andrea Baronchelli1, Bruno Gonçalves1, Nicola Perra1 and Alessandro Vespignani1,2,3

Author Affiliations

1 Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, 02115, USA

2 Institute for Scientific Interchange Foundation, Turin, 10133, Italy

3 Institute for Quantitative Social Sciences, Harvard University, Cambridge, MA, 02138, USA

EPJ Data Science 2012, 1:8  doi:10.1140/epjds8

Published: 31 July 2012


We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV show as an example of a well-defined electoral phenomenon that each week draws millions of votes in the USA. This event can be considered a basic test, in a simplified environment, of the predictive power of Twitter signals. We provide evidence that Twitter activity during the time span defined by the airing of the TV show and the voting period following it correlates with the contestants' ranking and allows the voting outcome to be anticipated. Twitter data from the show and the voting period of the season finale have been analyzed in an attempt to predict the winner ahead of the airing of the official result. We also show that the fraction of tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. The geolocalized data are crucial for the correct prediction of the final outcome of the show, pointing out the importance of considering information beyond the aggregated Twitter signal. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real-time gathering of quantitative indicators that may be able to anticipate the future unfolding of opinion formation events.
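
As an illustration of the kind of aggregated signal described above, the following minimal sketch counts tweets that mention each contestant within a time window (airing plus voting period) and ranks the contestants by that count. It is not the procedure used in the paper: the function name `rank_by_mentions`, the `(timestamp, text)` tweet format, the naive substring matching, and the contestant names and times in the usage note are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def rank_by_mentions(tweets, contestants, start, end):
    """Rank contestants by the number of tweets mentioning them in [start, end).

    tweets      -- iterable of (timestamp, text) pairs (hypothetical format)
    contestants -- list of contestant names to look for
    start, end  -- datetime bounds of the airing-plus-voting window
    """
    counts = Counter({name: 0 for name in contestants})
    for timestamp, text in tweets:
        # Keep only tweets posted inside the window of interest.
        if not (start <= timestamp < end):
            continue
        lowered = text.lower()
        for name in contestants:
            # Naive case-insensitive substring match (an assumption);
            # a tweet naming several contestants counts once for each.
            if name.lower() in lowered:
                counts[name] += 1
    # Most-mentioned first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A usage example on invented toy data (the names and times are hypothetical, not taken from the study):

```python
window_start = datetime(2012, 5, 22, 20, 0)
window_end = datetime(2012, 5, 23, 2, 0)
toy_tweets = [
    (datetime(2012, 5, 22, 20, 30), "Alice was amazing tonight"),
    (datetime(2012, 5, 22, 21, 0), "voting for BOB and alice"),
    (datetime(2012, 5, 22, 23, 59), "carol all the way"),
    (datetime(2012, 5, 23, 9, 0), "bob bob bob"),  # outside the window
]
ranking = rank_by_mentions(toy_tweets, ["alice", "bob", "carol"],
                           window_start, window_end)
```

A raw count like this ignores who is tweeting and from where, which is the limitation the abstract points to when it stresses the geolocalized data.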