EU Elections 2014 Prediction dataset
This is the dataset used in (Tsakalidis et al., 2015) to predict election results using Twitter and polls. The dataset contains the IDs of the tweets and the poll data used to build the prediction models. You can download the dataset here.
If you use this dataset in your research, please cite the following article:
A. Tsakalidis, S. Papadopoulos, A.I. Cristea, I. Kompatsiaris. “Predicting Elections for Multiple Countries Using Twitter and Polls”. IEEE Intelligent Systems (to appear in 2015)
The zip file has the following structure:
keywords.txt contains the keywords used as input to the Twitter Streaming API to collect the tweets. The three CSV files contain the IDs of the collected tweets.
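The column layout of the CSV files is not described here, so the sketch below assumes one tweet ID per row in the first column and skips any non-numeric row, such as a header. It keeps the IDs as strings because tweet IDs are 64-bit integers, and tools that parse them as floating-point numbers silently lose precision. The header name and the IDs in the example are hypothetical.

```python
import csv
import io


def read_tweet_ids(lines):
    """Collect tweet IDs from the first column of a CSV, keeping them as strings.

    Assumption: one ID per row in the first column; rows whose first cell is
    not purely numeric (for example a header) are skipped.
    """
    ids = []
    for row in csv.reader(lines):
        if row and row[0].strip().isdigit():
            ids.append(row[0].strip())
    return ids


# Hypothetical sample; with a real file use open(path, newline="") instead.
sample = io.StringIO("tweet_id\n466223413487271936\n466223413487271937\n")
ids = read_tweet_ids(sample)
```

The resulting IDs can then be passed to whatever tweet-lookup ("hydration") tool you use to recover the full tweets.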
The “polls.ods” file contains the poll data we used in our processing (one sheet per country).
Each ARFF file contains the features we extracted for one country on a daily basis. Each poll-based attribute is named after the party it refers to. The Twitter-based features are named “feature_i”, where “feature” is the name of the Twitter-based feature and “i” is an index identifying the party.
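The naming scheme above can be decoded with a small parser. This is a minimal sketch, assuming that every attribute ending in “_” plus digits is a Twitter-based feature and that any other name is a poll-based party attribute. It reads only the ARFF header using the standard library and does not handle quoted names that contain spaces. The feature name “mentions” in the example is hypothetical.

```python
import re

# Assumption: Twitter-based attributes look like "<feature>_<i>", with <i> a party index.
TWITTER_ATTR = re.compile(r"^(?P<feature>.+)_(?P<index>\d+)$")


def parse_attribute_names(arff_text):
    """Return the attribute names declared in an ARFF header, in order."""
    names = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@data"):
            break  # the header ends where the data section starts
        if stripped.lower().startswith("@attribute"):
            # The name is the second whitespace-separated token.
            names.append(stripped.split()[1].strip("'\""))
    return names


def split_attribute(name):
    """Return (feature, party_index) for Twitter-based names, else (name, None)."""
    m = TWITTER_ATTR.match(name)
    if m:
        return m.group("feature"), int(m.group("index"))
    return name, None


# Hypothetical header fragment in the style of "nl.arff".
header = """@relation nl
@attribute CU/SGP numeric
@attribute mentions_7 numeric
@attribute mentions_8 numeric
@data
"""
names = parse_attribute_names(header)
parsed = [split_attribute(n) for n in names]
```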
The mapping of parties to this index “i” is the following:
The Netherlands (“nl.arff”)
7: CU
8: SGP
Since “CU/SGP” ran as a coalition, the features for both indices (7 and 8) were used as input when predicting the “CU/SGP” vote share.
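For the CU/SGP coalition, the relevant inputs are the Twitter-based columns for both index 7 and index 8. The sketch below only selects those columns. This text does not say whether they were kept as separate inputs or merged, so any combining step is left to the reader. The attribute names are hypothetical, and index 1 appears only as an unrelated example.

```python
import re

# Assumption: Twitter-based attributes look like "<feature>_<i>", with <i> a party index.
TWITTER_ATTR = re.compile(r"^(?P<feature>.+)_(?P<index>\d+)$")


def coalition_columns(attribute_names, party_indices):
    """Return the Twitter-based attribute names whose party index is in party_indices."""
    wanted = set(party_indices)
    selected = []
    for name in attribute_names:
        m = TWITTER_ATTR.match(name)
        if m and int(m.group("index")) in wanted:
            selected.append(name)
    return selected


# Hypothetical attribute names; 7 (CU) and 8 (SGP) come from the mapping above.
names = ["CU/SGP", "mentions_1", "mentions_7", "mentions_8", "sentiment_7", "sentiment_8"]
cu_sgp = coalition_columns(names, [7, 8])
```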
In the Greek data, the “lefko” attribute corresponds to the “blank voters” in the polls. It appears only for Greece and was not used in any experiment (neither pre- nor post-electoral).
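If you want to reproduce the setup, the “lefko” attribute should be left out before modeling. The following minimal sketch assumes the attribute names have already been read from the ARFF header. The party names in the example are hypothetical placeholders.

```python
# Attributes that were not used in any experiment.
EXCLUDED_ATTRIBUTES = {"lefko"}


def usable_attributes(attribute_names):
    """Drop excluded attributes while preserving the original column order."""
    return [n for n in attribute_names if n not in EXCLUDED_ATTRIBUTES]


# Hypothetical attribute list for the Greek file.
gr_names = ["PartyA", "PartyB", "lefko", "mentions_1"]
kept = usable_attributes(gr_names)
```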