lecture: Collection and clustering of 6,900 tweets censored in European countries in 2017
A previous study of Twitter censorship in Turkey in 2015 showed that the number of withheld tweets reported by Twitter was more than an order of magnitude below the actual number: while Twitter reported 4,000 censored tweets, the authors collected 88,000 unique censored tweets (https://www.cs.rice.edu/~rst5/twitterTurkey/).
To measure the number of censored tweets across Europe from June to August 2017, I collected tweets originating from the 50 largest European cities (including cities in Russia and Turkey) and checked each of them 3 hours later to see whether it had been censored.
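The collect-then-recheck step can be sketched as follows. This is only an illustration in Python, not the R software described in this abstract. It assumes that collected tweets are stored with their collection time and that a censored tweet can be recognized by a `withheld_in_countries` field on the re-fetched tweet object; the abstract does not state how censorship was detected. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RECHECK_DELAY = timedelta(hours=3)  # delay stated in the abstract


def due_for_recheck(collected, now, delay=RECHECK_DELAY):
    """Return the IDs of tweets collected at least `delay` before `now`.

    `collected` maps a tweet ID to the datetime at which it was collected.
    """
    return [tid for tid, seen_at in collected.items() if now - seen_at >= delay]


def withheld_countries(tweet):
    """Return the set of country codes in which a tweet object is withheld.

    Assumption: the re-fetched tweet is a dict that may carry a
    `withheld_in_countries` list; a missing or empty field means the
    tweet is not censored anywhere.
    """
    return set(tweet.get("withheld_in_countries") or [])


if __name__ == "__main__":
    t0 = datetime(2017, 6, 1, 12, 0, tzinfo=timezone.utc)
    collected = {"a": t0, "b": t0 + timedelta(hours=2)}
    print(due_for_recheck(collected, now=t0 + timedelta(hours=3)))
    print(withheld_countries({"withheld_in_countries": ["TR"]}))
```

Separating the scheduling from the censorship check keeps both testable without network access; the actual fetching of tweets is deliberately left out.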
More than 5,000 tweets censored in France, Germany, Russia, or Turkey were collected. Meanwhile, Twitter reported 1,200 tweets censored in those four countries, which confirms that the actual number of censored tweets is well above the reported number (here, more than four times the reported figure). The censored tweets also show country-specific patterns linked to geography, politics, and other factors. Using text mining and clustering, we can group tweets, identify topics, and better understand the mechanics behind governmental censorship.
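As an illustration of how tweets might be grouped by topic, here is a minimal Python sketch. It is not the method used in this study, which the abstract does not specify: it weights words with TF-IDF and applies a simple single-pass "leader" clustering on cosine similarity. The threshold value and all function names are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop URLs, and return its alphabetic word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[^\W\d_]+", text)


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is zero.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader is similar enough.

    A document that matches no existing leader starts a new cluster and
    becomes its leader. Returns one cluster label per document.
    """
    vectors = tfidf_vectors(docs)
    leaders, labels = [], []
    for vec in vectors:
        for label, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                labels.append(label)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


if __name__ == "__main__":
    sample = [
        "protest in ankara tonight",
        "ankara protest tonight police",
        "buy cheap watches online",
        "cheap watches online sale",
    ]
    print(leader_cluster(sample))
```

Leader clustering depends on the order of the input and on the threshold, so it is only suitable as a first look; a real analysis would likely also need language-specific stop words and stemming.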
The software can easily be modified to collect and analyze censored tweets from other cities, so it could be used in further studies around the world. It will be published as an R package under the GPL license in 2018.