Analyzing Twitter data of the 20th German Parliament (the "Bundestag") to identify patterns in retweeting behavior. The data was collected in May 2022 via the official Twitter API for researchers. The results of the research can be found in Content_Sharing_Results.pdf.
The most important tables (stored as CSV files in the `data` folder) are `mdb_list`, `user_list`, `tweet_list` and `user_friendships`.
- `mdb_list` contains all politicians who are or were part of the 20th German Parliament.
- `user_list` contains all politicians who have a Twitter account (even if the profile is not actively used). The additional account information was downloaded via the Twitter API; the column `api_call` logs when the request was made.
- `tweet_list` contains up to the 3,200 latest tweets of each user, including retweets, quotes and replies. For easier scraping, the tweets of each user were stored in individual files; to work with all tweets, the files can be merged into one table.
- `user_friendships` contains all "friendships" between the users. As Twitter allows uni-directional and bi-directional "friendships", two users always appear twice in the list (e.g. `X;Y;TRUE` and `Y;X;FALSE` means the "friendship" is uni-directional).
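The double listing described above can be collapsed into one direction label per pair. A minimal sketch, assuming semicolon-separated columns named `user_a`, `user_b` and `follows` (the real header may differ):

```python
import csv
import io

# Miniature example in the user_friendships format: each pair appears
# twice, and the boolean says whether that direction of the "friendship"
# holds (X follows Y, but Y does not follow X back).
SAMPLE = """user_a;user_b;follows
X;Y;TRUE
Y;X;FALSE
"""

def friendship_directions(fh):
    """Return {frozenset({a, b}): 'bi' | 'uni'} from the double-listed rows."""
    pairs = {}
    for row in csv.DictReader(fh, delimiter=";"):
        key = frozenset((row["user_a"], row["user_b"]))
        pairs.setdefault(key, []).append(row["follows"] == "TRUE")
    # Keep pairs with at least one TRUE; both TRUE means bi-directional.
    return {k: "bi" if all(v) else "uni" for k, v in pairs.items() if any(v)}

directions = friendship_directions(io.StringIO(SAMPLE))
print(directions)
```

With the `X;Y;TRUE` / `Y;X;FALSE` example from above, the pair is classified as uni-directional.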
This folder contains all scripts for the data collection and transformation. The `config.py` script loads the environment variables needed to access the Twitter API. The files are listed in the order in which they need to be executed; otherwise the output files may contain errors.
- Create a Twitter account and get access to the API v2 and v1.1.
- Create a `.env` file in the root directory with the keys `API_KEY`, `API_SECRET_KEY`, `ACCESS_TOKEN`, `ACCESS_SECRET` and `BEARER_TOKEN`.
- Run the script `config.py` to load the environment variables from the `.env` file.
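The repository's `config.py` is not reproduced here; a minimal sketch of what such a loader typically does (plain stdlib, no `python-dotenv`; the five key names come from the step above, everything else is an assumption):

```python
import os

# The five keys the .env file is expected to provide (see the setup steps).
REQUIRED_KEYS = ["API_KEY", "API_SECRET_KEY", "ACCESS_TOKEN",
                 "ACCESS_SECRET", "BEARER_TOKEN"]

def load_env(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in os.environ]
    if missing:
        raise RuntimeError(f"Missing keys in {path}: {missing}")
```

After `load_env()` runs, the collection scripts can read the credentials from `os.environ`.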
`mdb_collection`: Reads the XML file downloaded from the Bundestag website and transforms it into a table of politicians who are or were part of the 20th German Parliament. The results are stored in `mdb_list.csv`.

`mdb_twitter_collection.py`: Loads the Twitter usernames of the politicians from `twitter_usernames.csv` and appends them to the results of `mdb_list.csv`; the two tables are joined via the `bundestag_id`. The result is stored in `mdb_twitter_list.csv`.
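The join on `bundestag_id` described above can be sketched with pandas (the miniature tables and the left join are assumptions for illustration; the real script may join differently):

```python
import pandas as pd

# Hypothetical miniature versions of the two inputs.
mdb_list = pd.DataFrame({
    "bundestag_id": [1, 2, 3],
    "name": ["A", "B", "C"],
})
twitter_usernames = pd.DataFrame({
    "bundestag_id": [1, 3],
    "twitter_handle": ["a_mdb", "c_mdb"],
})

# A left merge keeps politicians without a known Twitter handle
# (their twitter_handle column is NaN).
mdb_twitter_list = mdb_list.merge(twitter_usernames, on="bundestag_id", how="left")
print(mdb_twitter_list)
```

Politicians without a handle can later be filtered out, which matches the note below that `user_list.csv` only contains entries for politicians with a Twitter account.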
`user_collection.py`: Reads `mdb_twitter_list.csv` and enriches the table with the Twitter profile information. The API is accessed via the `twitter_handle` (Twitter username). The results are stored in `user_list.csv` and only contain entries for politicians with a Twitter account.

`tweet_collection.py`: Reads `user_list.csv` to get the `twitter_id`s of the accounts and then checks in the folder `tweet_list` whether the tweets of a specific account were already downloaded. If not, it downloads the last 3,200 tweets of the account and stores them in one file named in the format `tweet_list_TWITTER_ID_TWITTER_HANDLE.csv`.

`user_friendship_collection.py`: Reads the files `user_list.csv` and `user_friendships.csv`. It then goes through the account list and checks whether the friendship between two users was already checked. "Friendship" means whether the two users follow each other or only one account follows the other. If the friendship was not checked before, the script calls the API and stores the result directly in `user_friendships.csv`.
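The "already downloaded" check in `tweet_collection.py` can be sketched by scanning the `tweet_list` folder for the `tweet_list_TWITTER_ID_TWITTER_HANDLE.csv` filename pattern (the helper name is hypothetical; the real script may track this differently):

```python
from pathlib import Path

def downloaded_ids(folder="tweet_list"):
    """Return the set of Twitter IDs that already have a tweet file."""
    ids = set()
    for path in Path(folder).glob("tweet_list_*.csv"):
        # Filename format: tweet_list_<TWITTER_ID>_<TWITTER_HANDLE>.csv,
        # so the ID is the third underscore-separated part of the stem.
        parts = path.stem.split("_")
        if len(parts) >= 4:
            ids.add(parts[2])
    return ids
```

An account whose `twitter_id` appears in this set can be skipped, so an interrupted run can be resumed without re-downloading.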
`retweeter_collection.py`: Checks all tweet files in the folder `tweet_list` and determines which retweets/quotes refer to tweets in this list. In other words, it finds out which politician quoted or retweeted a tweet of another politician. The results are stored in `retweet_list.csv` and `quote_list.csv`. Users can also retweet/quote their own tweets.
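Matching retweets/quotes back to tweets in the collection, as `retweeter_collection.py` does, can be sketched as a lookup by referenced tweet ID (the column names `tweet_id`, `author` and `referenced_tweet_id` are assumptions, not the script's actual schema):

```python
def politician_retweets(tweets):
    """tweets: list of dicts with tweet_id, author and, for retweets/quotes,
    referenced_tweet_id. Returns (retweeter, original_author) pairs for
    retweets whose referenced tweet is itself in the collection."""
    by_id = {t["tweet_id"]: t["author"] for t in tweets}
    pairs = []
    for t in tweets:
        ref = t.get("referenced_tweet_id")
        if ref in by_id:  # referenced tweet belongs to a collected politician
            pairs.append((t["author"], by_id[ref]))
    return pairs

# Retweets of tweets outside the collection are dropped; self-retweets
# (author == original author) are kept, matching the note above.
retweets = politician_retweets([
    {"tweet_id": "1", "author": "mdb_a"},
    {"tweet_id": "2", "author": "mdb_b", "referenced_tweet_id": "1"},
])
print(retweets)  # -> [('mdb_b', 'mdb_a')]
```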
This folder contains the scripts and Jupyter notebooks for preparing the data for Gephi, the tool used to visualize the graphs/networks.
This folder contains the Jupyter notebooks for analyzing the collected data. A summary of the results can be found in the PDF file Content_Sharing_Results.pdf.