Welcome! This repo contains scripts for classifying political ads by ad tone (e.g., contrast, promote, attack).
This repo is part of the Cross-platform Election Advertising Transparency Initiative (CREATIVE). CREATIVE is an academic research project that has the goal of providing the public with analysis tools for more transparency of political ads across online platforms. In particular, CREATIVE provides cross-platform integration and standardization of political ads collected from Google and Facebook. CREATIVE is a joint project of the Wesleyan Media Project (WMP) and the privacy-tech-lab at Wesleyan University.
To analyze the different dimensions of political ad transparency we have developed an analysis pipeline. The scripts in this repo are part of the Data Classification step in our pipeline.
1. Introduction
2. Data
3. Setup
4. Thank you!
This repository contains code that generates two variables. The first, ad tone mention-based, codes an ad as 'contrast', 'promote' or 'attack'. We use the outputs from the entity linking 2022 repo as input. The coding is decided by who is mentioned in the ad:
- If an ad from a candidate mentions the candidate in the ad and not their opponent: Promote
- If an ad from a candidate mentions their opponent in the ad and not themselves: Attack
- If an ad from a candidate mentions both the candidate and their opponent in the ad: Contrast
The second, ad tone constructed, uses results from the mention-based classification, as well as results from the ABSA and race of focus repos, to code ads as 'contrast', 'promote' or 'attack'. To visualize our decision-making process for ad tone constructed, consult this diagram. Also see below for more details.
Data output by the scripts is in the format ad_id,ad_tone. An example row looks like x1949505221867086,Promote.
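As a minimal sketch of how the ad_id,ad_tone output could be inspected, the snippet below parses the example row with base R. The inline string stands in for a results file, and the object names are only illustrative; in practice you would point read.csv at one of the csv files in the data folder.

```r
# Illustrative stand-in for a results file: a header plus the example row.
example_csv <- "ad_id,ad_tone\nx1949505221867086,Promote\n"

# Read every column as character so ad ids are never converted to numbers.
ad_tone <- read.csv(text = example_csv, colClasses = "character")

# Count how many ads fall into each tone category.
tone_counts <- table(ad_tone$ad_tone)
print(tone_counts)
```

Reading the ids as character is a precaution: it keeps them intact if some ids in a file happen to consist of digits only.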
This repo contains eight R scripts: three that deal with ad tone constructed and five that deal with ad tone mention-based. Of the five mention-based scripts, those for non-2022 data are in a folder called "ad_tone_mentionbased", and those for Facebook and Google 2022 are in another folder called "ad_tone_mentionbased_2022". Thus, if you only want to work on the 2022 data, you do not have to run anything in the "ad_tone_mentionbased" folder. All of the ad tone constructed scripts are in "ad_tone_constructed", regardless of whether they relate to 2022 or non-2022 data.
The code in this repository creates two variables: ad tone mention-based and ad tone constructed. Results are saved as csv files in the data folder:
- Mention-based Results for:
- Constructed Results for:
Mention-based (or reference-based) ad tone codes an ad as 'Contrast' if both the candidate and their opponent are mentioned in the ad (either in text or in image appearance), 'Promote' if only the candidate is mentioned, and 'Attack' if only the opponent is mentioned. If no candidate is mentioned, the ad is coded as 'Support' (given that the basic purpose of an ad is to further the preferred candidate's electoral prospects). This variable is available for the candidate ads in the 1.4m dataset.
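The mention-based rules above can be sketched as a small R function. This is only an illustration of the decision logic, not the code used in the scripts: the function name classify_mention_tone and its two logical arguments are hypothetical, and the real scripts derive the mentions from the entity linking results.

```r
# Hypothetical helper illustrating the mention-based rules.
# Both arguments are logical vectors with one element per ad.
classify_mention_tone <- function(mentions_candidate, mentions_opponent) {
  ifelse(mentions_candidate & mentions_opponent, "Contrast",
    ifelse(mentions_candidate, "Promote",
      ifelse(mentions_opponent, "Attack", "Support")))
}

# Four ads covering every combination of mentions.
tones <- classify_mention_tone(
  mentions_candidate = c(TRUE, TRUE, FALSE, FALSE),
  mentions_opponent  = c(TRUE, FALSE, TRUE, FALSE)
)
print(tones)
# "Contrast" "Promote" "Attack" "Support"
```

Because ifelse is vectorized, the same function works on a whole column of flags at once.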
The construction of ad tone is based on this flowchart.
When traditional mention-based ad tone is available, we use that; otherwise we sum over ABSA results (also using race of focus). The variable is available for a larger number of ads, and the rest have no ad tone.
First, make sure you have R installed. While R can be run from the terminal, many people find it much easier to use RStudio along with R. A link to this program can be found here
The scripts are tested on R 4.2, 4.3, and 4.4.
Next, make sure you have the following packages installed in R. The exact version we used of each package is listed in the requirements_r.txt file. These are the versions we tested our scripts on; more recent versions may also work:
- data.table
- stringr
- purrr
- dplyr
- tidyr
- R.utils
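One way to check which of the packages listed above are still missing is sketched below. The variable names are illustrative. The install.packages call is left commented out because it installs the latest CRAN releases, which may differ from the versions pinned in requirements_r.txt.

```r
# Packages listed above.
required_pkgs <- c("data.table", "stringr", "purrr", "dplyr", "tidyr", "R.utils")

# requireNamespace() returns TRUE when a package can be loaded, without attaching it.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # installs the latest versions, not the pinned ones
} else {
  message("All required packages are installed.")
}
```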
In order to use the scripts in this repo, you will need outputs from a number of other repos. Which repositories are needed depends on which script you are executing.
All the scripts for ad tone mention-based require datasets. In addition, depending on the specific script, various other repos must also be downloaded.
The scripts within the ad_tone_mentionbased_2022 folder all require datasets. In addition, depending on the specific script, various other repos must also be downloaded. Note that for the files hosted on Figshare, you will need to fill out a form before getting immediate access to the data. Specifically:
- ad_tone_mentionbased_2022/ad_tone_mentionbased_fb2022.R requires the /entity_linking_2022/facebook/data/detected_entities_fb22_for_ad_tone.csv.gz file from the entity linking repo, and the fb_2022_adid_var1.csv.gz file that is found on Figshare.
- ad_tone_mentionbased_2022/ad_tone_mentionbased_g2022.R requires the entity_linking_results_google_2022_notext_combined.csv.gz file from the entity_linking_2022 repo and the g2022_adid_01062021_11082022_var1.csv.gz file that is found on Figshare. Once you fill out the form and get access to the files, you will see it under the name g2022 adid var1.
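Before running a script, it can help to confirm that its input files are in place. The sketch below does this for the Facebook 2022 script. Both paths are assumptions: it supposes the entity_linking_2022 repo is cloned next to this one and that the Figshare file was saved in a local data folder, so adjust them to your own layout and to the paths the script actually reads.

```r
# Assumed locations of the two inputs; edit these to match your setup.
input_files <- c(
  "../entity_linking_2022/facebook/data/detected_entities_fb22_for_ad_tone.csv.gz",
  "data/fb_2022_adid_var1.csv.gz"  # hypothetical location of the Figshare download
)

# file.exists() returns one TRUE/FALSE per path.
found <- file.exists(input_files)

if (all(found)) {
  message("All input files found.")
} else {
  message("Missing input files:\n", paste(input_files[!found], collapse = "\n"))
}
```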
Scripts in the ad_tone_mentionbased folder, which were not used for 2022 election ads data production, are described below. They are legacy scripts that served similar purposes for our 2020 TV and online ads data. They are preserved here for internal use.
- ad_tone_mentionbased/ad_tone_heuristic_tv_2020.R requires the entity_linking_results_tv_2020_for_ad_tone.csv.gz file of the entity linking repo.
- ad_tone_mentionbased/ad_tone_mentionbased_FB_140m.R requires the race_of_focus_140m.rdata file of the race of focus repo, entity linking and fb_2020_140m_adid_var1.csv.gz. Note that fb_2020_140m_adid_var1.csv.gz is not currently available, but will be shared once it is.
- ad_tone_mentionbased/ad_tone_mentionbased_Google_2020.R requires race_of_focus_2020.rdata from the race_of_focus directory, entity_linking_results_google_2020_notext_all_fields.csv.gz from the entity_linking repo and google_2020_adid_var1.csv.gz from the datasets repo.
Some mention-based scripts require metadata files (e.g., var1 files) for Facebook or Google as input. These are too large to be uploaded to GitHub. You can download them through our Figshare page:
- For Facebook 2022 script: data_post_production/fb_2022_adid_var1.csv.gz.
- For Google 2022 script: data_post_production/g2022_adid_01062021_11082022_var1.csv.gz.
Pre-2022 data production:
- For Facebook 2020 script: fb_2020/fb_2020_140m_adid_var1.csv.gz. Note that fb_2020_140m_adid_var1.csv.gz is not currently available, but will be shared once it is.
- For Google 2020 script: google_2020_adid_var1.csv.gz
The scripts within the ad_tone_constructed folder have the following requirements:
- ad_tone_constructed/ad_tone_constructed_fb2022.R requires the fb2022_ABSA_pred.csv.gz file from the ABSA repo, as well as the race_of_focus_fb2022.rdata file from the race_of_focus repo.
- ad_tone_constructed/ad_tone_constructed_g2022.R requires the google_2022_ABSA_pred.csv.gz file from the ABSA repo, as well as the race_of_focus_google_2022.rdata file from the race_of_focus repo.
Legacy scripts for pre-2022 data production, preserved here for internal use:
- ad_tone_constructed/ad_tone_constructed_fb140m.R requires the 140m_ABSA_pred.csv.gz file from the ABSA repo, as well as the race_of_focus_140m.rdata file from the race_of_focus repo.
In addition, all scripts within the ad_tone_constructed folder require the ad tone mention-based results (see above).
Now you can run the script you want, depending on which variable and which data you are interested in analyzing. For example, to do the mention-based classification for Facebook 2022 data, run ad_tone_mentionbased_fb2022.R.
Running the scripts through the terminal would look like this:
cd ad_tone_mentionbased_2022
Rscript ad_tone_mentionbased_fb2022.R
The scripts can also be run through the RStudio interface.
We would like to thank our supporters!
This material is based upon work supported by the National Science Foundation under Grant Numbers 2235006, 2235007, and 2235008. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
The Cross-Platform Election Advertising Transparency Initiative (CREATIVE) is a joint infrastructure project of the Wesleyan Media Project and privacy-tech-lab at Wesleyan University in Connecticut.