This is a repository of scripts and data required to replicate various datasets of political speeches, including:
- Hansard in full: parliamentary speeches matched to individual MPs, with party and biographical information, using the TheyWorkForYou database.
- Hansard PMQs: maiden speeches given by new British MPs from 1945 onwards, using debate transcripts scraped from Hansard - includes speech content and date.
- Conference speeches: leaders' speeches at party conferences, available at BritishPoliticalSpeech.org, scraped into a dataframe and cleaned - includes speech content, politicians' names, party, year, location, commentary on the speech, and tags.
- Manifesto forewords: uses the Manifesto Project archive to extract forewords to British election manifestos - written by party leaders from Labour, Conservatives, Lib Dems, SNP, and UKIP/Brexit Party - from 1983 to present.
- Local election leaflets: based on the Election Leaflets archive, using optical character recognition to convert images of local election leaflets to plain text, then annotating with scraped data about constituencies, parties, and candidates. [Under development]
- Party press releases: press releases scraped from party websites for Labour and the Conservatives. [Under development]
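
As a rough illustration of the matching step described for the full Hansard dataset, the sketch below attaches party and name information to speech records by joining on a person identifier. The field names (`person_id`, `party`, `name`) and the sample records are hypothetical placeholders, not the repository's actual schema or the TheyWorkForYou data format.

```python
# Hypothetical records; field names are illustrative, not the repository's schema.
speeches = [
    {"person_id": "p1", "date": "2019-10-14", "text": "I beg to move..."},
    {"person_id": "p2", "date": "2019-10-14", "text": "Will the Minister give way?"},
    {"person_id": "p9", "date": "2019-10-15", "text": "On a point of order."},
]
members = [
    {"person_id": "p1", "name": "Member A", "party": "Labour"},
    {"person_id": "p2", "name": "Member B", "party": "Conservative"},
]


def match_speeches(speeches, members):
    """Left-join speeches to member metadata on person_id.

    Speeches with no matching member are kept, with name and party set to None.
    """
    by_id = {m["person_id"]: m for m in members}
    matched = []
    for speech in speeches:
        member = by_id.get(speech["person_id"], {})
        matched.append(
            {**speech, "name": member.get("name"), "party": member.get("party")}
        )
    return matched


matched = match_speeches(speeches, members)
```

Keeping unmatched speeches (rather than dropping them) makes it easier to audit how many records failed to link to an MP.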