A Python-based toolkit for qualitative data analysis using Large Language Models (LLMs).
This toolkit provides two modes for automatic annotation of qualitative data:
- No-Code Mode → Use the interactive Streamlit web app.
- Low-Code Mode → Use the provided Jupyter/Colab notebooks for more customizable workflows.
Click the link below to run the web app:
Run the Qualitative Analysis App
A similar link is provided to run the manual annotator.
Click the badge below to run the notebooks directly in Google Colab:
If you prefer to run the analysis directly on your machine, follow the installation steps below.
- Clone the repository:
git clone https://github.com/your-username/qualitative_analysis_project.git
cd qualitative_analysis_project
- Create a virtual environment:
conda create -n qualitative_analysis python=3.10
conda activate qualitative_analysis
- Install the required packages:
pip install -r requirements.txt
- Set up your API credentials:
Copy or rename .env.example to .env, then populate it with your LLM credentials (e.g., Azure or Together keys and endpoints).
Example:
AZURE_API_KEY="your_azure_api_key"
AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
AZURE_API_VERSION="2023-05-15"
If you’re using Together, set:
TOGETHER_API_KEY="your_together_api_key"
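These variables are presumably loaded at startup, for example via python-dotenv's load_dotenv(). As a stdlib-only sketch of what that loading amounts to (the function name load_env is hypothetical, not part of the project):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    reads KEY="value" lines and exports them to os.environ."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and anything without an '=' sign
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

In practice you would simply call `load_dotenv()` from python-dotenv at the top of your script or notebook.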
The project offers two primary usage modes:
- Streamlit app: Use the interactive GUI for classification workflows.
- Notebooks: Run classification workflows in Jupyter Notebooks.
streamlit run app.py
This launches a browser-based interface to upload data, configure LLMs, run analyses, and download results.
The notebooks contain the classification workflows for each criterion:
QUALITATIVE_ANALYSIS_PROJECT
├── codebook
│ ├── binary_codebook.txt
│ └── multiclass_codebook.txt
├── data
│ ├── outputs/
│ ├── multiclass_KA.csv
│ ├── multiclass_MC.csv
│ ├── multiclass_sample_chem.csv
│ ├── multiclass_sample.csv
│ └── sequential_binary_sample.csv
├── notebooks
│ ├── notebook_binary.ipynb
│ ├── notebook_multiclass.ipynb
│ └── notebook_sequential_binary.ipynb
├── qualitative_analysis
│ ├── __init__.py
│ ├── config.py
│ ├── cost_estimation.py
│ ├── data_processing.py
│ ├── evaluation.py
│ ├── model_interaction.py
│ ├── notebooks_functions.py
│ ├── parsing.py
│ └── prompt_construction.py
├── .env
├── .gitignore
├── .pre-commit-config.yaml
├── app.py
├── mypy.ini
└── README.md
A Streamlit app providing a GUI workflow for uploading data, configuring LLMs, classifying text, and optionally comparing with external judgments.
Contains Jupyter notebooks demonstrating how to use the library for:
- Binary classification
- Multiclass classification
- Sequential binary classification
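To illustrate what such a workflow involves, here is a hedged sketch of prompt construction and answer parsing for the binary case. Both helper names are hypothetical; the project's actual logic lives in prompt_construction.py and parsing.py:

```python
def build_prompt(codebook: str, text: str) -> str:
    """Assemble a binary-classification prompt from a codebook and one
    data row. (Hypothetical helper, not the project's real API.)"""
    return (
        "You are a qualitative coding assistant.\n"
        f"Codebook:\n{codebook}\n\n"
        f"Text to classify:\n{text}\n\n"
        "Answer with exactly one label: 1 (criterion met) or 0 (not met)."
    )

def parse_label(raw_answer: str) -> int:
    """Extract the 0/1 label from a raw LLM answer (hypothetical parser)."""
    for token in raw_answer.split():
        token = token.strip(".,")
        if token in {"0", "1"}:
            return int(token)
    raise ValueError(f"No label found in: {raw_answer!r}")
```

The prompt would be sent to the configured LLM (Azure or Together) and the reply fed through the parser; the notebooks wrap this loop over every row of the input CSV.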
Holds CSV samples (for various classification scenarios), plus an outputs/ subfolder where processed results can be saved.
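The read-annotate-save round trip can be sketched with the stdlib csv module. The column name "text" and the helper annotate_csv are assumptions for illustration; check the actual sample files for their real headers:

```python
import csv
import os

def annotate_csv(in_path: str, out_path: str, classify) -> int:
    """Read rows from a sample CSV, add a 'label' column via `classify`,
    and save the result (e.g., under data/outputs/). Returns the row count.
    (Hypothetical helper; the notebooks handle this with their own code.)"""
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(in_path, newline="", encoding="utf-8") as fin:
        rows = list(csv.DictReader(fin))
    for row in rows:
        # 'text' is a placeholder column name
        row["label"] = classify(row["text"])
    with open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```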
The main Python package, housing the core modules (configuration, data processing, prompt construction, model interaction, parsing, evaluation, and cost estimation).
Contains text files defining classification rules or codebooks (binary/multiclass).
- .env – Environment variables for sensitive credentials (e.g., API keys, endpoints).
- .pre-commit-config.yaml – Config for pre-commit hooks (linting, formatting, etc.).
- mypy.ini – Configuration for static type checks (mypy).
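The mypy.ini shipped with the repo is authoritative; as a rough sketch, a typical configuration using standard mypy options might look like:

```ini
[mypy]
python_version = 3.10
disallow_untyped_defs = True
warn_unused_ignores = True
ignore_missing_imports = True
```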
- Coding Style: This repo uses type hints and docstrings (see mypy.ini for static checks).
- Pre-Commit Hooks: Use .pre-commit-config.yaml for linting/formatting. Install the hooks with: pre-commit install