A Python-based toolkit for qualitative data analysis using Large Language Models (LLMs).
This toolkit provides two modes for automatic annotation of qualitative data:
- No-Code Mode → Use the interactive Streamlit web app.
- Low-Code Mode → Use the provided Jupyter/Colab notebooks for more customizable workflows.
Click the link below to run the web app:
Run the Qualitative Analysis App
A similar link is provided to run the manual annotator.
Click the badge below to run the notebooks directly in Google Colab:
If you prefer to run the analysis directly on your machine, follow the installation steps below.
- Clone the repository:
git clone https://github.com/your-username/qualitative_analysis_project.git
cd qualitative_analysis_project
- Create a virtual environment:
conda create -n qualitative_analysis python=3.10
conda activate qualitative_analysis
- Install the required packages:
pip install -r requirements.txt
- Set up your API credentials:
Copy or rename .env.example to .env, then populate it with your LLM credentials (e.g., Azure or Together keys and endpoints).
Example:
AZURE_API_KEY="your_azure_api_key"
AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
AZURE_API_VERSION="2023-05-15"
If you’re using Together, set:
TOGETHER_API_KEY="your_together_api_key"
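These variables are presumably loaded at startup, for example via python-dotenv's load_dotenv(). As a stdlib-only sketch of what that loading amounts to (the function name load_env is hypothetical, not part of the project):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    reads KEY="value" lines and exports them to os.environ."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines, comments, and anything without an '=' sign
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

In practice you would simply call `load_dotenv()` from python-dotenv at the top of your script or notebook.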
The project offers two primary usage modes:
- Streamlit app: Use the interactive GUI for classification workflows.
- Notebooks: Run classification workflows in Jupyter Notebooks.
streamlit run app.py
This launches a browser-based interface to upload data, configure LLMs, run analyses, and download results.
The notebooks contain the classification workflows for each criterion:
QUALITATIVE_ANALYSIS_PROJECT
├── codebook
│ ├── binary_codebook.txt
│ └── multiclass_codebook.txt
├── data
│ ├── outputs/
│ ├── multiclass_KA.csv
│ ├── multiclass_MC.csv
│ ├── multiclass_sample_chem.csv
│ ├── multiclass_sample.csv
│ └── sequential_binary_sample.csv
├── notebooks
│ ├── notebook_binary.ipynb
│ ├── notebook_multiclass.ipynb
│ └── notebook_sequential_binary.ipynb
├── qualitative_analysis
│ ├── __init__.py
│ ├── config.py
│ ├── cost_estimation.py
│ ├── data_processing.py
│ ├── evaluation.py
│ ├── model_interaction.py
│ ├── notebooks_functions.py
│ ├── parsing.py
│ └── prompt_construction.py
├── .env
├── .gitignore
├── .pre-commit-config.yaml
├── app.py
├── mypy.ini
└── README.md
A Streamlit app providing a GUI workflow for uploading data, configuring LLMs, classifying text, and optionally comparing with external judgments.
Contains Jupyter notebooks demonstrating how to use the library for:
- Binary classification
- Multiclass classification
- Sequential binary classification
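To illustrate what such a workflow involves, here is a hedged sketch of prompt construction and answer parsing for the binary case. Both helper names are hypothetical; the project's actual logic lives in prompt_construction.py and parsing.py:

```python
def build_prompt(codebook: str, text: str) -> str:
    """Assemble a binary-classification prompt from a codebook and one
    data row. (Hypothetical helper, not the project's real API.)"""
    return (
        "You are a qualitative coding assistant.\n"
        f"Codebook:\n{codebook}\n\n"
        f"Text to classify:\n{text}\n\n"
        "Answer with exactly one label: 1 (criterion met) or 0 (not met)."
    )

def parse_label(raw_answer: str) -> int:
    """Extract the 0/1 label from a raw LLM answer (hypothetical parser)."""
    for token in raw_answer.split():
        token = token.strip(".,")
        if token in {"0", "1"}:
            return int(token)
    raise ValueError(f"No label found in: {raw_answer!r}")
```

The prompt would be sent to the configured LLM (Azure or Together) and the reply fed through the parser; the notebooks wrap this loop over every row of the input CSV.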
Holds CSV samples (for various classification scenarios), plus an outputs/ subfolder where processed results can be saved.
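The read-annotate-save round trip can be sketched with the stdlib csv module. The column name "text" and the helper annotate_csv are assumptions for illustration; check the actual sample files for their real headers:

```python
import csv
import os

def annotate_csv(in_path: str, out_path: str, classify) -> int:
    """Read rows from a sample CSV, add a 'label' column via `classify`,
    and save the result (e.g., under data/outputs/). Returns the row count.
    (Hypothetical helper; the notebooks handle this with their own code.)"""
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(in_path, newline="", encoding="utf-8") as fin:
        rows = list(csv.DictReader(fin))
    for row in rows:
        # 'text' is a placeholder column name
        row["label"] = classify(row["text"])
    with open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```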
The main Python package, housing the core modules (configuration, data processing, prompt construction, model interaction, parsing, evaluation, and cost estimation).
Contains text files defining classification rules or codebooks (binary/multiclass).
- .env – Environment variables for sensitive credentials (e.g., API keys, endpoints).
- .pre-commit-config.yaml – Config for pre-commit hooks (linting, formatting, etc.).
- mypy.ini – Configuration for static type checks (mypy).
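The mypy.ini shipped with the repo is authoritative; as a rough sketch, a typical configuration using standard mypy options might look like:

```ini
[mypy]
python_version = 3.10
disallow_untyped_defs = True
warn_unused_ignores = True
ignore_missing_imports = True
```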
- Coding Style: This repo uses type hints and docstrings (see mypy.ini for static checks).
- Pre-Commit Hooks: Use .pre-commit-config.yaml for linting/formatting. Install the hooks with: pre-commit install