
Commit

Merge branch 'main' into fed_stats2
ZiyueXu77 authored Feb 22, 2025
2 parents dcce11c + 8a9d67f commit 129ece4
Showing 55 changed files with 4,507 additions and 297 deletions.


@@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1e333915-6759-4d82-9f83-2bccc42ca047",
"metadata": {},
"source": [
"# Introduction: Federated Language Models - from NLP to LLM"
]
},
{
"cell_type": "markdown",
"id": "c96f5007-8c62-4d61-bd97-6b31a3f5b0db",
"metadata": {},
"source": [
"In this chapter, we will explore the federated learning applications on language models.\n",
"\n",
"Natural Language Processing (NLP) is a subfield of artificial intelligence, focuses on enabling computers to process and analyze natural language data. Recently, Large Language Models (LLMs) have emerged as a transformative force in the field of NLP, enabling AI to understand, generate, and interact with human language at an unprecedented scale. Models such as BERT and GPT is able to leverage vast amounts of text data and deep learning techniques to perform various linguistic tasks, including text generation, translation, summarization, and question-answering.\n",
"\n",
"The development of LLMs relies on robust training schemes that enable these models to capture linguistic structures, contextual dependencies, and semantic meanings. Common training methodologies include unsupervised pretraining on large text corpora, followed by further fine-tuning using supervised (supervised finetuning - SFT) or reinforcement learning (reinforcement learning from human feedback - RLHF) approaches, refining their capabilities for practical applications with human interactions.\n",
"\n",
"Further, when adapting to a particular downstream task, instead of making updates to all model parameters as SFT/RLHF which can be computationally expensive and memory-intensive, Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a more efficient approach. Techniques such as Low-Rank Adaptation (LoRA), P-Tuning, and Adapter Layers enable fine-tuning by updating only a small subset of parameters, significantly reducing computational costs while maintaining performance. "
]
},
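{
"cell_type": "markdown",
"id": "peft-lora-sketch-md",
"metadata": {},
"source": [
"As a brief, hypothetical illustration of the PEFT idea above (not one of this chapter's exercises), the next cell attaches a LoRA adapter to a Hugging Face model with the `peft` library. The base model and the LoRA hyperparameters are illustrative assumptions, not values prescribed by this chapter."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "peft-lora-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: attach a LoRA adapter to a base model with the peft library.\n",
"# The base model and hyperparameters (r, alpha, dropout, target modules) are assumptions.\n",
"from peft import LoraConfig, get_peft_model\n",
"from transformers import AutoModelForSequenceClassification\n",
"\n",
"base_model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-uncased\", num_labels=3)\n",
"lora_config = LoraConfig(task_type=\"SEQ_CLS\", r=8, lora_alpha=16, lora_dropout=0.1, target_modules=[\"query\", \"value\"])\n",
"peft_model = get_peft_model(base_model, lora_config)\n",
"\n",
"# Only the low-rank adapter weights (and task head) are trainable; the base model stays frozen.\n",
"peft_model.print_trainable_parameters()"
]
},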
{
"cell_type": "markdown",
"id": "4fb8b73a-5058-44fe-8ecc-e355ef7178fb",
"metadata": {},
"source": [
"In the following sections, we will start with federated learning using a smaller-scale BERT model, then we extend our study to more recent open-source LLMs and their SFT and PEFT in a federated finetuning scheme. And finally to address a major challenge in federated LLM training - communication efficiency, we further visit potential solutions including quantization and streaming, and we will conclude with a recap of the covered topics."
]
},
{
"cell_type": "markdown",
"id": "9bffcb04-6839-4463-b4a5-e1792024adce",
"metadata": {},
"source": [
"8.1. **Federated BERT**\n",
"\n",
"Task-specific model training with BERT in a federated setting\n",
"* [Federated NLP with BERT Model](../08.1_fed_bert/federated_nlp_with_bert.ipynb)\n",
"\n",
"8.2. **Federated LLM Training with SFT**\n",
"\n",
"Supervised Fine-Tuning and its role in adapting LLMs in federated learning\n",
"* [Federated LLM Tuning with SFT](../08.2_llm_sft/LLM_SFT.ipynb)\n",
"\n",
"8.3. **Federated LLM Training with PEFT**\n",
"\n",
"Importance of PEFT in adapting LLMs for specific tasks, which can be achieve in a federated setting\n",
"* [Federated LLM Tuning with PEFT](../08.3_llm_peft/LLM_PEFT.ipynb)\n",
"\n",
"8.4. **Model Transmission with Quantization**\n",
"\n",
"One major hurdle of adapting LLMs in federated learning is the significant communication burden when performing federated SFT. To reduce the message size, quantization method can be applied as filters.\n",
"* [Model Quantization for Transmission](../08.4_llm_quantization/LLM_quantization.ipynb)\n",
"\n",
"8.5 **Model Transmission with Streaming**\n",
"\n",
"While quantization reduced communication cost, system memory requirement is still high for prepareing the message on either side. Therefore, we enabled streaming capabilities for more efficient and robust model communication.\n",
"* [Message Streaming for Model Transmission](../08.5_llm_streaming/LLM_streaming.ipynb)\n",
"\n",
"8.6. **Recap**\n",
"\n",
"[Recap](../08.6_recap/recap.ipynb) for federated LLM applications and features"
]
},
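{
"cell_type": "markdown",
"id": "quantization-size-sketch-md",
"metadata": {},
"source": [
"To make the communication-cost point in 8.4 concrete, the next cell is a minimal, standalone sketch (it does not use the NVFlare quantization filter itself): casting a toy model's weights from fp32 to fp16 roughly halves the serialized payload that would otherwise be transmitted each round."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "quantization-size-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Minimal standalone sketch: compare serialized payload sizes for fp32 vs. fp16 weights.\n",
"# This only illustrates why quantization shrinks federated messages; it is not the NVFlare filter.\n",
"import io\n",
"\n",
"import torch\n",
"\n",
"model = torch.nn.Linear(1024, 1024)\n",
"\n",
"\n",
"def payload_bytes(state_dict):\n",
"    buffer = io.BytesIO()\n",
"    torch.save(state_dict, buffer)\n",
"    return buffer.getbuffer().nbytes\n",
"\n",
"\n",
"fp32_size = payload_bytes(model.state_dict())\n",
"fp16_size = payload_bytes({k: v.half() for k, v in model.state_dict().items()})\n",
"print(f\"fp32 payload: {fp32_size} bytes, fp16 payload: {fp16_size} bytes\")"
]
},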
{
"cell_type": "markdown",
"id": "f8864f21-ce74-4adf-8b5b-240879424424",
"metadata": {},
"source": [
"Let's get started with [Federated NLP with BERT Model](../08.1_fed_bert/federated_nlp_with_bert.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ceaab09-f41e-41e4-8ecd-a784328b468a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -0,0 +1,96 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os

import pandas as pd
import torch
from seqeval.metrics import classification_report
from src.data_sequence import DataSequence
from src.nlp_models import BertModel, GPTModel
from torch.utils.data import DataLoader

os.environ["TOKENIZERS_PARALLELISM"] = "False"


def data_split_args_parser():
parser = argparse.ArgumentParser(description="Perform model testing by loading the best global model")
parser.add_argument("--data_path", type=str, help="Path to data file")
parser.add_argument("--model_path", type=str, help="Path to workspace server folder")
parser.add_argument("--num_labels", type=int, help="Number of labels for the candidate dataset")
parser.add_argument("--model_name", type=str, default="bert-base-uncased", help="Model name")
return parser


if __name__ == "__main__":
parser = data_split_args_parser()
args = parser.parse_args()
device = torch.device("cuda")

model_path = args.model_path
data_path = args.data_path
num_labels = args.num_labels
model_name = args.model_name
ignore_token = -100

df_test = pd.read_csv(os.path.join(data_path, "test.csv"))
# label and id conversion
labels = []
for x in df_test["labels"].values:
labels.extend(x.split(" "))
unique_labels = set(labels)
labels_to_ids = {k: v for v, k in enumerate(sorted(unique_labels))}
ids_to_labels = {v: k for v, k in enumerate(sorted(unique_labels))}

# model
if model_name == "bert-base-uncased":
model = BertModel(model_name=model_name, num_labels=num_labels).to(device)
elif model_name == "gpt2":
model = GPTModel(model_name=model_name, num_labels=num_labels).to(device)
else:
raise ValueError("model not supported")
model_weights = torch.load(os.path.join(model_path, "best_FL_global_model.pt"))
model.load_state_dict(state_dict=model_weights["model"])
tokenizer = model.tokenizer

# data
test_dataset = DataSequence(df_test, labels_to_ids, tokenizer=tokenizer, ignore_token=ignore_token)
test_loader = DataLoader(test_dataset, num_workers=4, batch_size=64, shuffle=False)

# validate
model.eval()
with torch.no_grad():
total_acc_test, total_loss_test, test_total = 0, 0, 0
test_y_pred, test_y_true = [], []
for test_data, test_label in test_loader:
test_label = test_label.to(device)
test_total += test_label.shape[0]
mask = test_data["attention_mask"].squeeze(1).to(device)
input_id = test_data["input_ids"].squeeze(1).to(device)
loss, logits = model(input_id, mask, test_label)

for i in range(logits.shape[0]):
# remove pad tokens
logits_clean = logits[i][test_label[i] != ignore_token]
label_clean = test_label[i][test_label[i] != ignore_token]
                # calculate accuracy and store predictions and true labels
predictions = logits_clean.argmax(dim=1)
acc = (predictions == label_clean).float().mean()
total_acc_test += acc.item()
test_y_pred.append([ids_to_labels[x.item()] for x in predictions])
test_y_true.append([ids_to_labels[x.item()] for x in label_clean])
# metric summary
summary = classification_report(y_true=test_y_true, y_pred=test_y_pred, zero_division=0)
print(summary)
@@ -0,0 +1,84 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse

from src.nlp_models import BertModel, GPTModel

from nvflare.app_common.widgets.intime_model_selector import IntimeModelSelector
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner


def define_parser():
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_name",
type=str,
default="Bert",
help="Which model to choose, either Bert or GPT",
)
return parser.parse_args()


def main():
args = define_parser()
model_name = args.model_name

# Create the FedJob
if model_name.lower() == "bert":
num_clients = 4
job = FedJob(name="Bert", min_clients=num_clients)
train_model_name = "bert-base-uncased"
model = PTModel(BertModel(num_labels=3, model_name=train_model_name))
output_path = "Bert"
elif model_name.lower() == "gpt":
num_clients = 2
job = FedJob(name="GPT", min_clients=num_clients)
train_model_name = "gpt2"
model = PTModel(GPTModel(num_labels=3, model_name=train_model_name))
output_path = "GPT"
else:
raise ValueError(f"Invalid model_name: {model_name}, only Bert and GPT are supported.")

# Local training parameters
num_rounds = 5
dataset_path = f"/tmp/nvflare/dataset/nlp_ner/{num_clients}_split"
train_script = "src/nlp_fl.py"
train_args = f"--dataset_path {dataset_path} --model_name {train_model_name}"

# Define the controller workflow and send to server
controller = FedAvg(
num_clients=num_clients,
num_rounds=num_rounds,
)
job.to_server(controller)

# Define the initial global model and send to server
job.to_server(model)
job.to(IntimeModelSelector(key_metric="eval_acc"), "server")

# Add executor to clients
executor = ScriptRunner(script=train_script, script_args=train_args)
job.to_clients(executor)

# Export job config and run the job
job.export_job("/tmp/nvflare/workspace/jobs/")
job.simulator_run(f"/tmp/nvflare/workspace/works/{output_path}", n_clients=num_clients, gpu="0")


if __name__ == "__main__":
main()
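Note: the client training script referenced above (src/nlp_fl.py) is not part of this diff. For orientation only, a ScriptRunner-launched script typically follows the NVFlare Client API pattern sketched below; this is a minimal, assumed outline rather than the actual file, with a toy torch module standing in for the real NER model and the local training step elided.

# Hypothetical outline of a Client API training script (not the actual src/nlp_fl.py).
# A toy torch module stands in for BertModel/GPTModel; local training/validation are elided.
import torch

import nvflare.client as flare


def run():
    model = torch.nn.Linear(10, 3)  # stand-in for the real NER model

    flare.init()
    while flare.is_running():
        input_model = flare.receive()  # global weights from the FedAvg server
        model.load_state_dict(input_model.params)

        # ... local training and validation on this site's NER split would go here ...
        eval_acc = 0.0  # placeholder for the locally computed validation accuracy

        # Send updated weights plus the metric consumed by IntimeModelSelector ("eval_acc").
        flare.send(flare.FLModel(params=model.state_dict(), metrics={"eval_acc": eval_acc}))


if __name__ == "__main__":
    run()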
@@ -0,0 +1,4 @@
#!/usr/bin/env bash
DATASET_ROOT=${1}
echo "4-client"
python3 code/utils/data_split.py --data_path ${DATASET_ROOT} --num_clients 4 --random_seed 0 --site_name_prefix 'site-'
@@ -0,0 +1,6 @@
torch
torchvision
tensorboard
transformers
pandas
seqeval
