
Commit

Merge branch 'main' into fed_stats2
ZiyueXu77 authored Feb 22, 2025
2 parents dcce11c + 8a9d67f commit 129ece4
Showing 55 changed files with 4,507 additions and 297 deletions.


@@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "1e333915-6759-4d82-9f83-2bccc42ca047",
"metadata": {},
"source": [
"# Introduction: Federated Language Models - from NLP to LLM"
]
},
{
"cell_type": "markdown",
"id": "c96f5007-8c62-4d61-bd97-6b31a3f5b0db",
"metadata": {},
"source": [
"In this chapter, we will explore the federated learning applications on language models.\n",
"\n",
"Natural Language Processing (NLP) is a subfield of artificial intelligence, focuses on enabling computers to process and analyze natural language data. Recently, Large Language Models (LLMs) have emerged as a transformative force in the field of NLP, enabling AI to understand, generate, and interact with human language at an unprecedented scale. Models such as BERT and GPT is able to leverage vast amounts of text data and deep learning techniques to perform various linguistic tasks, including text generation, translation, summarization, and question-answering.\n",
"\n",
"The development of LLMs relies on robust training schemes that enable these models to capture linguistic structures, contextual dependencies, and semantic meanings. Common training methodologies include unsupervised pretraining on large text corpora, followed by further fine-tuning using supervised (supervised finetuning - SFT) or reinforcement learning (reinforcement learning from human feedback - RLHF) approaches, refining their capabilities for practical applications with human interactions.\n",
"\n",
"Further, when adapting to a particular downstream task, instead of making updates to all model parameters as SFT/RLHF which can be computationally expensive and memory-intensive, Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a more efficient approach. Techniques such as Low-Rank Adaptation (LoRA), P-Tuning, and Adapter Layers enable fine-tuning by updating only a small subset of parameters, significantly reducing computational costs while maintaining performance. "
]
},
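{
"cell_type": "markdown",
"id": "peft-lora-sketch-md",
"metadata": {},
"source": [
"As a brief, hypothetical illustration of the PEFT idea above (not one of this chapter's exercises), the next cell attaches a LoRA adapter to a Hugging Face model with the `peft` library. The base model and the LoRA hyperparameters are illustrative assumptions, not values prescribed by this chapter."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "peft-lora-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: attach a LoRA adapter to a base model with the peft library.\n",
"# The base model and hyperparameters (r, alpha, dropout, target modules) are assumptions.\n",
"from peft import LoraConfig, get_peft_model\n",
"from transformers import AutoModelForSequenceClassification\n",
"\n",
"base_model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-uncased\", num_labels=3)\n",
"lora_config = LoraConfig(task_type=\"SEQ_CLS\", r=8, lora_alpha=16, lora_dropout=0.1, target_modules=[\"query\", \"value\"])\n",
"peft_model = get_peft_model(base_model, lora_config)\n",
"\n",
"# Only the low-rank adapter weights (and task head) are trainable; the base model stays frozen.\n",
"peft_model.print_trainable_parameters()"
]
},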
{
"cell_type": "markdown",
"id": "4fb8b73a-5058-44fe-8ecc-e355ef7178fb",
"metadata": {},
"source": [
"In the following sections, we will start with federated learning using a smaller-scale BERT model, then we extend our study to more recent open-source LLMs and their SFT and PEFT in a federated finetuning scheme. And finally to address a major challenge in federated LLM training - communication efficiency, we further visit potential solutions including quantization and streaming, and we will conclude with a recap of the covered topics."
]
},
{
"cell_type": "markdown",
"id": "9bffcb04-6839-4463-b4a5-e1792024adce",
"metadata": {},
"source": [
"8.1. **Federated BERT**\n",
"\n",
"Task-specific model training with BERT in a federated setting\n",
"* [Federated NLP with BERT Model](../08.1_fed_bert/federated_nlp_with_bert.ipynb)\n",
"\n",
"8.2. **Federated LLM Training with SFT**\n",
"\n",
"Supervised Fine-Tuning and its role in adapting LLMs in federated learning\n",
"* [Federated LLM Tuning with SFT](../08.2_llm_sft/LLM_SFT.ipynb)\n",
"\n",
"8.3. **Federated LLM Training with PEFT**\n",
"\n",
"Importance of PEFT in adapting LLMs for specific tasks, which can be achieve in a federated setting\n",
"* [Federated LLM Tuning with PEFT](../08.3_llm_peft/LLM_PEFT.ipynb)\n",
"\n",
"8.4. **Model Transmission with Quantization**\n",
"\n",
"One major hurdle of adapting LLMs in federated learning is the significant communication burden when performing federated SFT. To reduce the message size, quantization method can be applied as filters.\n",
"* [Model Quantization for Transmission](../08.4_llm_quantization/LLM_quantization.ipynb)\n",
"\n",
"8.5 **Model Transmission with Streaming**\n",
"\n",
"While quantization reduced communication cost, system memory requirement is still high for prepareing the message on either side. Therefore, we enabled streaming capabilities for more efficient and robust model communication.\n",
"* [Message Streaming for Model Transmission](../08.5_llm_streaming/LLM_streaming.ipynb)\n",
"\n",
"8.6. **Recap**\n",
"\n",
"[Recap](../08.6_recap/recap.ipynb) for federated LLM applications and features"
]
},
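{
"cell_type": "markdown",
"id": "quantization-size-sketch-md",
"metadata": {},
"source": [
"To make the communication-cost point in 8.4 concrete, the next cell is a minimal, standalone sketch (it does not use the NVFlare quantization filter itself): casting a toy model's weights from fp32 to fp16 roughly halves the serialized payload that would otherwise be transmitted each round."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "quantization-size-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Minimal standalone sketch: compare serialized payload sizes for fp32 vs. fp16 weights.\n",
"# This only illustrates why quantization shrinks federated messages; it is not the NVFlare filter.\n",
"import io\n",
"\n",
"import torch\n",
"\n",
"model = torch.nn.Linear(1024, 1024)\n",
"\n",
"\n",
"def payload_bytes(state_dict):\n",
"    buffer = io.BytesIO()\n",
"    torch.save(state_dict, buffer)\n",
"    return buffer.getbuffer().nbytes\n",
"\n",
"\n",
"fp32_size = payload_bytes(model.state_dict())\n",
"fp16_size = payload_bytes({k: v.half() for k, v in model.state_dict().items()})\n",
"print(f\"fp32 payload: {fp32_size} bytes, fp16 payload: {fp16_size} bytes\")"
]
},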
{
"cell_type": "markdown",
"id": "f8864f21-ce74-4adf-8b5b-240879424424",
"metadata": {},
"source": [
"Let's get started with [Federated NLP with BERT Model](../08.1_fed_bert/federated_nlp_with_bert.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ceaab09-f41e-41e4-8ecd-a784328b468a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -0,0 +1,96 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os

import pandas as pd
import torch
from seqeval.metrics import classification_report
from src.data_sequence import DataSequence
from src.nlp_models import BertModel, GPTModel
from torch.utils.data import DataLoader

os.environ["TOKENIZERS_PARALLELISM"] = "False"


def data_split_args_parser():
parser = argparse.ArgumentParser(description="Perform model testing by loading the best global model")
parser.add_argument("--data_path", type=str, help="Path to data file")
parser.add_argument("--model_path", type=str, help="Path to workspace server folder")
parser.add_argument("--num_labels", type=int, help="Number of labels for the candidate dataset")
parser.add_argument("--model_name", type=str, default="bert-base-uncased", help="Model name")
return parser


if __name__ == "__main__":
parser = data_split_args_parser()
args = parser.parse_args()
device = torch.device("cuda")

model_path = args.model_path
data_path = args.data_path
num_labels = args.num_labels
model_name = args.model_name
ignore_token = -100

df_test = pd.read_csv(os.path.join(data_path, "test.csv"))
# label and id conversion
labels = []
for x in df_test["labels"].values:
labels.extend(x.split(" "))
unique_labels = set(labels)
labels_to_ids = {k: v for v, k in enumerate(sorted(unique_labels))}
ids_to_labels = {v: k for v, k in enumerate(sorted(unique_labels))}

# model
if model_name == "bert-base-uncased":
model = BertModel(model_name=model_name, num_labels=num_labels).to(device)
elif model_name == "gpt2":
model = GPTModel(model_name=model_name, num_labels=num_labels).to(device)
else:
raise ValueError("model not supported")
model_weights = torch.load(os.path.join(model_path, "best_FL_global_model.pt"))
model.load_state_dict(state_dict=model_weights["model"])
tokenizer = model.tokenizer

# data
test_dataset = DataSequence(df_test, labels_to_ids, tokenizer=tokenizer, ignore_token=ignore_token)
test_loader = DataLoader(test_dataset, num_workers=4, batch_size=64, shuffle=False)

# validate
model.eval()
with torch.no_grad():
total_acc_test, total_loss_test, test_total = 0, 0, 0
test_y_pred, test_y_true = [], []
for test_data, test_label in test_loader:
test_label = test_label.to(device)
test_total += test_label.shape[0]
mask = test_data["attention_mask"].squeeze(1).to(device)
input_id = test_data["input_ids"].squeeze(1).to(device)
loss, logits = model(input_id, mask, test_label)

for i in range(logits.shape[0]):
# remove pad tokens
logits_clean = logits[i][test_label[i] != ignore_token]
label_clean = test_label[i][test_label[i] != ignore_token]
                # calculate accuracy and store predictions and true labels
predictions = logits_clean.argmax(dim=1)
acc = (predictions == label_clean).float().mean()
total_acc_test += acc.item()
test_y_pred.append([ids_to_labels[x.item()] for x in predictions])
test_y_true.append([ids_to_labels[x.item()] for x in label_clean])
# metric summary
summary = classification_report(y_true=test_y_true, y_pred=test_y_pred, zero_division=0)
print(summary)
@@ -0,0 +1,84 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse

from src.nlp_models import BertModel, GPTModel

from nvflare.app_common.widgets.intime_model_selector import IntimeModelSelector
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel
from nvflare.job_config.api import FedJob
from nvflare.job_config.script_runner import ScriptRunner


def define_parser():
parser = argparse.ArgumentParser()
parser.add_argument(
"--model_name",
type=str,
default="Bert",
help="Which model to choose, either Bert or GPT",
)
return parser.parse_args()


def main():
args = define_parser()
model_name = args.model_name

# Create the FedJob
if model_name.lower() == "bert":
num_clients = 4
job = FedJob(name="Bert", min_clients=num_clients)
train_model_name = "bert-base-uncased"
model = PTModel(BertModel(num_labels=3, model_name=train_model_name))
output_path = "Bert"
elif model_name.lower() == "gpt":
num_clients = 2
job = FedJob(name="GPT", min_clients=num_clients)
train_model_name = "gpt2"
model = PTModel(GPTModel(num_labels=3, model_name=train_model_name))
output_path = "GPT"
else:
raise ValueError(f"Invalid model_name: {model_name}, only Bert and GPT are supported.")

# Local training parameters
num_rounds = 5
dataset_path = f"/tmp/nvflare/dataset/nlp_ner/{num_clients}_split"
train_script = "src/nlp_fl.py"
train_args = f"--dataset_path {dataset_path} --model_name {train_model_name}"

# Define the controller workflow and send to server
controller = FedAvg(
num_clients=num_clients,
num_rounds=num_rounds,
)
job.to_server(controller)

# Define the initial global model and send to server
job.to_server(model)
job.to(IntimeModelSelector(key_metric="eval_acc"), "server")

# Add executor to clients
executor = ScriptRunner(script=train_script, script_args=train_args)
job.to_clients(executor)

# Export job config and run the job
job.export_job("/tmp/nvflare/workspace/jobs/")
job.simulator_run(f"/tmp/nvflare/workspace/works/{output_path}", n_clients=num_clients, gpu="0")


if __name__ == "__main__":
main()
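Note: the client training script referenced above (src/nlp_fl.py) is not part of this diff. For orientation only, a ScriptRunner-launched script typically follows the NVFlare Client API pattern sketched below; this is a minimal, assumed outline rather than the actual file, with a toy torch module standing in for the real NER model and the local training step elided.

# Hypothetical outline of a Client API training script (not the actual src/nlp_fl.py).
# A toy torch module stands in for BertModel/GPTModel; local training/validation are elided.
import torch

import nvflare.client as flare


def run():
    model = torch.nn.Linear(10, 3)  # stand-in for the real NER model

    flare.init()
    while flare.is_running():
        input_model = flare.receive()  # global weights from the FedAvg server
        model.load_state_dict(input_model.params)

        # ... local training and validation on this site's NER split would go here ...
        eval_acc = 0.0  # placeholder for the locally computed validation accuracy

        # Send updated weights plus the metric consumed by IntimeModelSelector ("eval_acc").
        flare.send(flare.FLModel(params=model.state_dict(), metrics={"eval_acc": eval_acc}))


if __name__ == "__main__":
    run()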
@@ -0,0 +1,4 @@
#!/usr/bin/env bash
DATASET_ROOT=${1}
echo "4-client"
python3 code/utils/data_split.py --data_path ${DATASET_ROOT} --num_clients 4 --random_seed 0 --site_name_prefix 'site-'
@@ -0,0 +1,6 @@
torch
torchvision
tensorboard
transformers
pandas
seqeval
