
The aim of this dissertation is to assess the effectiveness of LLMs such as FinBERT and GPT-2 in detecting fraudulent activities in financial reports and statements. This repo provides the code for implementing LLMs, traditional machine learning, and deep learning models on the labelled dataset.


amitkedia007/Financial-Fraud-Detection-Using-LLMs

Financial Fraud Detection Using AI

Project Overview

Introduction: This project uses machine learning, deep learning, and Large Language Models (LLMs) to detect financial fraud. It is based on a dataset derived from financial filings to the U.S. Securities and Exchange Commission (SEC), and it aims to compare and improve AI models for identifying fraudulent financial activity. (For more information, check out the PDF in the repo.)

Objective: The goal is to foster a collaborative platform where data scientists and researchers can develop, test, and improve AI models for detecting financial fraud.

Dataset Description

Source: The dataset includes financial filings from 170 companies, split equally between those involved in fraudulent and non-fraudulent activities.

Structure: Each dataset entry contains details such as Central Index Key (CIK), filing year, company name, and a categorical indicator of fraud.
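
As a rough illustration of the entry structure described above, the sketch below models one record and checks the fraud/non-fraud balance. The class name `FilingRecord`, the field names, the boolean encoding of the fraud indicator, and the placeholder entries are all assumptions made for this example; the actual column names in the dataset may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """One dataset entry (hypothetical field names)."""
    cik: str            # Central Index Key, kept as text to preserve leading zeros
    filing_year: int
    company_name: str
    is_fraud: bool      # assumed encoding of the categorical fraud indicator


def class_balance(records):
    """Count fraudulent and non-fraudulent records."""
    counts = Counter(r.is_fraud for r in records)
    return {"fraud": counts[True], "non_fraud": counts[False]}


# Placeholder entries for illustration only, not real filings.
records = [
    FilingRecord("0000000001", 2010, "Example Corp A", True),
    FilingRecord("0000000002", 2011, "Example Corp B", False),
]

balance = class_balance(records)
print(balance)
```

Running the same check on the full dataset is a quick way to confirm the equal split between the two classes before training.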

Final Dataset: The final dataset is available on Kaggle.

Data Preprocessing

Preprocessing steps involve text cleaning, tokenization, and transforming data into machine-readable formats, ensuring balanced and fair model training.
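
The cleaning and tokenization steps mentioned above might look roughly like the following. This is a minimal sketch using only the standard library; the function names `clean_text` and `tokenize`, the specific cleaning rules, and the sample sentence are assumptions for illustration and may not match the repository's preprocessing scripts.

```python
import re


def clean_text(text):
    """Lowercase, strip HTML-like tags and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML-like tags
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def tokenize(text):
    """Split cleaned text into whitespace-delimited tokens."""
    return clean_text(text).split()


# Made-up sample sentence, not taken from the dataset.
sample = "<p>Net Revenue increased by 12%, per the 10-K filing.</p>"
tokens = tokenize(sample)
print(tokens)
```

Note that replacing punctuation with spaces splits hyphenated terms such as "10-K" into two tokens; whether that is desirable depends on the downstream model.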

Model Implementation

The project encompasses a variety of models, including Logistic Regression, SVM, Random Forest, XGBoost, ANN, HAN, GPT-2, and FinBERT, selected for their NLP capabilities and potential in fraud detection.
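
To make the simplest model in the list concrete, here is a small logistic regression classifier over bag-of-words features, written in plain Python. It is a teaching sketch, not the repository's implementation: a real run would more likely use a library implementation, and the helper names, hyperparameters (`lr`, `epochs`), and the four toy sentences are assumptions invented for this example.

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Map a text to word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias with stochastic gradient descent on log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (fraud) if the predicted probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy, made-up texts and labels (1 = fraud, 0 = non-fraud).
texts = [
    "revenue overstated restated earnings",
    "restated earnings after investigation",
    "steady revenue growth reported",
    "dividend declared steady growth",
]
labels = [1, 1, 0, 0]

vocab = sorted({word for t in texts for word in t.lower().split()})
X = [bag_of_words(t, vocab) for t in texts]
w, b = train_logreg(X, labels)
train_preds = [predict(w, b, x) for x in X]
print(train_preds)
```

Fitting four toy sentences says nothing about real-world performance; the point is only to show the feature-extraction, training, and prediction steps that the heavier models in the list share.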

To Reproduce

Codebase: Complete code for data extraction, preprocessing, model training, and evaluation is available in this repository.

Environment: A requirements.txt file is provided for setting up a consistent environment.

Documentation: Each script is documented with clear instructions in the README.md, guiding through environment setup, script execution, and result interpretation.
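
For the evaluation step mentioned above, a binary fraud classifier is commonly scored with accuracy, precision, recall, and F1. The sketch below computes these from scratch; the choice of metrics, the function names, and the example labels are assumptions for illustration, since this README does not state which metrics the evaluation scripts report.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1, guarding against division by zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In fraud detection, recall on the fraud class is often weighted heavily, because a missed fraudulent filing usually costs more than a false alarm.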

Contribution Guidelines

Getting Started:

  • Fork the repository.
  • Set up your environment with requirements.txt.
  • Familiarize yourself with the code and dataset.

Contributing:

  • Add or improve models, or refine preprocessing methods.
  • Ensure your code is documented and aligns with the project's style.
  • Submit pull requests with a detailed description of changes.

Reporting Issues:

  • Use GitHub Issues for bug reports, feature requests, or discussions.
  • Provide detailed bug descriptions and reproduction steps.

Community:

  • Engage in discussions, share results, ask questions.
  • Adhere to community guidelines for a collaborative environment.

License

This project is open source and available under the MIT License.

Acknowledgements

Thanks to all contributors and community members for their valuable participation and insights in advancing AI in financial fraud detection.
