
Financial Fraud Detection Using AI

Project Overview

Introduction: This project uses machine learning, deep learning, and Large Language Models (LLMs) to detect financial fraud. It is based on a comprehensive dataset derived from financial filings submitted to the U.S. Securities and Exchange Commission (SEC), and aims to compare and improve AI models for identifying fraudulent financial activity. For more information, see the PDF in the repository.

Objective: The goal is to foster a collaborative platform where data scientists and researchers can develop, test, and improve AI models for detecting financial fraud.

Dataset Description

Source: The dataset includes financial filings from 170 companies, split equally between those involved in fraudulent and non-fraudulent activities.

Structure: Each dataset entry contains details such as Central Index Key (CIK), filing year, company name, and a categorical indicator of fraud.
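A minimal sketch of how one dataset entry and a balance check might look. The field names (`cik`, `year`, `company`, `fraud`) and the helpers `normalize_cik` and `company_balance` are hypothetical, not the actual Kaggle column names or repository code. It assumes a company counts as fraudulent if any of its filings is labeled fraudulent, and it normalizes CIKs to the 10-digit, zero-padded form used by EDGAR. Because the 170 companies are split equally, a company-level count on the full dataset should give 85 per class.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """One dataset entry (hypothetical field names)."""
    cik: str       # SEC Central Index Key, 10-digit zero-padded string
    year: int      # filing year
    company: str   # company name
    fraud: bool    # categorical fraud indicator for this filing


def normalize_cik(cik) -> str:
    """Return the CIK as a 10-digit string with leading zeros."""
    digits = str(cik).strip()
    if not digits.isdigit():
        raise ValueError(f"CIK must be numeric, got {cik!r}")
    return digits.zfill(10)


def company_balance(records) -> Counter:
    """Count distinct companies per label.

    Counting companies rather than filings matters because one company
    can contribute filings from several years. Assumption: a company is
    labeled fraudulent if any of its filings is.
    """
    flagged = {}
    for r in records:
        flagged[r.cik] = flagged.get(r.cik, False) or r.fraud
    return Counter(flagged.values())
```

Checking balance per company rather than per filing keeps companies with many filings from skewing the apparent class ratio.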

Final Dataset: The final dataset has been published on Kaggle.

Data Preprocessing

Preprocessing involves text cleaning, tokenization, and transformation of the data into machine-readable formats, with the aim of balanced and fair model training.
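The preprocessing steps above can be sketched with the standard library alone. This is an illustrative pipeline, not the repository's code: the cleaning rules (lowercasing, replacing non-letters with spaces) and the choice of TF-IDF features are assumptions, picked as a common input representation for the classical models. The function names `clean_text`, `tokenize`, `fit_idf`, and `transform` are hypothetical. The smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 row normalization, matches the defaults of scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Lowercase, replace digits and punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str, stopwords=frozenset()) -> list[str]:
    """Whitespace tokenization of cleaned text, dropping stopwords."""
    return [t for t in clean_text(text).split() if t not in stopwords]


def fit_idf(train_docs):
    """Build a vocabulary and smoothed IDF weights from training documents only."""
    n = len(train_docs)
    df = Counter(t for toks in train_docs for t in set(toks))
    terms = sorted(df)
    vocab = {t: i for i, t in enumerate(terms)}
    idf = [math.log((1 + n) / (1 + df[t])) + 1.0 for t in terms]
    return vocab, idf


def transform(docs, vocab, idf):
    """Map token lists to L2-normalized TF-IDF vectors; unseen tokens are ignored."""
    rows = []
    for toks in docs:
        vec = [0.0] * len(vocab)
        for t, count in Counter(toks).items():
            j = vocab.get(t)
            if j is not None:
                vec[j] = count * idf[j]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in vec])
    return rows
```

Fitting the vocabulary and IDF on the training split and reusing them for the test split keeps test-set statistics from leaking into training.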

Model Implementation

The project encompasses a variety of models, including Logistic Regression, SVM, Random Forest, XGBoost, ANN, HAN, GPT-2, and FinBERT, selected for their NLP capabilities and potential in fraud detection.
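As an illustration of the simplest model in this list, here is a dependency-free logistic regression trained with batch gradient descent on dense feature vectors (for example, TF-IDF rows). It is a sketch for intuition, not the repository's implementation; the names `train_logreg` and `predict_proba` and the default hyperparameters are hypothetical, and in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.0):
    """Fit weights w and bias b by minimizing mean log-loss with batch gradient descent.

    X: list of feature vectors; y: list of 0/1 labels (1 = fraud).
    l2: optional L2 penalty on the weights (the bias is not penalized).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the fraud class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
```

Thresholding `predict_proba` at 0.5 gives hard labels; the more expressive models in the list (XGBoost, ANN, HAN, GPT-2, FinBERT) would replace this baseline while keeping the same train/predict interface.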

To Reproduce

Codebase: Complete code for data extraction, preprocessing, model training, and evaluation is available in this repository.
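For the evaluation step, a hypothetical helper that computes standard binary metrics with fraud (label 1) as the positive class might look like the following. The function names are illustrative, not taken from the repository. Precision and recall are reported alongside accuracy because, even on a balanced dataset, a missed fraud (false negative) and a false alarm (false positive) typically carry different costs.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with label 1 treated as the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1; undefined ratios are reported as 0.0."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```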

Environment: A requirements.txt file is provided for setting up a consistent environment.

Documentation: Each script is documented, and the README.md provides clear instructions for environment setup, script execution, and result interpretation.

Contribution Guidelines

Getting Started:

  • Fork the repository.
  • Set up your environment using requirements.txt.
  • Familiarize yourself with the code and dataset.

Contributing:

  • Add or improve models, or refine preprocessing methods.
  • Ensure your code is documented and aligns with the project's style.
  • Submit pull requests with a detailed description of changes.

Reporting Issues:

  • Use GitHub Issues for bug reports, feature requests, or discussions.
  • Provide detailed bug descriptions and reproduction steps.

Community:

  • Engage in discussions, share results, and ask questions.
  • Adhere to community guidelines for a collaborative environment.

License

This project is open source and available under the MIT License.

Acknowledgements

Thanks to all contributors and community members for their valuable participation and insights in advancing AI in financial fraud detection.