This repository was archived by the owner on Feb 22, 2024. It is now read-only.

Building Large-Scale End-to-End Recommendation Systems with BigDL Friesian

Overview

BigDL Friesian is an application framework for building large-scale recommender solutions optimized for Intel Xeon. This workflow demonstrates how to use Friesian to build an end-to-end Wide & Deep Learning recommender system on a large real-world dataset provided by Twitter.
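To make the Wide & Deep idea concrete, here is a minimal, framework-free sketch of how such a model combines a "wide" linear part with a "deep" neural part into one prediction. It is not Friesian's implementation or API: the function names, the toy weights, and the feature string are all hypothetical, and the deep part takes an already-embedded dense vector rather than learning embeddings.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def wide_logit(cross_features, wide_weights):
    # Wide part: a linear model over sparse (often crossed) categorical features.
    # Features without a learned weight contribute nothing.
    return sum(wide_weights.get(f, 0.0) for f in cross_features)


def deep_logit(dense_input, layers):
    # Deep part: a small MLP. Each layer is (W, b) with W as a list of rows.
    # Hidden layers use ReLU; the final layer is linear and yields one logit.
    h = dense_input
    for i, (W, b) in enumerate(layers):
        out = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            out = [max(0.0, v) for v in out]
        h = out
    return h[0]


def wide_and_deep_predict(cross_features, dense_input, wide_weights, layers, bias=0.0):
    # The two logits are summed before the sigmoid, so both parts
    # would be trained jointly against the same label.
    logit = wide_logit(cross_features, wide_weights) + deep_logit(dense_input, layers) + bias
    return sigmoid(logit)


# Toy (hypothetical) parameters, chosen only for illustration.
wide_weights = {"language=en&type=retweet": 0.5}
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer, 2 units
    ([[1.0, 1.0]], [0.0]),                    # output layer, 1 unit
]
p = wide_and_deep_predict(["language=en&type=retweet"], [2.0, 1.0], wide_weights, layers)
```

In a real workflow the weights would come from distributed training rather than being written by hand; the sketch only shows how the two parts share a single output.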

How it Works

  • Friesian provides built-in distributed feature engineering operations and distributed training of popular recommendation algorithms, both based on BigDL Orca and Spark.
  • Friesian provides a complete, highly available and scalable pipeline for online serving (including recall and ranking) as well as nearline updates based on gRPC services.
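The recall and ranking stages mentioned above can be illustrated with a small, self-contained sketch. This is only an assumption about the general two-stage pattern, not Friesian's gRPC services or API: `recall`, `rank`, the brute-force dot-product search, and the toy vectors and scores are all hypothetical stand-ins (a production recall stage would typically use an approximate nearest-neighbor index, and the ranking stage would call a trained model).

```python
import heapq


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recall(user_vec, item_vecs, k):
    # Stage 1 (recall): cheaply narrow the full catalog to k candidates
    # by embedding similarity. Brute force here, for illustration only.
    top = heapq.nlargest(k, item_vecs.items(), key=lambda kv: dot(user_vec, kv[1]))
    return [item_id for item_id, _ in top]


def rank(candidates, score_fn, n):
    # Stage 2 (ranking): score only the recalled candidates with a more
    # expensive model (score_fn) and keep the best n.
    return sorted(candidates, key=score_fn, reverse=True)[:n]


# Toy (hypothetical) data.
item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [-1.0, 0.0]}
user_vec = [1.0, 0.0]
model_scores = {"a": 0.2, "c": 0.7}  # stand-in for a ranking model's output

candidates = recall(user_vec, item_vecs, k=2)
recommended = rank(candidates, lambda item: model_scores[item], n=1)
```

Splitting the work this way keeps the expensive ranking model off the full item catalog, which is the usual reason for the two-stage design.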

The overall architecture of Friesian is shown in the following diagram:

Get Started

Dataset Preparation

You can download the Twitter RecSys Challenge 2021 dataset from here. Alternatively, run the script generate_dummy_data.py to generate a dummy dataset.

To run on a Kubernetes cluster, you may need to place the downloaded data on a shared volume. Please refer to here for more details.

Docker

  • Please refer to here for the Docker image for BigDL on K8s.
  • Please refer to here to create a client container for the Kubernetes cluster.

Environment Preparation

Please follow the steps here to prepare the Python environment on the client container.

How to run

  • Please refer to here to run the distributed feature engineering and training workload on a Kubernetes cluster. The scripts are here.
  • Please refer to here to run the online serving workload.

Recommended Hardware

The hardware below is recommended for use with this reference implementation.

  • 4th Gen Intel® Xeon® Scalable processors

Learn More

  • Please check the notebooks here for more detailed descriptions of distributed feature engineering and training.
  • Please check here for more reference use cases.
  • Please check here for more detailed API documentation.

Known Issues

NA

Troubleshooting

NA

Support Forum

Please submit issues here and we will track and respond to them daily.