BigDL Friesian is an application framework for building large-scale recommender solutions optimized for Intel Xeon. This workflow demonstrates how to use Friesian to build an end-to-end Wide & Deep Learning recommender system on a large real-world dataset provided by Twitter.
- Friesian provides various built-in distributed feature engineering operations and distributed training of popular recommendation algorithms, based on BigDL Orca and Spark.
- Friesian provides a complete, highly available and scalable pipeline for online serving (including recall and ranking) as well as nearline updates based on gRPC services.
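The recall and ranking stages mentioned above can be illustrated with a minimal, framework-free sketch. It does not use Friesian or gRPC; the function names, the toy embeddings, and the logistic ranking weights are all assumptions made for illustration. Recall narrows the catalog to a few candidates with a cheap similarity score, and ranking then scores only those candidates with a heavier model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Recall stage (hypothetical): keep the k items most similar to the user."""
    by_similarity = sorted(
        item_vecs,
        key=lambda item: dot(user_vec, item_vecs[item]),
        reverse=True,
    )
    return by_similarity[:k]


def rank(
    candidates: List[str],
    features: Dict[str, Vector],
    weights: Vector,
    bias: float,
) -> List[Tuple[str, float]]:
    """Ranking stage (hypothetical): score candidates with a toy logistic model."""
    scored = []
    for item in candidates:
        logit = dot(weights, features[item]) + bias
        scored.append((item, 1.0 / (1.0 + math.exp(-logit))))
    # Highest predicted engagement probability first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, made up for this example.
user_vec = [1.0, 0.0]
item_vecs = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.8, 0.3], "d": [-0.5, 0.2]}
item_features = {"a": [0.2, 1.0], "b": [0.9, 0.1], "c": [0.7, 0.5], "d": [0.1, 0.1]}

candidates = recall(user_vec, item_vecs, k=2)
ranked = rank(candidates, item_features, weights=[2.0, -1.0], bias=0.0)
```

In a real deployment each stage would typically run as its own service, with an approximate nearest-neighbor index behind recall and a trained model behind ranking; the sketch only shows how the two stages compose.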
The overall architecture of Friesian is shown in the following diagram:
You can download the Twitter RecSys Challenge 2021 dataset from here, or run the script generate_dummy_data.py to generate a dummy dataset.
To run on a Kubernetes cluster, you may need to put the downloaded data on a shared volume. Please refer to here for more details.
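To give a sense of what a dummy dataset for quick experiments might look like, here is a small standalone sketch. It is not the repository's generate_dummy_data.py, and it does not reproduce the schema of the Twitter dataset: the column names, value ranges, and the 20% engagement rate are assumptions chosen purely for illustration.

```python
import csv
import io
import random
from typing import Dict, List

# Hypothetical columns; the real dataset has its own, richer schema.
COLUMNS = ["user_id", "tweet_id", "language", "engaged"]
LANGUAGES = ["en", "ja", "es", "pt"]


def generate_dummy_rows(
    num_rows: int,
    num_users: int = 100,
    num_tweets: int = 500,
    seed: int = 0,
) -> List[Dict[str, object]]:
    """Create reproducible random interaction rows."""
    rng = random.Random(seed)  # local generator, so results depend only on seed
    rows = []
    for _ in range(num_rows):
        rows.append(
            {
                "user_id": rng.randrange(num_users),
                "tweet_id": rng.randrange(num_tweets),
                "language": rng.choice(LANGUAGES),
                # Binary label: roughly 20% positives (an arbitrary choice).
                "engaged": int(rng.random() < 0.2),
            }
        )
    return rows


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = generate_dummy_rows(1000, seed=42)
csv_text = to_csv(rows)
```

Fixing the seed keeps the generated data identical across runs, which makes it easier to compare pipeline changes on the same input.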
- Please refer to here for the docker image for BigDL on K8s.
- Please refer to here to create a client container for the Kubernetes cluster.
Please follow the steps here to prepare the Python environment on the client container.
- Please refer to here to run the distributed feature engineering and training workload on a Kubernetes cluster. The scripts are here.
- Please refer to here to run the online serving workload.
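For readers unfamiliar with the model being trained and served, the following is a minimal sketch of a Wide & Deep forward pass in plain Python. It is not the Friesian or BigDL Orca implementation; the feature names, embedding tables, and weights are made-up assumptions. The wide part is a linear model over sparse cross features, the deep part is a small network over concatenated embeddings, and the two logits are summed before a sigmoid.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def wide_logit(active_features: List[str], wide_weights: Dict[str, float]) -> float:
    """Wide part: sum the weights of the active sparse (cross) features."""
    return sum(wide_weights.get(name, 0.0) for name in active_features)


def deep_logit(
    category_ids: Dict[str, int],
    embeddings: Dict[str, Matrix],
    hidden_w: Matrix,
    output_w: List[float],
) -> float:
    """Deep part: concatenate embeddings, apply one ReLU layer, project to a scalar."""
    x: List[float] = []
    for column, idx in category_ids.items():
        x.extend(embeddings[column][idx])
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in hidden_w]
    return sum(w * h for w, h in zip(output_w, hidden))


def wide_and_deep_predict(wide: float, deep: float, bias: float) -> float:
    """Joint prediction: sigmoid of the summed wide and deep logits."""
    return sigmoid(wide + deep + bias)


# Toy parameters, invented for this example.
wide_weights = {"language=en_x_type=reply": 0.5}
embeddings = {
    "language": [[0.1, 0.2], [0.3, 0.4]],
    "type": [[1.0, 0.0], [0.0, 1.0]],
}
hidden_w = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0]]
output_w = [1.0, 1.0]

wide = wide_logit(["language=en_x_type=reply", "unseen_feature"], wide_weights)
deep = deep_logit({"language": 1, "type": 0}, embeddings, hidden_w, output_w)
probability = wide_and_deep_predict(wide, deep, bias=-0.8)
```

Features missing from the wide weight table contribute nothing, which mirrors how unseen cross features are usually handled in a sparse linear model.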
The hardware below is recommended for use with this reference implementation.
- Intel® 4th Gen Xeon® Scalable Performance processors
- Please check the notebooks here for more detailed descriptions of distributed feature engineering and training.
- Please check here for more reference use cases.
- Please check here for more detailed API documentation.
Please submit issues here and we will track and respond to them daily.