In this demo, we will show you how to do the following, all orchestrated by Airflow (a minimal DAG sketch follows the list):
- Create a Dataproc cluster
- Submit a Scala Spark job that runs a `select * from database.keyspace.table` against Astra
- Destroy the Dataproc cluster upon job completion
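For orientation, here is a minimal sketch of what such a DAG can look like, using the Dataproc operators from the Airflow Google provider. The project ID, region, cluster name, bucket path, and main class below are hypothetical placeholders, not values from this repo; the actual DAG lives in poc.py.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

# Hypothetical placeholders -- substitute your own values (see poc.py).
PROJECT_ID = "my-gcp-project"
REGION = "us-central1"
CLUSTER_NAME = "astra-spark-demo"

CLUSTER_CONFIG = {
    "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
    "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
}

SPARK_JOB = {
    "reference": {"project_id": PROJECT_ID},
    "placement": {"cluster_name": CLUSTER_NAME},
    "spark_job": {
        "jar_file_uris": ["gs://my-bucket/spark-cassandra.jar"],  # hypothetical path
        "main_class": "SparkCassandra",  # hypothetical main class name
    },
}

with DAG(
    dag_id="dataproc_astra_demo",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # run on manual trigger only
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config=CLUSTER_CONFIG,
    )

    submit_job = DataprocSubmitJobOperator(
        task_id="submit_spark_job",
        project_id=PROJECT_ID,
        region=REGION,
        job=SPARK_JOB,
    )

    # trigger_rule="all_done" tears the cluster down even if the Spark job
    # fails, so you are never left paying for an idle cluster.
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",
    )

    create_cluster >> submit_job >> delete_cluster
```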
If you want to modify the Scala Spark JAR, reference spark-cassandra.scala and the comments within the file.
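If it helps to see the query end to end, here is a hedged PySpark equivalent of what the Scala job does, assuming the Spark 3 Cassandra catalog from the DataStax spark-cassandra-connector and a secure connect bundle downloaded from your Astra dashboard. The bundle name, credentials, and connector version are placeholders.

```python
from pyspark.sql import SparkSession

# Placeholders -- the bundle and token credentials come from your Astra dashboard.
BUNDLE = "secure-connect-demo.zip"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

spark = (
    SparkSession.builder
    .appName("astra-select-demo")
    # Pull in the connector; the version is an assumption, match your build.
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.1.0")
    # Point the connector at the Astra secure connect bundle.
    .config("spark.cassandra.connection.config.cloud.path", BUNDLE)
    .config("spark.cassandra.auth.username", CLIENT_ID)
    .config("spark.cassandra.auth.password", CLIENT_SECRET)
    # Register a Spark SQL catalog named "database" so the three-part
    # identifier database.keyspace.table resolves against Astra.
    .config("spark.sql.catalog.database",
            "com.datastax.spark.connector.datasource.CassandraCatalog")
    .getOrCreate()
)

# The same query the demo job runs.
spark.sql("select * from database.keyspace.table").show()
```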
1.1 Create a Google Cloud Storage bucket: https://console.cloud.google.com/storage/browser
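The console link above is the simplest route; if you prefer to script it, a minimal sketch with the google-cloud-storage client looks like this (project ID, bucket name, and location are placeholders):

```python
from google.cloud import storage

# Placeholders -- use your own project ID, bucket name, and location.
client = storage.Client(project="my-gcp-project")
bucket = client.create_bucket("my-dataproc-demo-bucket", location="us-central1")
print(f"Created bucket {bucket.name} in {bucket.location}")
```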
Airflow credentials are admin / admin.
3.3 Copy the full path of keyfile.json and paste it into the Keyfile Path field of the Google connection setup, then save.
Reference the comments in poc.py if unsure.
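As an alternative to the UI, the same Google connection can be created programmatically. This is only a sketch: it assumes Airflow 2.x with the Google provider installed, the key path and project ID are placeholders, and the prefixed extra field names match older provider versions.

```python
import json

from airflow.models import Connection
from airflow.settings import Session

# Placeholders -- point key_path at your actual keyfile.json.
conn = Connection(
    conn_id="google_cloud_default",
    conn_type="google_cloud_platform",
    extra=json.dumps({
        "extra__google_cloud_platform__key_path": "/path/to/keyfile.json",
        "extra__google_cloud_platform__project": "my-gcp-project",
    }),
)

# Persist the connection in the Airflow metadata database.
session = Session()
session.add(conn)
session.commit()
session.close()
```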