DISCLAIMER: RUNNING THE CODE IN THIS REPO MAY COST REAL MONEY
Anything you add to the hadoop-config/ directory is copied as-is to Hadoop's configuration directory: /home/ubuntu/hadoop/etc/hadoop/
The basic configuration of Hadoop is done in these files:
Anything you add to the cassandra-config/ directory is copied as-is to Cassandra's configuration directory: /etc/cassandra/
The basic configuration of Cassandra is done in this file:
- You must have an AWS account
- Configure your account to have a user with programmatic access to the services that this project uses (EC2, VPC, etc.)
- Configure this account with the AWS CLI using:
aws configure
- Copy terraform.tfvars.example to terraform.tfvars and configure it the way you like:
cp terraform.tfvars.example terraform.tfvars
- Create a directory called keys/
- Create a key pair called cluster_key (cluster_key, cluster_key.pub) inside keys/. These keys are what you will use to access the AWS EC2 instances. Alternatively, create symlinks to keys you already have, like so:
ln -s ~/.ssh/id_rsa.pub keys/cluster_key.pub
ln -s ~/.ssh/id_rsa keys/cluster_key
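The checklist above can be verified before any paid resources are created. The following is a minimal sketch, not part of the repository: the function name check_prereqs is hypothetical, and it only checks that the three files named in the steps exist (it does not validate AWS credentials or key contents).

```shell
# check_prereqs [DIR]: report which prerequisite files are missing.
# DIR defaults to the current directory (assumed to be the repo root).
# Returns 0 when everything is present, 1 otherwise.
check_prereqs() {
  dir=${1:-.}
  missing=0
  for f in terraform.tfvars keys/cluster_key keys/cluster_key.pub; do
    # -e follows symlinks, so a dangling symlink counts as missing.
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_prereqs from the repository root before applying the infrastructure prints one "missing: ..." line per absent file.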
Simply run the following command:
terraform apply
The following command applies the infrastructure without requesting approval and saves the output to a timestamped log file for later debugging if necessary:
terraform apply -auto-approve 2>&1 | tee log-$(date '+%Y-%m-%d_%H-%M-%S').out
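One caveat with piping into tee is that the shell reports the exit status of tee rather than of the command being logged, so a failed apply can look successful to a calling script. The sketch below is one way to keep both the log and the real exit status. The names run_logged and RUN_LOG are hypothetical and not part of the repository.

```shell
# run_logged CMD [ARGS...]: run a command, mirror stdout and stderr into a
# timestamped log file, and return the command's own exit status.
# RUN_LOG is set to the path of the log file that was written.
run_logged() {
  RUN_LOG="log-$(date '+%Y-%m-%d_%H-%M-%S').out"
  status_file=$(mktemp)
  {
    rc=0
    "$@" 2>&1 || rc=$?
    # Save the status to a file, because this block runs in a subshell.
    echo "$rc" > "$status_file"
  } | tee "$RUN_LOG"
  rc=$(cat "$status_file")
  rm -f "$status_file"
  return "$rc"
}
```

With this in place, the logged apply would be invoked as:
run_logged terraform apply -auto-approve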
After Terraform applies the infrastructure successfully, you can run the following to access the admin node:
ssh -i keys/cluster_key ubuntu@$(terraform output manager-ip)
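Depending on the Terraform version, terraform output may print string values wrapped in double quotes, in which case the ssh target above ends up with stray quote characters. The helper below is a small sketch for that situation; strip_quotes is a hypothetical name, and whether you need it depends on the Terraform release you run.

```shell
# strip_quotes VALUE: remove double quotes from a Terraform output value,
# e.g. turn "10.0.0.5" (with quotes) into 10.0.0.5.
strip_quotes() {
  printf '%s\n' "$1" | tr -d '"'
}
```

It would be used like this:
ssh -i keys/cluster_key ubuntu@$(strip_quotes "$(terraform output manager-ip)")
Where your Terraform version supports it, terraform output -raw manager-ip prints the plain value directly and makes the helper unnecessary.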
To access the cluster's web UIs (such as HDFS and YARN), you can use SSH with local forwarding, like so:
ssh -L 50070:master.hadoop.cluster:50070 -i keys/cluster_key ubuntu@$(terraform output manager-ip)
While the SSH session is open, the forwarded services are reachable on localhost. With the command above, the HDFS web UI is available at http://localhost:50070
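Several web UIs can be forwarded in one SSH session by repeating the -L option. The sketch below builds those options from a list of ports. forward_args is a hypothetical helper, and port 8088 in the usage line is an assumption: it is Hadoop's default for the YARN ResourceManager UI, and this cluster may be configured differently or serve it from another host.

```shell
# forward_args HOST PORT...: print one "-L PORT:HOST:PORT" option per port,
# forwarding each local port to the same port on HOST.
forward_args() {
  host=$1
  shift
  args=""
  for port in "$@"; do
    args="$args -L $port:$host:$port"
  done
  # Drop the leading space before printing.
  printf '%s\n' "${args# }"
}
```

For example, to forward both HDFS and (assuming the default port) YARN:
ssh $(forward_args master.hadoop.cluster 50070 8088) -i keys/cluster_key ubuntu@$(terraform output manager-ip)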
Run the following command with caution and at your own risk. It destroys the cluster without asking for your confirmation:
terraform destroy -auto-approve