Crypto_Forecasting

Reason for the Selected Topic

Dramatic changes in the global financial environment have made cryptocurrencies a popular alternative investment. Their volatility makes price changes hard to predict. Using machine learning models, we hope to build a way to forecast crypto market data. We will analyze historical data for six of the most popular cryptocurrencies and compare the findings with real-world market data.

Description of Source Data

Kaggle: The dataset contains historical trades for several cryptoassets, such as Ethereum, Dogecoin, Bitcoin, Cardano, and more.
G-Research is a quantitative finance research firm in Europe. It uses machine learning, big data, and advanced technology to predict movements in financial markets.

Questions we hope to answer with the data

Which machine learning model best predicts future price changes?
By how much will cryptocurrency prices increase in the near future compared with current market prices?
Which of the six cryptocurrencies chosen for this project is the most stable?
Which features affect the close price most?

Machine Learning Model

Description of data preprocessing:
- The data was checked for missing values. Because it is a time series, gaps were handled by either removing the NaN rows or filling them in.
- Minute-by-minute data was converted to day-by-day data for each crypto, and the six cryptoasset DataFrames were merged into a new DataFrame.
- Visualizations were created to view trends and correlations.
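
The resampling and merge steps above could look roughly like the following pandas sketch. It runs on synthetic minute bars, not the Kaggle data. The column names (`Open`, `High`, `Low`, `Close`, `Volume`, `VWAP`), the helper names `make_minute_data` and `to_daily`, the forward-fill for gaps, and the volume-weighted daily VWAP are all assumptions for illustration rather than the project's actual code.

```python
import numpy as np
import pandas as pd


def make_minute_data(seed: int, days: int = 3) -> pd.DataFrame:
    """Build synthetic minute bars (hypothetical stand-in for the real trades)."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2021-01-01", periods=days * 24 * 60, freq="min")
    close = 100 + rng.normal(0, 0.1, len(index)).cumsum()
    open_ = np.concatenate(([100.0], close[:-1]))
    high = np.maximum(open_, close) + 0.05
    low = np.minimum(open_, close) - 0.05
    return pd.DataFrame(
        {
            "Open": open_,
            "High": high,
            "Low": low,
            "Close": close,
            "Volume": rng.uniform(1, 10, len(index)),
            "VWAP": (high + low + close) / 3,
        },
        index=index,
    )


def to_daily(minute_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate minute bars into daily bars."""
    # Weight each minute's VWAP by its volume before summing per day.
    weighted = minute_df["VWAP"] * minute_df["Volume"]
    daily = minute_df.resample("1D").agg(
        {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    )
    daily["VWAP"] = weighted.resample("1D").sum() / daily["Volume"]
    # Assumption: days with no trades carry the last known values forward.
    return daily.ffill()


# Two assets here for brevity; the project merges six.
minute_frames = {"Bitcoin": make_minute_data(0), "Ethereum": make_minute_data(1)}
merged = pd.concat(
    {name: to_daily(df) for name, df in minute_frames.items()},
    names=["Asset", "Date"],
)
print(merged.head())
```

Keeping the asset name as an index level lets one DataFrame hold all coins while still allowing per-coin selection with `merged.loc["Bitcoin"]`.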
Description of feature engineering and the feature selection, including the decision-making process:
- The "High", "Low", and "VWAP" columns of the DataFrame are used as features, and "Close" as the target.
Description of how data was split into training and testing sets:
- An 80-20 training-test split was performed by date. Three years of data were used to train the models and 9 months to test them.
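
A date-ordered split like the one described above can be sketched as follows. Three years against nine months is 36 of 45 months, which is the stated 80%. The synthetic data and the helper name `chronological_split` are assumptions for illustration. The point of the sketch is that rows are not shuffled, so the test period always comes after the training period.

```python
import numpy as np
import pandas as pd

FEATURES = ["High", "Low", "VWAP"]
TARGET = "Close"


def chronological_split(daily: pd.DataFrame, train_fraction: float = 0.8):
    """Split by date order: the earliest rows train, the latest rows test."""
    daily = daily.sort_index()
    cutoff = int(len(daily) * train_fraction)
    train, test = daily.iloc[:cutoff], daily.iloc[cutoff:]
    return train[FEATURES], test[FEATURES], train[TARGET], test[TARGET]


# Synthetic daily bars (hypothetical data, 1000 consecutive days).
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=1000, freq="D")
close = 100 + rng.normal(0, 1, len(dates)).cumsum()
daily = pd.DataFrame(
    {
        "High": close + 1.0,
        "Low": close - 1.0,
        "VWAP": close + rng.normal(0, 0.5, len(dates)),
        "Close": close,
    },
    index=dates,
)

X_train, X_test, y_train, y_test = chronological_split(daily)
print(len(X_train), len(X_test))
print(X_train.index.max(), "<", X_test.index.min())
```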

1. Linear Regression:

Explanation of model choice, including limitations and benefits:
- Advantages: Linear regression is simple to implement and runs very fast.
- Limitations: Outliers can have a large effect on the regression.
Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables):
- There are no changes from the previous work.
Description of how the model has been trained thus far, and any additional training that will take place:
- Each crypto's model is trained on trading data from the previous three years and tested on the last 9 months of trading data.
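
As a rough illustration of this training setup, the sketch below fits scikit-learn's `LinearRegression` on the earlier 80% of a synthetic daily series and scores it on the remaining 20%. The data is randomly generated, so the printed scores say nothing about the project's real results, and the variable names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic daily bars (hypothetical data standing in for one crypto).
rng = np.random.default_rng(0)
n = 1000
close = 100 + rng.normal(0, 1, n).cumsum()
daily = pd.DataFrame(
    {
        "High": close + rng.uniform(0, 2, n),
        "Low": close - rng.uniform(0, 2, n),
        "VWAP": close + rng.normal(0, 0.5, n),
        "Close": close,
    },
    index=pd.date_range("2018-01-01", periods=n, freq="D"),
)

features, target = ["High", "Low", "VWAP"], "Close"
cutoff = int(n * 0.8)  # earliest 80% of days for training
train, test = daily.iloc[:cutoff], daily.iloc[cutoff:]

model = LinearRegression()
model.fit(train[features], train[target])
preds = model.predict(test[features])

rmse = float(np.sqrt(mean_squared_error(test[target], preds)))
r2 = r2_score(test[target], preds)
print(f"RMSE: {rmse:.3f}  R^2: {r2:.3f}")
```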
Description of current accuracy score:

2. XGBoost Model:

Explanation of model choice, including limitations and benefits:
- Advantages: Highly flexible and faster than standard gradient boosting.
- Limitations: When the data is very noisy, the model cannot interpret the meaningless data correctly.
Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables):
- There were no major changes from Segment 2 to Segment 3; however, we did change the features to achieve a better score than in the previous segment.
Description of how they have trained the model thus far and any additional training that will take place:
- The model built with the scikit-learn API showed a high RMSE (root-mean-square error). After hyperparameter tuning, the RMSE improved slightly, but not as much as we would like. Other tuning methods could be tried, such as grid search or randomized search.
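
For reference, RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the price. A minimal dependency-free sketch follows. The function name `rmse` and the sample numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical close prices and model predictions.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 101.0, 109.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, the single miss of 4 in the last position dominates the result, which is why a few badly predicted days can keep RMSE high.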
Description of current accuracy score

3. Random Forest Regressor:

Explanation of model choice, including limitations and benefits:
- Advantages: This model works well for both classification and regression tasks.
- Limitations: It cannot extrapolate; each prediction is an average of previously observed labels.
Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables):
- Added hyperparameter tuning to the model in the hope of improving the score.
Description of how they have trained the model thus far, and any additional training that will take place:
- Added a neural network using Keras, and the scores improved significantly.
Description of current accuracy score:
- Model without tuning:
- Model with tuning:
- Model with ANN:
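
One way hyperparameter tuning for a random forest might be set up is sketched below, using scikit-learn's `RandomizedSearchCV` with `TimeSeriesSplit` so that each validation fold comes after its training fold. The parameter ranges, the synthetic upward-trending data, and the variable names are assumptions, not the project's settings. The last lines also illustrate the extrapolation limit noted above: no prediction exceeds the largest training label, even though the test prices are higher.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic, steadily rising daily prices (hypothetical data).
rng = np.random.default_rng(0)
n = 300
close = 100 + 0.5 * np.arange(n) + rng.normal(0, 1, n)
X = pd.DataFrame(
    {"High": close + 1.0, "Low": close - 1.0, "VWAP": close + rng.normal(0, 0.5, n)}
)
y = pd.Series(close, name="Close")
X_train, X_test = X.iloc[:240], X.iloc[240:]
y_train, y_test = y.iloc[:240], y.iloc[240:]

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [3, 5, 10, None],
        "min_samples_leaf": [1, 2, 5],
    },
    n_iter=5,                       # sample 5 of the 36 combinations
    cv=TimeSeriesSplit(n_splits=3),  # folds respect time order
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
preds = search.predict(X_test)

print("best params:", search.best_params_)
print("cross-validated RMSE:", -search.best_score_)
# Tree averages cannot go beyond the range of labels seen in training.
print("max prediction:", preds.max(), "max training label:", y_train.max())
```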

4. Artificial Neural Network:

Explanation of model choice, including limitations and benefits:
- Advantages: More accurate and runs very fast.
- Limitations: May overfit easily.
Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables):
- This is a new machine learning model that we added in Segment 3.
Description of how they have trained the model thus far, and any additional training that will take place:
- The network uses two hidden layers with five neurons each and one output layer with one neuron.
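
To make the layer sizes above concrete, here is a minimal NumPy sketch of a forward pass through a network of that shape. It is not the project's Keras model: the three inputs (taken to be the High, Low, and VWAP features), the ReLU activations, the random initial weights, and the helper names are assumptions, and no training is performed.

```python
import numpy as np

# 3 inputs, two hidden layers of 5 neurons, 1 output neuron.
LAYER_SIZES = [3, 5, 5, 1]


def init_params(layer_sizes, seed=0):
    """Create one (weights, bias) pair per layer with small random weights."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Run a batch through the network: ReLU on hidden layers, linear output."""
    activation = x
    for weights, bias in params[:-1]:
        activation = np.maximum(0.0, activation @ weights + bias)
    weights, bias = params[-1]
    return activation @ weights + bias


params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)
batch = np.ones((4, 3))  # 4 hypothetical rows of 3 features
output = forward(params, batch)
print("trainable parameters:", n_params)
print("output shape:", output.shape)
```

With so few parameters the network is quick to train, but on a short daily series it can still overfit, so a held-out test period remains important.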
Description of current accuracy score

Database Integration

Data Frame

Dashboard

The dashboard was built using Tableau Public

Google Slides Presentation

The Google Slides Presentation is here

Tools

PostgreSQL
Amazon Web Services (AWS)
Jupyter Notebook
Tableau Public
Google Slides
Google Colab
