Read the following instructions before adding a new model.
- Code Style
- Read The Examples
- Fork
- MANDATORY For TESTS
- Create python code
- Create JSON for parameters
- Keep Your Branch Updated
- Run/test your Model
- Check Your Test Runs
- Issue A Pull Request
- Source Code Structure As Below
- How to define a custom model
#### Easy path finding
from mlmodels.util import path_norm
path_withPrefix = path_norm("dataset/timeseries/myfile.csv")   ## site-packages/mlmodels/dataset/timeseries/myfile.csv
### Run a model on the command line for debugging
cd mlmodels
python optim.py
python model_tch/textcnn.py
python model_keras/textcnn.py
https://github.com/arita37/mlmodels/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc
- You can use up to 120 characters per line: better code readability
- Do not follow strict PEP8; make your code EASY TO READ: align "=" signs together, ....
- Do NOT reformat existing files.
Fork from arita37/mlmodels. Please use the same branch for your developments: the dev branch
Change these files where needed with your MODEL_NAME and BRANCH NAME:
- Test on YOUR_Branch : runs at each commit
- Test by using pullrequest/youtest.py : used at PR merge
Create mlmodels/model_XXXX/yyyyy.py. Check the template.
See examples: model_keras/textcnn.py, transformer_sentence.py
Please re-use existing functions in util.py
from mlmodels.util import (os_package_root_path, log, path_norm, get_model_uri, path_norm_dict)
### Use path_norm to normalize your path
data_path = path_norm("dataset/text/myfile.txt")
--> FULL PATH /home/ubuntu/mlmodels/dataset/text/myfile.txt

data_path = path_norm("ztest/text/myfile.txt")
--> FULL PATH /home/ubuntu/mlmodels/ztest/text/myfile.txt
data_path = path_norm("ztest/text/myfile.txt")
--> FULL_ PATH /home/ubuntu/mlmodels/ztest/text/myfile.txt
Create the file mlmodels/model_XXXX/yyyy.json following this template.
Sync your branch with arita37/mlmodels:dev to reduce conflicts at the final step.
Pull Request : arita37/dev --> your Branch
Run/test the newly added model on your local machine, on Gitpod, or on Colab
source activate py36
cd mlmodels
python model_XXXX/yyyy.py
https://github.com/arita37/mlmodels/actions?query=workflow%3Atest_custom_model
Once you have made the changes, issue a PR.
### On Linux/MacOS
pip install "numpy<=1.17.0"
pip install -e . -r requirements.txt
pip install -r requirements_fake.txt
### On Windows
Use WSL with Linux installed.
pip install "numpy<=1.17.0"
pip install torch==1.0.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -e . -r requirements_wi.txt
pip install -r requirements_fake.txt
docs      : documentation
mlmodels  : interface wrapper for pytorch, keras, gluon, tf, transformer NLP for train, hyper-params search
model_xxx : folders for each platform with the same interface defined in the template folder
dataset   : store dataset files for test runs
template  : template interface wrapper which defines common interfaces for the whole platform
ztest     : testing output for each sample test in model_xxx
1. Create a file mlmodels/model_XXXX/mymodel.py, where XXXX identifies the platform: tch: pytorch, tf: tensorflow, keras: keras, ....
- Declare the classes/functions below in the created file (a skeleton sketch follows the Infos list):
  Class Model()                                                  : Model definition
      __init__(model_pars, data_pars, compute_pars)
  def fit(model, data_pars, model_pars, compute_pars, out_pars)  : Train the model
  def fit_metric(model, data_pars, compute_pars, out_pars)       : Measure the results
  def predict(model, sess, data_pars, compute_pars, out_pars)    : Predict the results
  def get_params(choice, data_path, config_mode)                 : return parameters of the model
  def get_dataset(data_pars)                                     : load dataset
  def test()                                                     : example running the model
  def test_api()                                                 : example running the model in global settings
  def save(model, session, save_pars)                            : save the model
  def load(load_pars)                                            : load the trained model
- Infos
  model                 : Model(model_pars), instance of the Model() object
  sess                  : Session for a TF model, or the optimizer in PyTorch
  model_pars            : dict containing info on the model definition
  data_pars             : dict containing info on the input data
  compute_pars          : dict containing info on the model compute
  out_pars              : dict containing info on the output folder
  save_pars / load_pars : dict for saving or loading a model
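For orientation, a minimal skeleton of this interface might look like the sketch below. It only mirrors the function list above; every body is a placeholder to be filled for the actual model, and it is not the library's reference implementation.

```python
# mlmodels/model_XXXX/mymodel.py -- illustrative skeleton only
# (re-use helpers from mlmodels.util such as path_norm, get_model_uri inside these functions)

class Model:
    def __init__(self, model_pars=None, data_pars=None, compute_pars=None):
        # Build the underlying framework model (keras/torch/...) from model_pars here
        self.model = None

def get_params(choice="json", data_path="model_XXXX/mymodel.json", config_mode="test", **kw):
    pass  # return model_pars, data_pars, compute_pars, out_pars read from the JSON config

def get_dataset(data_pars=None):
    pass  # load and pre-process the train/test data described by data_pars

def fit(model, data_pars=None, model_pars=None, compute_pars=None, out_pars=None, **kw):
    return model  # train model.model on the data returned by get_dataset

def fit_metric(model, data_pars=None, compute_pars=None, out_pars=None, **kw):
    return {}  # measure the results on the test split

def predict(model, sess=None, data_pars=None, compute_pars=None, out_pars=None, **kw):
    pass  # run inference and return the predictions

def save(model, session=None, save_pars=None):
    pass  # persist the trained model according to save_pars

def load(load_pars=None):
    pass  # restore a previously saved model

def test():
    pass  # example running the model with local parameters

def test_api():
    pass  # example running the model through the global mlmodels interface
```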
- Declare model definition in Class Model()
self.model = DeepFM(linear_cols, dnn_cols, task=compute_pars['task']) # mlmodels/model_kera/01_deectr.py
# Model parameters such as `linear_cols, dnn_cols` are obtained from the function `get_params`, which returns `model_pars, data_pars, compute_pars, out_pars`
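Hypothetically, wiring the parameters into the model definition looks like the snippet below (DeepFM and the column lists follow the example above; the defaulted JSON path in get_params is an assumption):

```python
# Sketch: get_params supplies the four dicts, Model.__init__ builds the framework model
model_pars, data_pars, compute_pars, out_pars = get_params(choice="json", config_mode="test")
model = Model(model_pars=model_pars, data_pars=data_pars, compute_pars=compute_pars)
print(model.model)   # e.g. the DeepFM instance created in __init__
```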
- Implement data pre-processing in the function get_dataset, which returns data for both the training and the testing dataset. Depending on the type of dataset, the pre-processing can be split per data type, as in the example below (a fuller sketch of get_dataset follows it):
if data_type == "criteo":
df, linear_cols, dnn_cols, train, test, target = _preprocess_criteo(df, **kw)
elif data_type == "movie_len":
df, linear_cols, dnn_cols, train, test, target = _preprocess_movielens(df, **kw)
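A complete get_dataset following this pattern might look like the sketch below. The pandas loading step, passing data_pars as keyword arguments, and the ValueError fallback are assumptions; _preprocess_criteo / _preprocess_movielens follow the snippet above.

```python
import pandas as pd
from mlmodels.util import path_norm

def get_dataset(data_pars):
    data_type = data_pars.get("data_type")
    df = pd.read_csv(path_norm(data_pars["path"]))   # load raw data described by data_pars (assumption)
    if data_type == "criteo":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_criteo(df, **data_pars)
    elif data_type == "movie_len":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_movielens(df, **data_pars)
    else:
        raise ValueError(f"Unknown data_type: {data_type}")
    return df, linear_cols, dnn_cols, train, test, target
```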
- Call fit/predict with initialized model and dataset
# get dataset using function get_dataset
data, linear_cols, dnn_cols, train, test, target = get_dataset(**data_pars)
# fit data (`m` holds the compute parameters: batch_size, epochs, validation_split)
model.model.fit(train_model_input, train[target].values,
                batch_size=m['batch_size'], epochs=m['epochs'], verbose=2,
                validation_split=m['validation_split'])
# predict data
pred_ans = model.model.predict(test_model_input, batch_size= compute_pars['batch_size'])
- Calculate metric with predict output
# input of metrics is predicted output and ground truth data
def metrics(ypred, ytrue, data_pars, compute_pars=None, out_pars=None, **kwargs):
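A minimal metrics body could look like the sketch below; using scikit-learn's mean squared error is an assumption, pick whatever metric fits your model.

```python
from sklearn.metrics import mean_squared_error

def metrics(ypred, ytrue, data_pars=None, compute_pars=None, out_pars=None, **kwargs):
    # compare predictions against ground truth and return a dict of scores
    return {"mse": mean_squared_error(ytrue, ypred)}
```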
- Examples
Create a JSON file mlmodels/model_XXXX/mymodel.json
- Separate the configuration per stage (e.g. testing and production); for each stage, declare the specific parameters for the model, the dataset, and the output
- Examples
{
"test": {
"hypermodel_pars": {
"learning_rate": {"type": "log_uniform", "init": 0.01, "range" : [0.001, 0.1] },
"num_layers": {"type": "int", "init": 2, "range" :[2, 4] },
"size": {"type": "int", "init": 6, "range" :[6, 6] },
"output_size": {"type": "int", "init": 6, "range" : [6, 6] },
"size_layer": {"type" : "categorical", "value": [128, 256 ] },
"timestep": {"type" : "categorical", "value": [5] },
"epoch": {"type" : "categorical", "value": [2] }
},
"model_pars": {
"learning_rate": 0.001,
"num_layers": 1,
"size": 6,
"size_layer": 128,
"output_size": 6,
"timestep": 4,
"epoch": 2
},
"data_pars" :{
"path" :
"location_type" : "local/absolute/web",
"data_type" : "text" / "recommender" / "timeseries" /"image",
"data_loader" : "pandas",
"data_preprocessor" : "mlmodels.model_keras.prepocess:process",
"size" : [0,1,2],
"output_size": [0, 6]
},
"compute_pars": {
"distributed": "mpi",
"epoch": 10
},
"out_pars": {
"out_path": "dataset/",
"data_type": "pandas",
"size": [0, 0, 6],
"output_size": [0, 6]
}
},
"prod": {
"model_pars": {},
"data_pars": {}
}
}
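For illustration, get_params can read this file and return the four parameter dicts for the selected config_mode ("test" or "prod"). The helper below is a sketch under that assumption, not the library's exact implementation.

```python
import json
from mlmodels.util import path_norm

def get_params(choice="json", data_path="model_XXXX/mymodel.json", config_mode="test", **kw):
    # pick the "test" or "prod" block from the JSON template above
    cfg = json.load(open(path_norm(data_path), mode="r"))[config_mode]
    return cfg["model_pars"], cfg["data_pars"], cfg["compute_pars"], cfg["out_pars"]
```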
#######################################################################################
https://github.com/arita37/mlmodels/blob/dev/docs/README_docs/README_usage_CLI.md
#######################################################################################
models.py
module_load(model_uri)
model_create(module)
fit(model, module, session, data_pars, out_pars )
metrics(model, module, session, data_pars, out_pars)
predict(model, module, session, data_pars, out_pars)
save(model, path)
load(model)
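A hedged usage sketch of this generic interface, using only the signatures listed above and assuming models.py is importable as mlmodels.models; the session argument, the return values, and the empty parameter dicts are placeholders to be filled from your JSON config.

```python
from mlmodels.models import module_load, model_create, fit, predict

data_pars, out_pars = {}, {}                 # fill from your mymodel.json config
module = module_load("model_tf.1_lstm")      # load the wrapper module by model_uri
model  = model_create(module)                # instantiate its Model() object
model  = fit(model, module, None, data_pars, out_pars)       # session=None (assumption) for non-TF models
ypred  = predict(model, module, None, data_pars, out_pars)
```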
optim.py
optim(modelname="model_tf.1_lstm.py", model_pars={}, data_pars={}, compute_pars={"method": "normal/prune"},
      save_folder="/mymodel/", log_folder="", ntrials=2)
optim_optuna(modelname="model_tf.1_lstm.py", model_pars={}, data_pars={}, compute_pars={"method": "normal/prune"},
             save_folder="/mymodel/", log_folder="", ntrials=2)
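Similarly, a hyper-parameter search can be launched with optim(), reusing the signature above; the folder names are illustrative and the return value is an assumption.

```python
from mlmodels.optim import optim

result = optim(modelname="model_tf.1_lstm.py",
               model_pars={}, data_pars={}, compute_pars={"method": "normal"},
               save_folder="ztest/optim/", log_folder="", ntrials=2)
```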
Define in models_config.json
model_pars : Relative to the model definition
compute_pars : Relative to the compute process
data_pars : Relative to the input data
out_pars : Relative to the output data
Sometimes, data_pars is required to set up the model (i.e. a CNN needs the image size, ...)
#######################################################################################
pd_ : input is pandas dataframe
np_ : input is numpy
sk_ : input is related to sklearn (i.e. a sklearn model), input is a numpy array
plot_ : plotting related
col_ : function related to column list processing
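Hypothetical function names following this convention (illustrative only, not functions from the library):

```python
def pd_filter_rows(df, cols):   ...   # operates on a pandas dataframe
def np_normalize(arr):          ...   # operates on a numpy array
def sk_model_eval(model, X, y): ...   # wraps a sklearn model, numpy input
def plot_history(history):      ...   # plotting helper
def col_remove_unique(df):      ...   # works on a column list
```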