Contributing Guidelines

Pull Requests, Fixes, New Models

Read the following instructions before adding a new model.

Get started quickly

#### Easy path finding
from mlmodels.util import path_norm
path_withPrefix = path_norm("dataset/timeseries/myfile.csv")   ##  site-packages/mlmodels/dataset/timeseries/myfile.csv



### Run a model from the command line for debugging
cd mlmodels
python optim.py

python model_tch/textcnn.py

python model_keras/textcnn.py



List of TODOs / Issues

https://github.com/arita37/mlmodels/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc

Index of Functions/Methods

Index

Using Online Editor (pre-installed mlmodels)

Gitpod

Colab

Code Style:

  • You can use up to 120 characters per line, for better code readability.
  • Do NOT follow strict PEP8; make your code EASY TO READ (e.g., align "=" signs together, ....)
  • Do NOT reformat existing files.

Read The Examples


1) Fork

Fork from arita37/mlmodels. Please use the same branch for your development: the dev branch.

2) Configure for Tests (No Test Success, No PR Accepted)

Update these files where needed with your MODEL_NAME and BRANCH NAME:

3) Create Python Script For New Model

Create mlmodels/model_XXXX/yyyyy.py. Check the template.
See the examples model_keras/textcnn.py and transformer_sentence.py.

Please re-use existing functions in util.py

 from mlmodels.util import (os_package_root_path, log,
                            path_norm, get_model_uri, path_norm_dict)

 ### Use path_norm to normalize your path.
 data_path = path_norm("dataset/text/myfile.txt")
    ## --> full path  /home/ubuntu/mlmodels/dataset/text/myfile.txt

 data_path = path_norm("ztest/text/myfile.txt")
    ## --> full path  /home/ubuntu/mlmodels/ztest/text/myfile.txt

4) Create JSON For Parameters

Create the file mlmodels/model_XXXX/yyyy.json following this template.

5) Keep Your Branch Updated

Sync your branch with arita37/mlmodels:dev to reduce conflicts at the final step.

Pull Request : arita37/dev --> your Branch

Run Model

Run/test the newly added model on your local machine, on Gitpod, or on Colab:

source activate py36
cd mlmodels
python model_XXXX/yyyy.py  

Check Your Test Runs

https://github.com/arita37/mlmodels/actions?query=workflow%3Atest_custom_model

Issue A Pull Request

Once you have made the changes, issue a PR.


Manual Installation

### On Linux/MacOS
pip install "numpy<=1.17.0"
pip install -e .  -r requirements.txt
pip install -r requirements_fake.txt


### On Windows
## Use WSL with Linux installed.
pip install "numpy<=1.17.0"
pip install torch==1..1 -f https://download.pytorch.org/whl/torch_stable.html
pip install -e .  -r requirements_wi.txt
pip install -r requirements_fake.txt

Source Code Structure

  • docs: documentation
  • mlmodels: interface wrappers for PyTorch, Keras, Gluon, TF, and Transformer NLP models, covering training and hyper-parameter search.
    • model_xxx: one folder per framework, each implementing the same interface defined in the template folder
    • dataset: dataset files for test runs.
    • template: template interface wrapper defining the common interface across all frameworks
    • ztest: testing output for each sample test in model_xxx

How to define a custom model

1. Create a file mlmodels/model_XXXX/mymodel.py, where XXXX denotes the framework: tch for PyTorch, tf for TensorFlow, keras for Keras, ....

  • Declare the classes/functions below in the created file (a minimal working sketch follows the Infos list):

    Class Model()                                                  : Model definition
          __init__(model_pars, data_pars, compute_pars)            : Build the model from the parameter dicts

    def fit(model, data_pars, model_pars, compute_pars, out_pars)  : Train the model
    def fit_metric(model, data_pars, compute_pars, out_pars)       : Measure the results
    def predict(model, sess, data_pars, compute_pars, out_pars)    : Predict the results

    def get_params(choice, data_path, config_mode)                 : Return the parameters of the model
    def get_dataset(data_pars)                                     : Load the dataset
    def test()                                                     : Example running the model
    def test_api()                                                 : Example running the model in global settings

    def save(model, session, save_pars)                            : Save the model
    def load(load_pars)                                            : Load the trained model
    
  • Infos

    model :         Model(model_pars), an instance of the Model() class
    sess  :         TF session for a TensorFlow model, or the optimizer for a PyTorch model
    model_pars :    dict containing info on the model definition.
    data_pars :     dict containing info on the input data.
    compute_pars :  dict containing info on the compute settings.
    out_pars :      dict containing info on the output folder.
    save_pars/load_pars : dicts for saving or loading a model
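
For orientation, here is a minimal, self-contained sketch of this interface. It uses a scikit-learn regressor and random toy data as stand-ins for a real framework backend and dataset, so everything beyond the required names above is an illustrative assumption, not part of mlmodels:

    import pickle
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    class Model:
        def __init__(self, model_pars=None, data_pars=None, compute_pars=None):
            self.model_pars = model_pars or {}
            self.model      = LinearRegression()   ## stand-in backend model

    def get_params(choice="test", data_path="", config_mode="test"):
        ## A real implementation loads these dicts from the model's JSON config.
        return {}, {}, {}, {}                      ## model_pars, data_pars, compute_pars, out_pars

    def get_dataset(data_pars=None):
        ## Toy data; a real implementation would read data_pars["path"].
        X = np.random.rand(100, 4)
        y = X @ np.array([1.0, 2.0, 0.5, -1.0])
        return X, y

    def fit(model, data_pars=None, model_pars=None, compute_pars=None, out_pars=None):
        X, y = get_dataset(data_pars)
        model.model.fit(X, y)
        return model

    def predict(model, sess=None, data_pars=None, compute_pars=None, out_pars=None):
        X, _ = get_dataset(data_pars)
        return model.model.predict(X)

    def fit_metric(model, data_pars=None, compute_pars=None, out_pars=None):
        X, y = get_dataset(data_pars)
        return {"mse": mean_squared_error(y, model.model.predict(X))}

    def save(model, session=None, save_pars=None):
        with open(save_pars["path"], "wb") as f:
            pickle.dump(model.model, f)

    def load(load_pars):
        model = Model()
        with open(load_pars["path"], "rb") as f:
            model.model = pickle.load(f)
        return model

    def test():
        model = fit(Model())
        print(predict(model)[:5], fit_metric(model))

    if __name__ == "__main__":
        test()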
    

2. Write your code and create test() to test your code.

  • Declare the model definition in class Model()
    self.model = DeepFM(linear_cols, dnn_cols, task=compute_pars['task'])  # mlmodels/model_keras/01_deepctr.py
    # Model parameters such as `linear_cols, dnn_cols` are obtained from `get_params`, which returns `model_pars, data_pars, compute_pars, out_pars`
  • Implement the data pre-processing in get_dataset, which returns data for both the training and testing datasets. Depending on the type of dataset, you can dispatch to a separate function per data type, as in the example below
    if data_type == "criteo":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_criteo(df, **kw)

    elif data_type == "movie_len":
        df, linear_cols, dnn_cols, train, test, target = _preprocess_movielens(df, **kw)
  • Call fit/predict with the initialized model and dataset
    # get the dataset using get_dataset
    data, linear_cols, dnn_cols, train, test, target = get_dataset(**data_pars)

    # fit the model (here m is a dict of fit settings, e.g. taken from compute_pars)
    model.model.fit(train_model_input, train[target].values,
                    batch_size=m['batch_size'], epochs=m['epochs'], verbose=2,
                    validation_split=m['validation_split'])

    # predict
    pred_ans = model.model.predict(test_model_input, batch_size=compute_pars['batch_size'])
  • Calculate metrics from the predicted output (a possible implementation is sketched just after this list)
    # the inputs of metrics are the predicted output and the ground-truth data
    def metrics(ypred, ytrue, data_pars, compute_pars=None, out_pars=None, **kwargs):
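
For instance, a minimal metrics body for a binary classifier could compute log loss and AUC with scikit-learn (the metric choice is an illustrative assumption, not a requirement of the interface):

    from sklearn.metrics import log_loss, roc_auc_score

    def metrics(ypred, ytrue, data_pars=None, compute_pars=None, out_pars=None, **kwargs):
        ## ypred: predicted probabilities, ytrue: ground-truth binary labels
        return {"log_loss": log_loss(ytrue, ypred),
                "auc":      roc_auc_score(ytrue, ypred)}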

3. Create JSON config file

Create a JSON file inside /model_XXX/mymodel.json

  • Provide a separate configuration for each stage (e.g., testing and production); for each stage, declare the specific parameters for the model, the dataset, and the output (a sketch of how this file is loaded follows the example).
  • Examples
    {
        "test": {

              "hypermodel_pars":   {
             "learning_rate": {"type": "log_uniform", "init": 0.01,  "range" : [0.001, 0.1] },
             "num_layers":    {"type": "int", "init": 2,  "range" :[2, 4] },
             "size":    {"type": "int", "init": 6,  "range" :[6, 6] },
             "output_size":    {"type": "int", "init": 6,  "range" : [6, 6] },

             "size_layer":    {"type" : "categorical", "value": [128, 256 ] },
             "timestep":      {"type" : "categorical", "value": [5] },
             "epoch":         {"type" : "categorical", "value": [2] }
           },

            "model_pars": {
                "learning_rate": 0.001,     
                "num_layers": 1,
                "size": 6,
                "size_layer": 128,
                "output_size": 6,
                "timestep": 4,
                "epoch": 2
            },

            "data_pars" :{
              "path"            : 
              "location_type"   :  "local/absolute/web",
              "data_type"   :   "text" / "recommender"  / "timeseries" /"image",
              "data_loader" :  "pandas",
              "data_preprocessor" : "mlmodels.model_keras.prepocess:process",
              "size" : [0,1,2],
              "output_size": [0, 6]              
            },


            "compute_pars": {
                "distributed": "mpi",
                "epoch": 10
            },
            "out_pars": {
                "out_path": "dataset/",
                "data_type": "pandas",
                "size": [0, 0, 6],
                "output_size": [0, 6]
            }
        },
    
        "prod": {
            "model_pars": {},
            "data_pars": {}
        }
    }
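
At runtime, get_params typically reads this file and returns the four parameter dicts for the selected stage. A minimal sketch, assuming the JSON layout above (the file path is illustrative):

    import json

    def get_params(choice="json", data_path="model_XXX/mymodel.json", config_mode="test"):
        with open(data_path) as f:
            config = json.load(f)[config_mode]      ## select the "test" or "prod" block
        return (config.get("model_pars", {}),   config.get("data_pars", {}),
                config.get("compute_pars", {}), config.get("out_pars", {}))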

#######################################################################################

③ Command-line input tools: the package provides the tools below

https://github.com/arita37/mlmodels/blob/dev/docs/README_docs/README_usage_CLI.md

#######################################################################################

④ Interface

models.py

   module_load(model_uri)
   model_create(module)
   fit(model, module, session, data_pars, out_pars)
   metrics(model, module, session, data_pars, out_pars)
   predict(model, module, session, data_pars, out_pars)
   save(model, path)
   load(model)
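
A typical session chains these calls. A sketch assuming the signatures listed above, with placeholder parameter dicts (in practice they come from the model's JSON config):

    from mlmodels import models

    data_pars, compute_pars, out_pars = {}, {}, {}            ## placeholders

    module = models.module_load("model_tf.1_lstm")            ## import the wrapper module
    model  = models.model_create(module)                      ## instantiate its Model()
    models.fit(model, module, None, data_pars, out_pars)      ## session is None here
    ypred  = models.predict(model, module, None, data_pars, out_pars)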

optim.py

   optim(modelname="model_tf.1_lstm.py", model_pars={}, data_pars={},
         compute_pars={"method": "normal/prune"},
         save_folder="/mymodel/", log_folder="", ntrials=2)

   optim_optuna(modelname="model_tf.1_lstm.py", model_pars={}, data_pars={},
                compute_pars={"method": "normal/prune"},
                save_folder="/mymodel/", log_folder="", ntrials=2)
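
A usage sketch following the signature above (the folder name and empty parameter dicts are placeholders):

    from mlmodels.optim import optim

    res = optim(modelname="model_tf.1_lstm.py",
                model_pars={}, data_pars={},
                compute_pars={"method": "normal"},    ## or "prune"
                save_folder="ztest/optim/", log_folder="", ntrials=2)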

Generic parameters

   Defined in models_config.json
   model_pars        :  Relative to the model definition
   compute_pars      :  Relative to the compute process
   data_pars         :  Relative to the input data
   out_pars          :  Relative to the output data

Sometimes, data_pars is required to set up the model (e.g., a CNN needs the image size), as in the sketch below.
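
A minimal sketch of a Model.__init__ that needs data_pars at build time; the "image_size" key and the Keras layers are illustrative assumptions:

    from tensorflow.keras import layers
    from tensorflow.keras.models import Sequential

    class Model:
        def __init__(self, model_pars=None, data_pars=None, compute_pars=None):
            model_pars = model_pars or {}
            h, w, c    = data_pars["image_size"]   ## e.g. [28, 28, 1]; input shape comes from data_pars
            self.model = Sequential([
                layers.Conv2D(model_pars.get("filters", 16), 3,
                              activation="relu", input_shape=(h, w, c)),
                layers.Flatten(),
                layers.Dense(model_pars.get("num_classes", 10), activation="softmax"),
            ])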

#######################################################################################

⑥ Naming convention

Function naming

pd_   :  input is a pandas DataFrame
np_   :  input is a numpy array
sk_   :  input is related to sklearn (e.g. an sklearn model); input is a numpy array
plot_ :  plotting function


col_  :  function related to column lists
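
A few hypothetical helpers, just to illustrate the prefixes (none of these functions exist in mlmodels):

    import pandas as pd

    def pd_drop_na_cols(df):             ## pd_  : input is a pandas DataFrame
        return df.dropna(axis=1)

    def np_standardize(x):               ## np_  : input is a numpy array
        return (x - x.mean()) / x.std()

    def col_numeric(df):                 ## col_ : returns a list of column names
        return df.select_dtypes("number").columns.tolist()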