This repository was archived by the owner on Jul 30, 2019. It is now read-only.

[REVIEW] Dask RF Classifier #56

Open
wants to merge 4 commits into
base: branch-0.9
from

oyilmaz-nvidia
This PR adds the Dask RF classifier. To run the code, install cuml from the following branch:

https://github.com/oyilmaz-nvidia/cuml/tree/fea-update-predict

Once this PR is approved, I will open another PR for the updates made in the fea-update-predict branch of cuml.

Member

@cjnolet cjnolet left a comment

I know this PR was just meant for a quick overview, but I notice this does not include any automated pytests.

Also, it's going to be important that the dataframe passed in contains only one partition per worker. Originally I was thinking of writing some custom code to do this, but I believe it's better to use Dask's repartition() function for this first iteration.

import math

@gen.coroutine
def _extract_ddf_partitions(ddf):
Member

This is actually a utility function inside cuml.dask.dask_df_utils in the new comms PR.

"""
c = default_client()

X_futures = c.sync(_extract_ddf_partitions, X)
Member

Call the version from cuml.dask.common.dask_df_utils instead.


3 participants