This repository was archived by the owner on Jul 18, 2023. It is now read-only.

The 'postprocessing' step is too slow. #137

Closed
nottwy opened this issue Dec 11, 2017 · 11 comments

Comments

@nottwy
nottwy commented Dec 11, 2017

Dear developers,

I'm now at the 'postprocessing' step, running the command shown below, but it is taking far too long.


Run postprocessing
hinge clip ecoli.edges.hinges ecoli.hinge.list


My data is an 855 Gb .las file. Do you have any suggestions?

@ilanshom
Collaborator

Hi @nottwy,

The hinge clip step doesn't use the .las files, it only uses the .edges.hinges and the hinge.list files. How big are those files for you?

Typically this step is fast compared to the rest of the pipeline, so I'm surprised that it's taking that long. Is there any output to the console?
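For anyone answering the size question above, here is a minimal sketch that reports the size and line count of the two input files. The helper names `describe_file` and `human_size` are made up for this example and are not part of HINGE; the file names are the ones from the command in this thread.

```python
import os


def describe_file(path):
    """Return (size_in_bytes, line_count) for one file."""
    size = os.path.getsize(path)
    lines = 0
    # Read in binary mode so odd encodings cannot break the count.
    with open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    return size, lines


def human_size(num_bytes):
    """Format a byte count using 1024-based units, capped at GB."""
    for unit in ("B", "KB", "MB", "GB"):
        if num_bytes < 1024 or unit == "GB":
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024


if __name__ == "__main__":
    # File names taken from the command in this issue.
    for path in ("ecoli.edges.hinges", "ecoli.hinge.list"):
        if os.path.exists(path):
            size, lines = describe_file(path)
            print(f"{path}: {human_size(size)}, {lines} lines")
        else:
            print(f"{path}: not found in the current directory")
```

The line count is included on the assumption that the number of records, rather than the byte size alone, is what drives the running time of this step.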

@govinda-kamath
Collaborator

Also, could you share ecoli.edges.hinges and ecoli.hinge.list with us? These files contain no sequence information (in case that's a concern for you).

@nottwy
Author

nottwy commented Dec 15, 2017

The edges.hinges file is 113 M and the hinge.list file is 27 M.
There has been no output so far (more than 1 week), although the program still appears to be running normally.
The last record of git log is:
commit 4c8b36b
Author: Fei Xia <xf1280@gmail.com>
Date: Tue Oct 24 17:00:36 2017 -0700

Update run.sh

I look forward to your reply!

@ilanshom
Collaborator

These are very large edge/hinges files. Does your genome have telomeres/centromeres? I suspect that the graph could be very dense in these repetitive parts. Do you have del_telomeres = 1; in your nominal.ini?

Also, I remember that a while back you were trying to use the devG3 branch (see #129). Were you able to use it? Setting aggressive_pruning = true in your nominal.ini could also help.
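To check quickly whether these two options are set, a plain-text scan of nominal.ini is enough. The sketch below assumes the settings appear as `key = value` lines, optionally ending in `;` as written above. It is not HINGE's own config parser, and `find_settings` is a name invented for this example.

```python
import re

# The two option names mentioned in this comment.
KEYS = ("del_telomeres", "aggressive_pruning")


def find_settings(text, keys=KEYS):
    """Return {key: value or None} for 'key = value' lines in text."""
    found = {key: None for key in keys}
    for line in text.splitlines():
        # Capture the key and the value, stopping at ';' or '#'.
        match = re.match(r"\s*([A-Za-z_]\w*)\s*=\s*([^;#]*)", line)
        if match and match.group(1) in found:
            found[match.group(1)] = match.group(2).strip()
    return found


if __name__ == "__main__":
    try:
        with open("nominal.ini") as handle:
            settings = find_settings(handle.read())
    except FileNotFoundError:
        print("nominal.ini not found in the current directory")
    else:
        for key, value in settings.items():
            print(f"{key}: {value if value is not None else 'not set'}")
```

A key reported as "not set" only means that no matching line was found; whatever default HINGE then applies is not something this scan can tell you.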

@nottwy
Author

nottwy commented Dec 18, 2017

Yes, I still remember that and plan to try it. For now, though, I just want to get your software to run successfully. I'll let you know if I make any progress.

And if I want to try del_telomeres, which step should I start from?

@nottwy
Author

nottwy commented Dec 26, 2017

This step has now been running for ~20 days. Could you please make a small change to this step ('postprocessing') so that the program reports its progress?
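To illustrate the kind of progress reporting meant here, the sketch below wraps any loop so that it prints a status line every few items. It is a generic pattern and not HINGE's code: `with_progress`, its parameters, and the `clip` label are all hypothetical.

```python
import sys
import time


def with_progress(items, total=None, every=100000, label="clip", out=sys.stderr):
    """Yield items unchanged, printing a status line every `every` items."""
    start = time.time()
    count = 0
    for count, item in enumerate(items, start=1):
        yield item
        # Runs after the caller has finished processing this item.
        if count % every == 0:
            elapsed = time.time() - start
            suffix = f"/{total}" if total is not None else ""
            print(f"[{label}] {count}{suffix} items, {elapsed:.0f}s elapsed",
                  file=out, flush=True)
    print(f"[{label}] done: {count} items", file=out, flush=True)


if __name__ == "__main__":
    # Toy loop standing in for a long-running pass over graph edges.
    for _ in with_progress(range(10), total=10, every=5):
        pass
```

Writing to stderr with `flush=True` keeps the messages visible even when standard output is redirected to a file.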

@ilanshom
Collaborator

ilanshom commented Jan 3, 2018

Yes, we can do that. But I think it would be helpful if we could have your edges.hinges and hinge.list files. Could you write an email to ilanshom@gmail.com, so that we can coordinate a way for you to send us the files? Thanks!

@nottwy
Author

nottwy commented Jan 11, 2018

@ilanshom,
Have you received my email? I look forward to your reply!

@ilanshom
Collaborator

Yes, thank you. We got your files and are working on it. Sorry for the delay. We'll keep you posted.

@ilanshom
Collaborator

Hi @nottwy,
We made some changes to hinge clip and it now scales well to large datasets. We were able to run it on the files you sent us in under 1 hour. Could you please check out the latest commit and try it again? Also, we now write many status messages, so that you can tell us how far it got in case it gets stuck.

@nottwy
Author

nottwy commented Jan 25, 2018

Great Work!
Now it takes only 3 hours to finish the clip step.

@nottwy nottwy closed this as completed Jan 25, 2018