
Modify p4studio interactive to use limited parallel jobs #78

Conversation

jafingerhut
Contributor

No description provided.

@jafingerhut
Contributor Author

I have tested this, and right at the beginning I do see the two lines of output from the function that calculates the number of parallel jobs to use, so these changes are tested and working.

@@ -166,7 +166,8 @@ def create_default_profile() -> Profile:
 
 
 def build_sde(context: Context, profile: Profile) -> None:
-    plan = ProfileExecutionPlan(profile, None, os.cpu_count())
+    jobs = calculate_jobs_from_available_cpus_and_memory()
+    plan = ProfileExecutionPlan(profile, None, jobs)
Contributor

Probably should be an argument from the previous calculation?

Contributor Author

I am sorry, but I do not understand the comment. The new first line of build_sde calculates and stores a value in variable jobs. The second line calls ProfileExecutionPlan, with the 3rd parameter being jobs instead of os.cpu_count().

What are you recommending I change?

Contributor

jobs = calculate_jobs_from_available_cpus_and_memory()

this calculation is already done here, no? Ideally, we should only do this once and then reuse the result in other contexts.

Contributor Author

In case it is not obvious: the code in this file is run if you do the command p4studio interactive, which is what the install.sh bash script invokes. If you do this, the other Python source file that has the definition of calculate_jobs_from_available_cpus_and_memory has none of its code executed at all, unless it is explicitly called from somewhere in this file. That is what this change is adding -- an explicit call from this file into the function defined in the other file that calculates a reasonable number of parallel jobs to use.

This is not a repeated call of that function in this code path. It is the only one.

Contributor

@fruffy Feb 19, 2025

What is the common ancestor between the profile and the interactive command? I was thinking that this ancestor should call the function, then pass the result to either one.

Contributor Author

I think the common ancestor is some click module Python code that parses command line arguments and decides, via click annotations on Python methods, which function to invoke. If you do p4studio profile <some args> on the command line, it invokes a method in the file profile_command.py. If you do p4studio interactive, it invokes a method in this file being modified.

Contributor Author

Note this line early in this file:

from profile.profile_command import execute_plan, calculate_jobs_from_available_cpus_and_memory

There is an execute_plan function called from both profile_command.py and interactive_command.py that is doing most of the work of building and installing the software. Whoever wrote this code decided to define it in profile_command.py and call it from here, rather than putting execute_plan in some common third file and calling it from both places. I'm not adding new cross-file code flows that didn't already exist here; I'm adding one more cross-file call where there was already a more significant one before.
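
To make the structure concrete, here is a rough sketch of how interactive_command.py ties these pieces together after this change. It is only an illustration: project-internal imports such as Context, Profile, and ProfileExecutionPlan are omitted, and the exact arguments passed to execute_plan are assumed, not copied from the real file.

    # interactive_command.py -- simplified sketch, not the actual file contents
    from profile.profile_command import (
        execute_plan,
        calculate_jobs_from_available_cpus_and_memory,
    )

    def build_sde(context: Context, profile: Profile) -> None:
        # Ask the shared helper for a job count bounded by CPUs and memory,
        # instead of passing os.cpu_count() directly.
        jobs = calculate_jobs_from_available_cpus_and_memory()
        plan = ProfileExecutionPlan(profile, None, jobs)
        # execute_plan (defined in profile_command.py) does most of the work of
        # building and installing the software; its argument list here is assumed.
        execute_plan(context, plan)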

Contributor

@fruffy Feb 19, 2025

Hmm, I think we can leave it there then. It seems like checks like these should be set from both workflows, but it may be that the interactive command has fewer restrictions and doesn't even invoke system checks or common code.

@fruffy
Contributor

fruffy commented Feb 19, 2025

One thing, for CI we should try to use max CPUs. Simply to speed things up. If it locks up we can revert that change.

@jafingerhut
Contributor Author

> One thing, for CI we should try to use max CPUs. Simply to speed things up. If it locks up we can revert that change.

I can create a separate PR for that. It is just adding a command line option like '--jobs nproc' to the CI file that runs the p4studio profile ... command now.

Or I can just toss that one-line change into this PR. Give me a minute.

@fruffy
Contributor

fruffy commented Feb 19, 2025

Yeah, I just noted that because I saw current CI uses 3 jobs instead of 4.

@fruffy
Contributor

fruffy commented Feb 19, 2025

Feel free to do this in a separate PR. Either works for me.

Signed-off-by: Andy Fingerhut <andy_fingerhut@alum.wustl.edu>
@jafingerhut
Contributor Author

> Yeah, I just noted that because I saw current CI uses 3 jobs instead of 4.

If it is a 4vCPU 16GB RAM instance with no swap, it could end up significantly slower, or failing. At least that is what Vlad saw on an AWS instance he tried this on a while ago.

The reason is that the Tofino back end unified build uses somewhere around 5-6 GB of RAM on a very few individual processes. Vlad suggested the idea of disabling the unified build option -- more processes to run, but no individual huge-memory ones that cause you to want to reduce the number of parallel jobs.
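
To illustrate the kind of calculation involved, here is a minimal sketch of a job count bounded by both CPUs and memory. This is not the actual calculate_jobs_from_available_cpus_and_memory() implementation in profile_command.py; the 6 GiB-per-job figure is an assumption based on the 5-6 GB numbers above, and it assumes a Linux host.

    import os

    GIB = 1024 ** 3
    # Assumption for illustration: one unified-build compile job may need ~6 GiB.
    ASSUMED_BYTES_PER_JOB = 6 * GIB

    def estimate_parallel_jobs() -> int:
        # Hypothetical helper; not the real p4studio calculation.
        cpus = os.cpu_count() or 1
        try:
            # Total physical RAM on Linux: page size times number of pages.
            total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        except (ValueError, OSError, AttributeError):
            return cpus  # fall back to CPU count if RAM cannot be determined
        jobs_by_memory = max(1, total_ram // ASSUMED_BYTES_PER_JOB)
        return min(cpus, jobs_by_memory)

On a 4 vCPU / 16 GB instance, a formula of this shape lands at 2-3 jobs rather than 4, depending on the per-job figure assumed, which is consistent with CI picking 3 jobs instead of 4.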

@fruffy
Contributor

fruffy commented Feb 19, 2025

> Yeah, I just noted that because I saw current CI uses 3 jobs instead of 4.
>
> If it is a 4vCPU 16GB RAM instance with no swap, it could end up significantly slower, or failing. At least that is what Vlad saw on an AWS instance he tried this on a while ago.
>
> The reason is that the Tofino back end unified build uses somewhere around 5-6 GB of RAM on a very few individual processes. Vlad suggested the idea of disabling the unified build option -- more processes to run, but no individual huge-memory ones that cause you to want to reduce the number of parallel jobs.

In my experience a unified build outperforms even using multiple cores, but we can tweak the "chunk size" of the build to reduce the memory usage.

@vgurevich
Contributor

My suggestion is to have a swap file and then implement #72. Essentially, all we need is to take the min of 15% of RAM and 25% of swap and add that amount to the total size of RAM before calculating the amount of RAM per CPU. Having 15% of RAM in the swap does not usually impact the performance.

It also makes sense to have swap anyway, just for the sake of safety.
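
For concreteness, a small sketch of the adjustment described above (this is just one reading of the suggestion, not code from #72; the 15% and 25% figures come from the comment itself, and the 8 GiB swap size in the example is made up):

    GIB = 1024 ** 3

    def effective_ram_bytes(total_ram: int, total_swap: int) -> int:
        # Count at most min(15% of RAM, 25% of swap) of the swap as extra "RAM"
        # before dividing by the per-CPU (or per-job) memory requirement.
        usable_swap = min(0.15 * total_ram, 0.25 * total_swap)
        return int(total_ram + usable_swap)

    # Example: 16 GiB RAM + 8 GiB swap
    #   min(0.15 * 16 GiB, 0.25 * 8 GiB) = min(2.4 GiB, 2 GiB) = 2 GiB
    #   effective RAM = 16 GiB + 2 GiB = 18 GiB
    print(effective_ram_bytes(16 * GIB, 8 * GIB) / GIB)  # -> 18.0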

@jafingerhut
Contributor Author

I think this PR is ready to merge, if you approve.

All of the suggestions for detecting swap space and using it to determine the number of parallel jobs already have a separate issue created to track them. I am not planning to implement that in this PR.

@jafingerhut jafingerhut merged commit e01a187 into p4lang:main Feb 20, 2025
3 checks passed