High cpu or memory usage issues
jashaik edited this page Dec 7, 2021
·
9 revisions
- Identify which service is using the most CPU or memory. Go through the nginx access logs to identify the requests that take the longest, for example:
status=200; req_time=346924; rdbms_time=267; rdbms_count=6; authz_time=91; authz_count=3; depsolver_time=146; depsolver_count=1
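To see at a glance which component dominates a slow request, the timing fields of an entry like the one above can be ranked. This is a minimal sketch: `time_breakdown` is a hypothetical helper name, and it assumes each entry is a `key=value;` list shaped like the sample.

```shell
# Print every *_time field of a perf-stats line, largest value first.
# Input (stdin): a line such as "status=200; req_time=346924; rdbms_time=267; ..."
time_breakdown() {
  tr ';' '\n' | awk -F= '
    { gsub(/^[ \t]+|[ \t]+$/, "", $1) }   # trim whitespace around the key
    $1 ~ /_time$/ { print $2 + 0, $1 }    # keep only the timing fields
  ' | sort -rn
}
```

Pipe a single entry into it, for example `echo "status=200; req_time=346924; rdbms_time=267" | time_breakdown`. The `*_count` fields are ignored on purpose so that only durations are compared.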
- List the response status codes in the access log, with a count for each, using the command below.
awk '{print $9}' /var/log/opscode/nginx/access.log | sort | uniq -c | sort -rn
Sample output:
424886 200
106221 404
2 499
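Raw counts are easier to compare as percentages. The sketch below is one way to do that, assuming (as the command above does) that the status code is field 9 of each access log line; `status_share` is a hypothetical helper name.

```shell
# Print "status count percent" for each status code in an access log,
# most frequent first. Assumes the status code is field 9.
status_share() {
  awk '{ count[$9]++; total++ }
       END {
         for (s in count)
           printf "%s %d %.1f%%\n", s, count[s], 100 * count[s] / total
       }' "$1" |
    sort -k2,2 -rn
}
```

It could be called as `status_share /var/log/opscode/nginx/access.log`. A large share of 4xx or 5xx responses is worth investigating before looking at timings.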
- The count of requests per second over the life of the log:
cat access.log | awk '{print $4}' | uniq -c
Sample output: 280 rps
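When the log is long, it can help to list only the busiest seconds rather than every timestamp. A minimal sketch, assuming field 4 holds the per-second timestamp as in the command above; `busiest_seconds` is a hypothetical helper name.

```shell
# Print the N busiest seconds in an access log as "count timestamp".
# usage: busiest_seconds LOG_FILE [N]   (N defaults to 5)
busiest_seconds() {
  log_file=$1
  top_n=${2:-5}
  # Count requests per distinct timestamp, then keep the highest counts.
  awk '{ count[$4]++ } END { for (ts in count) print count[ts], ts }' "$log_file" |
    sort -rn | head -n "$top_n"
}
```

Unlike `uniq -c`, counting in an awk array does not depend on identical timestamps being adjacent in the file.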
- For example, suppose the depsolver is taking the most time to respond.
- Profile it with fprof to find which function takes the most time. Start in the erchef console by capturing a call with redbug:
redbug:start("chef_wm_depsolver:make_json_list", [{print_file, "/tmp/redbug.out"}, {file_size, 150}, {msgs,1}]).
- This captures one execution of make_json_list and prints the function arguments to the file.
- Next, edit the file (redbug.out) to make it a valid Erlang term: remove the function call and leave only the argument of interest, the single long list of cookbook versions. After that:
{ok, [Content|_]} = file:consult("/tmp/redbug.out").
% Run fprof to profile the function in question, using the argument data captured with redbug as the input.
fprof:apply(chef_wm_depsolver, make_json_list, [Content, "https://[2600:1f1c:f24:ad01:b300:cfe6:5f15:b905]", 1], [{file, "/tmp/fprof.trace"}]).
- This handy little escript converts the trace to callgrind format:
https://github.com/isacssouza/erlgrind
- Set up a chef-server and 4 load servers in the AWS console using the AMIs below.
chef-server-load-test-03122021 (ami-0e2ad9ec5256c7b4c)
Load generator backup: load-gen-backup-03122021 (ami-0e2ad9ec5256c7b4c)
- Upgrade chef-server to the required version by following https://docs.chef.io/server/upgrades/
- Create the users and the organization on the chef-server using the commands below.
chef-server-ctl org-create test1 test1 > test1_validator.pem
chef-server-ctl user-create testuser1 test test test@example.com password > /home/ubuntu/testuser1.pem
chef-server-ctl org-user-add -a test1 testuser1
- Use the mp/working branch of chef-load (https://github.com/chef/chef-load/tree/mp/working)
- Copy all the user/client keys from the chef-server to the chef-load servers for generating load.
Copy the .pem files to your local machine, and then to the load servers:
scp -i ~/.ssh/aws-shared-chef-infra-server.pem ubuntu@52.53.176.180:/home/ubuntu/*.pem .
scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem ubuntu@54.241.71.101:/home/ubuntu/testing-12.3.1
scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem ubuntu@184.169.252.217:/home/ubuntu/testing-12.3.1
scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem ubuntu@54.177.170.21:/home/ubuntu/testing-12.3.1
scp -i ~/.ssh/aws-shared-chef-infra-server.pem *.pem ubuntu@3.101.121.52:/home/ubuntu/testing-12.3.1
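The four per-server copies above can be expressed as a loop, which is easier to maintain if load servers are added or removed. This is a sketch, not part of the original procedure: the `copy_pems` function and the `DRY_RUN` switch are hypothetical, while the key path, hosts, and destination directory are taken from the commands above.

```shell
# Copy every .pem in the current directory to each load server.
# Set DRY_RUN=1 to print the scp commands instead of running them.
KEY=~/.ssh/aws-shared-chef-infra-server.pem
DEST_DIR=/home/ubuntu/testing-12.3.1
LOAD_SERVERS="54.241.71.101 184.169.252.217 54.177.170.21 3.101.121.52"

copy_pems() {
  for host in $LOAD_SERVERS; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo scp -i "$KEY" ./*.pem "ubuntu@${host}:${DEST_DIR}"
    else
      # Keep going if one host fails, but report it on stderr.
      scp -i "$KEY" ./*.pem "ubuntu@${host}:${DEST_DIR}" ||
        echo "copy to ${host} failed" >&2
    fi
  done
}
```

Running `DRY_RUN=1; copy_pems` first shows exactly which commands would be executed before any keys are transferred.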
- Update chef-load.toml with the chef server URL and other details.
log_file = "chef-load.log"
chef_server_url = "https://[2600:1f1c:f24:ad01:b300:cfe6:5f15:b905]/organizations/test1/"
client_key = "./testuser1.pem"
client_name = "testuser1"
ohai_json_file = "node.json"
chef_environment = "_default"
# assume four chef-load instances for a total of 7000 nodes converging every 15 minutes.
# override on CLI with -n
num_nodes = 1750
# override on CLI with -i
interval = 15
# override on CLI with -a
num_actions = 0 # For data collector, which is disabled.
# override on CLI with -p
node_name_prefix = "load4"
# In what frequency (0.0-1.0) of all CCR runs does a node/client get replaced after initial ramp-up of all nodes & clients
# causing a new node/client to get created on the server.
# override on CLI with -R
node_replacement_rate = 0
# each node's run list is chosen randomly at the time of the simulated run from this list.
# In this case, we repeated twshared_tier to weight for higher frequency of that run list.
# Future iterations might allow you to specify the weighting directly.
# Feel free to add to these to simulate different node types. I'll be updating
# ours shortly to incorporate the new roles/cookbooks recently provided.
run_lists = [
[ "role[fb_base]", "role[fb_middleware]", "role[chef_tier]" ],
[ "role[fb_base]", "role[fb_middleware]", "role[rsw_tier]" ],
[ "role[fb_base]", "role[fb_middleware]", "role[rtsw_tier]" ],
[ "role[fb_base]", "role[biz_tier]" ],
[ "role[fb_base]", "role[fboss_tier]" ],
[ "role[fb_base]", "role[sparefullweb_tier]"],
[ "role[fb_base]", "role[perforce_tier]" ],
[ "role[fb_base]", "role[udb_tier]" ],
[ "role[fb_base]", "role[eb_tier]" ],
[ "role[fb_base]", "role[dns_tier]" ],
[ "role[fb_base]", "role[orderdb_tier]" ],
]
# never, first, always
download_cookbooks = "always"
# On average download about 1% of the cookbooks resolved from the runlist
# simulating the usual case where only some cookbooks are updated so a
# client seldom needs to download all of them. If download_cookbooks == "first"
# then this is ignored and all cookbooks are downloaded.
# Override on CLI with -C
download_cookbooks_scale_factor = 0.01
# Sleep this long (seconds) during the client run after retrieving cookbooks and before
# saving the node, simulating the time client converge activity would take.
sleep_duration = 0
# Save node for 80% of runs, based on initial rough parsing of healthy logs
node_save_frequency = 0.8
# api_get_requests is an optional list of API GET requests as URLs that are made during the chef-client run.
# eg "search/node?q=*%253A*&sort=X_CHEF_id_CHEF_X%20asc&start=0"
#
api_get_requests = [ ]
# chef_version sets the value of the X-Chef-Version HTTP header in API requests sent to the Chef Server.
# It has no effect on the behavior of the run.
chef_version = "13.2.20"
# use client-side key creation instead of server side, which is the default
# since (I think) 12+
chef_server_creates_client_key = false
# Send data to the Chef server's Reporting service
enable_reporting = false
# Generate Random Data. Not used outside of data sent to data collector
random_data = true
# Generate Liveness Agent Data
liveness_agent = false
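The configuration comments assume four chef-load instances of 1750 nodes each, converging every 15 minutes. A quick calculation turns that into the average number of simulated chef-client runs per second, which can be compared with the requests-per-second figure from the access log. This is a sketch: `runs_per_second` is a hypothetical helper, and it gives an average only, ignoring ramp-up and the several API requests each run makes.

```shell
# Average simulated chef-client runs per second for a chef-load fleet.
# usage: runs_per_second INSTANCES NODES_PER_INSTANCE INTERVAL_MINUTES
runs_per_second() {
  awk -v inst="$1" -v nodes="$2" -v mins="$3" \
    'BEGIN { printf "%.2f\n", (inst * nodes) / (mins * 60) }'
}

runs_per_second 4 1750 15   # 7000 nodes / 900 seconds, prints 7.78
```

A single instance with 10 nodes and a 1 minute interval works out to `runs_per_second 1 10 1`, roughly 0.17 runs per second.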
- Start the load using the command below. For more information, read the chef-load README (https://github.com/chef/chef-load#readme).
./chef-load -c chef-load.toml -i 1 -a 0 -n 10 -p load1a -R .01 start