From b1794eb2b6696150ac5207ae92c90bda7b3fcb2f Mon Sep 17 00:00:00 2001
From: Joe Rafaniello
Date: Wed, 14 Sep 2022 15:29:51 -0400
Subject: [PATCH] Don't wait so long for primary/standby info and changes

While monitoring for failover, we learn about changes, such as "standby
was just added" or a failover and promotion, as much as five minutes
after they happen. This is far too long.

Polling every 2 minutes is still conservative toward postgres without
taking that long. We may find that we can drop this further to 90
seconds or even 60 seconds, but this change seems like an obvious
improvement with very little downside, such as too many connections to
postgres.

We establish 2 very quick connections for each iteration of this
loop[1]: one for the logical replication connection and one for the
rails connection[2]. This means we make 2 connections every 2 minutes.
This should not be a big concern even if we have tens of appliances in
a complex.

[1] https://github.com/ManageIQ/manageiq-postgres_ha_admin/blob/e7e87af12da82f86e967a22700505442f61bb7b1/lib/manageiq/postgres_ha_admin/failover_monitor.rb#L51-L58
[2] https://github.com/ManageIQ/manageiq/blob/17feafb6138749996fc0e529c42e1928abb18968/lib/evm_database.rb#L174-L175
---
 config/ha_admin.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config/ha_admin.yml b/config/ha_admin.yml
index ea1a0287306..5934d2cfe08 100644
--- a/config/ha_admin.yml
+++ b/config/ha_admin.yml
@@ -1,4 +1,4 @@
 ---
 failover_attempts: 10
-db_check_frequency: 300
+db_check_frequency: 120
 failover_check_frequency: 60