Don't wait so long for primary/standby info and changes · ManageIQ/manageiq@b1794eb

Don't wait so long for primary/standby info and changes

While monitoring for failover, we only learn about changes, such as "standby was
just added" or a failover and promotion, as much as five minutes after they
happen.  This is far too long.  At 2 minutes, we still poll postgres
conservatively without waiting that long.

We may find that we can drop this further to 90 seconds or even 60 seconds, but
this change seems like an obvious improvement with very little downside, such as
opening too many connections to postgres.

We establish 2 very quick connections for each iteration of this loop[1]: one for
the logical replication connection and one for the rails connection[2].  This
means we make 2 connections every 2 minutes, which should not be a big concern
even if we have tens of appliances in a complex.
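
As a rough back-of-the-envelope check of that connection load, the sketch below estimates how many short-lived connections per hour a group of appliances would open at a given polling interval. It assumes every appliance runs its own monitor loop and opens 2 connections per iteration; the helper name `connections_per_hour` and the appliance count of 30 are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper: estimates short-lived postgres connections per hour.
# Assumes each appliance runs one monitor loop that opens
# `connections_per_check` connections every `db_check_frequency` seconds.
def connections_per_hour(appliances:, db_check_frequency:, connections_per_check: 2)
  checks_per_hour = 3600.0 / db_check_frequency
  (appliances * connections_per_check * checks_per_hour).round
end

# Compare the old interval (300s), the new one (120s), and the possible
# future values mentioned in the commit message (90s and 60s).
[300, 120, 90, 60].each do |seconds|
  total = connections_per_hour(appliances: 30, db_check_frequency: seconds)
  puts "db_check_frequency=#{seconds}s -> #{total} connections/hour across 30 appliances"
end
```

Under these assumptions the load grows in inverse proportion to the interval, so halving the interval doubles the number of connections opened per hour.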

[1] https://github.com/ManageIQ/manageiq-postgres_ha_admin/blob/e7e87af12da82f86e967a22700505442f61bb7b1/lib/manageiq/postgres_ha_admin/failover_monitor.rb#L51-L58
[2] https://github.com/ManageIQ/manageiq/blob/17feafb6138749996fc0e529c42e1928abb18968/lib/evm_database.rb#L174-L175

jrafanie committed Sep 14, 2022

1 parent 17feafb commit b1794eb

config/ha_admin.yml

@@ -1,4 +1,4 @@
     ---
     failover_attempts: 10
-    db_check_frequency: 300
+    db_check_frequency: 120
     failover_check_frequency: 60
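
To make the effect of the new value concrete, here is a minimal sketch that parses the settings above and derives a rough upper bound on how long a change can go unnoticed. It is not the code from manageiq-postgres_ha_admin: the YAML is inlined rather than read from `config/ha_admin.yml`, the helper `max_staleness_minutes` is hypothetical, and the bound assumes one check per `db_check_frequency` seconds with negligible time spent in the check itself.

```ruby
require "yaml"

# The settings from config/ha_admin.yml after this commit, inlined here so the
# sketch is self-contained.
HA_ADMIN_YAML = <<~YAML
  ---
  failover_attempts: 10
  db_check_frequency: 120
  failover_check_frequency: 60
YAML

settings = YAML.safe_load(HA_ADMIN_YAML)

# Hypothetical helper: rough upper bound, in minutes, on how stale the
# monitor's view of primary/standby state can be. Assumes one check per
# `db_check_frequency` seconds and ignores the duration of the check.
def max_staleness_minutes(settings)
  settings.fetch("db_check_frequency") / 60.0
end

puts "Changes are noticed within about #{max_staleness_minutes(settings)} minutes"
```

With the previous value of 300 the same helper yields 5.0, which matches the five-minute delay described in the commit message.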
