Don't wait so long for primary/standby info and changes
While monitoring for failover, we learn about changes such as "standby was just
added", or that a failover and promotion occurred, as much as five minutes
after they happened.  This is far too long.  At 2 minutes, we still poll
postgres conservatively without waiting that long.

We may find that we can drop this further to 90 or even 60 seconds, but this
change seems like an obvious improvement with very little downside, such as
opening too many connections to postgres.

We establish 2 very quick connections for each iteration of this loop[1], one for
the logical replication connection and one for the rails connection[2].  This
means each appliance makes 2 connections every 2 minutes.  This should not be a
big concern even if we have tens of appliances in a complex.

[1] https://github.com/ManageIQ/manageiq-postgres_ha_admin/blob/e7e87af12da82f86e967a22700505442f61bb7b1/lib/manageiq/postgres_ha_admin/failover_monitor.rb#L51-L58
[2] https://github.com/ManageIQ/manageiq/blob/17feafb6138749996fc0e529c42e1928abb18968/lib/evm_database.rb#L174-L175
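
To put rough numbers on the connection cost, here is a minimal sketch. It assumes each appliance runs one monitor loop and opens exactly 2 connections per check, as described above. `connections_per_hour` and the appliance count of 30 are hypothetical, chosen only for illustration, and are not part of manageiq-postgres_ha_admin.

```ruby
# Hypothetical helper for illustration; not part of manageiq-postgres_ha_admin.
# Assumption: every check opens one logical replication connection and one
# rails connection, so 2 connections per check per appliance.
CONNECTIONS_PER_CHECK = 2

def connections_per_hour(db_check_frequency, appliances: 1)
  checks_per_hour = 3600.0 / db_check_frequency
  (checks_per_hour * CONNECTIONS_PER_CHECK * appliances).round
end

old_rate = connections_per_hour(300, appliances: 30) # 720
new_rate = connections_per_hour(120, appliances: 30) # 1800
puts "300s: #{old_rate} connections/hour, 120s: #{new_rate} connections/hour"
```

Under these assumptions, a complex of 30 appliances goes from 720 to 1800 short-lived connections per hour, which is still only 30 per minute across the whole complex.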
jrafanie committed Sep 14, 2022
1 parent 17feafb commit b1794eb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion config/ha_admin.yml
@@ -1,4 +1,4 @@
 ---
 failover_attempts: 10
-db_check_frequency: 300
+db_check_frequency: 120
 failover_check_frequency: 60
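
As a rough illustration of what the new value means, the sketch below parses the updated settings and derives the worst-case delay before a change is noticed. It assumes the values are in seconds and that primary/standby info is refreshed once per `db_check_frequency` interval. The inline YAML mirrors the file after this commit, and `worst_case_minutes` is a hypothetical name.

```ruby
require "yaml"

# Settings as they appear in config/ha_admin.yml after this commit.
config = YAML.safe_load(<<~YML)
  ---
  failover_attempts: 10
  db_check_frequency: 120
  failover_check_frequency: 60
YML

# Assumption: info is refreshed once per db_check_frequency seconds, so a
# change made just after a check waits roughly one full interval.
worst_case_minutes = config["db_check_frequency"] / 60.0
puts "Changes are noticed within about #{worst_case_minutes} minutes"
```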
