HA Database Locking and Leadership Election

Database Locking

To track the status of individual nodes and determine which node should be active, MDM HA uses two database tables, mdm_lock and mdm_ha_node, which are configured through the MDM server application properties (see MDM Server Application Properties). The table prefix is set by ataccama.one.mdm.ha.db.table-prefix, and the database name by ataccama.one.mdm.ha.db.name. The table descriptions below are simplified for readability:

mdm_lock - Stores the unique identifier of the lock owner and the time it was last active. The heartbeat period is defined by ataccama.one.mdm.ha.db.heartbeat.interval. A scheduler attempts to remove expired entries at the interval set by ataccama.one.mdm.ha.db.heartbeat.expiration, and it marks all entries older than ataccama.one.mdm.ha.db.heartbeat.ttl as expired.
mdm_ha_node - Stores information about the cluster nodes, their status, and additional properties. Entries are maintained by the same logic as in mdm_lock, driven by a separate timer. The heartbeat period is defined by ataccama.one.mdm.ha.db.heartbeat.interval. A scheduler attempts to remove expired entries at the interval set by ataccama.one.mdm.ha.db.heartbeat.expiration, and it marks all entries older than ataccama.one.mdm.ha.db.heartbeat.ttl as expired.
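To make the heartbeat and expiry rules concrete, here is a minimal in-memory sketch of one of these tables. It is not the MDM implementation: the class HeartbeatTable and its methods are hypothetical, the real tables live in the database, and marking and deleting expired rows are folded into a single cleanup pass for brevity. The property names appear only in comments to show which setting each value corresponds to.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the simplified mdm_lock / mdm_ha_node tables:
// one row per owner id, holding the time of that owner's last heartbeat.
class HeartbeatTable {
    private final Map<String, Instant> lastActive = new HashMap<>();
    // Age after which a row counts as expired (role of ataccama.one.mdm.ha.db.heartbeat.ttl).
    private final Duration ttl;

    HeartbeatTable(Duration ttl) {
        this.ttl = ttl;
    }

    // Called by each owner once per heartbeat period (ataccama.one.mdm.ha.db.heartbeat.interval).
    void heartbeat(String ownerId, Instant now) {
        lastActive.put(ownerId, now);
    }

    // A row is expired when its last heartbeat is older than the TTL (missing rows count as expired).
    boolean isExpired(String ownerId, Instant now) {
        Instant last = lastActive.get(ownerId);
        return last == null || last.plus(ttl).isBefore(now);
    }

    // Cleanup pass, run on its own schedule (ataccama.one.mdm.ha.db.heartbeat.expiration).
    // Deletes every expired row and returns how many rows were deleted.
    int removeExpired(Instant now) {
        int before = lastActive.size();
        lastActive.values().removeIf(last -> last.plus(ttl).isBefore(now));
        return before - lastActive.size();
    }

    int size() {
        return lastActive.size();
    }
}
```

For example, with a 30-second TTL, an owner whose last heartbeat was 31 seconds ago is treated as expired and removed by the next cleanup pass, while an owner that reported 11 seconds ago is kept.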

Leadership Election

The election workflow runs periodically at the interval set by ataccama.one.mdm.ha.db.election.interval, after an initial delay defined by ataccama.one.mdm.ha.db.election.startup-timeout. Each round starts by calling LockLeaderElector::tryElect, which tries to acquire a table lock (with an id defined by ataccama.one.mdm.ha.db.lock-key) using the distributed lock library. If the lock is acquired, a node that is already active extends its lifetime in mdm_ha_node; otherwise, the node starts the sequence for switching to active mode. If the lock is not acquired, a passive node stays in passive mode, while an active node starts the sequence for switching to passive mode.
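The decision taken in each election round, and the scheduling of the rounds, can be sketched as follows. This is an illustrative model only: ElectionLoop, Action, decide and start are hypothetical names, the tryLock supplier stands in for LockLeaderElector::tryElect and the distributed lock library, and the transitional SWITCHING_* states are left out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical model of the periodic leader election; not the MDM implementation.
class ElectionLoop {
    enum Action { EXTEND_LIFETIME, START_SWITCH_TO_ACTIVE, STAY_PASSIVE, START_SWITCH_TO_PASSIVE }

    // Outcome of one round, given the lock attempt result and the node's current mode.
    static Action decide(boolean lockAcquired, boolean currentlyActive) {
        if (lockAcquired) {
            // Lock holder: an active node refreshes its mdm_ha_node entry, any other node starts activating.
            return currentlyActive ? Action.EXTEND_LIFETIME : Action.START_SWITCH_TO_ACTIVE;
        }
        // Lock not acquired: an active node must step down, a passive node stays passive.
        return currentlyActive ? Action.START_SWITCH_TO_PASSIVE : Action.STAY_PASSIVE;
    }

    // Schedules election rounds: the first after startupTimeoutMs
    // (role of ataccama.one.mdm.ha.db.election.startup-timeout), then every intervalMs
    // (role of ataccama.one.mdm.ha.db.election.interval). tryLock stands in for LockLeaderElector::tryElect.
    static ScheduledExecutorService start(BooleanSupplier tryLock, BooleanSupplier isActive,
                                          Consumer<Action> handler,
                                          long startupTimeoutMs, long intervalMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> handler.accept(decide(tryLock.getAsBoolean(), isActive.getAsBoolean())),
                startupTimeoutMs, intervalMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

scheduleWithFixedDelay is used here so that a slow round never overlaps the next one; whether MDM measures the election interval from the start or from the end of a round is not stated above.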

Switching to active: The node state is first changed to SWITCHING_TO_ACTIVE. Before INmeReadWriteSwitcher::switchToReadWriteMode is called, the node waits for a safety timeout defined by ataccama.one.mdm.ha.db.node.timeout.interval. After the call, the HA part remains in the SWITCHING_TO_ACTIVE state and waits for the INmeReadWriteModeSwitchListener::afterSwitch event from the NME. If the event does not occur, the entire election process is repeated.
Switching to passive: The node state is changed to SWITCHING_TO_PASSIVE. Then INmeReadWriteSwitcher::switchToReadWriteMode is called to switch the NME part to read-only (RO) mode. At the same time, a safety watchdog is started with a timeout defined by ataccama.one.mdm.ha.max-active-to-passive-switching-time. If the NME does not call INmeReadWriteModeSwitchListener::afterSwitch with RO status in time, the NME is considered to have failed and System.exit() is called. If the switch succeeds, the node status is updated to PASSIVE and the safety watchdog is cancelled.
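The two switching sequences can be modelled as a small state machine. In this sketch, HaModeSwitcher and its methods are hypothetical; the NME calls are passed in as plain Runnable callbacks, afterSwitch(boolean) stands in for INmeReadWriteModeSwitchListener::afterSwitch (assumed here to report read-write mode after switching to active), and the failure handler is injected instead of calling System.exit() directly so the sketch can be exercised without terminating the JVM.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the HA node's mode switching; not the MDM implementation.
class HaModeSwitcher {
    enum NodeState { PASSIVE, SWITCHING_TO_ACTIVE, ACTIVE, SWITCHING_TO_PASSIVE }

    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    // Invoked when the NME misses the RO deadline; the described behaviour is System.exit().
    private final Runnable onNmeFailure;
    private NodeState state;
    private ScheduledFuture<?> watchdog;

    HaModeSwitcher(NodeState initial, Runnable onNmeFailure) {
        this.state = initial;
        this.onNmeFailure = onNmeFailure;
    }

    synchronized NodeState state() {
        return state;
    }

    // Passive -> active: enter SWITCHING_TO_ACTIVE, wait for the safety timeout
    // (role of ataccama.one.mdm.ha.db.node.timeout.interval), then request read-write mode.
    synchronized void switchToActive(Runnable requestReadWrite, long safetyTimeoutMs) {
        state = NodeState.SWITCHING_TO_ACTIVE;
        timer.schedule(requestReadWrite, safetyTimeoutMs, TimeUnit.MILLISECONDS);
    }

    // Active -> passive: enter SWITCHING_TO_PASSIVE, arm the watchdog
    // (role of ataccama.one.mdm.ha.max-active-to-passive-switching-time), then request RO mode.
    // The watchdog is armed first so that an immediate confirmation can still cancel it.
    synchronized void switchToPassive(Runnable requestReadOnly, long maxSwitchMs) {
        state = NodeState.SWITCHING_TO_PASSIVE;
        watchdog = timer.schedule(onNmeFailure, maxSwitchMs, TimeUnit.MILLISECONDS);
        requestReadOnly.run();
    }

    // Stand-in for INmeReadWriteModeSwitchListener::afterSwitch; readWrite is the NME's new mode.
    synchronized void afterSwitch(boolean readWrite) {
        if (readWrite && state == NodeState.SWITCHING_TO_ACTIVE) {
            state = NodeState.ACTIVE;
        } else if (!readWrite && state == NodeState.SWITCHING_TO_PASSIVE) {
            watchdog.cancel(false); // confirmed in time: disarm the watchdog
            state = NodeState.PASSIVE;
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

If the NME never confirms read-write mode, the node in this sketch simply stays in SWITCHING_TO_ACTIVE; in the workflow described above, the next election round then starts the process again.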
