MailStore Gateway Redundancy - Failover with graceful recovery
Overview of the options and risks
MailStore Gateway does not offer any high availability capabilities, but there are a few options to provide redundancy depending on your requirements and technical abilities.
These configurations are unsupported; please do not contact MailStore Support with problems related to them.
Consider whether you need redundancy at all. A short-term outage of a gateway is not an emergency and will not cause message loss. Journal reports are e-mails like any other, and the sending server will queue them for redelivery for a period of time.
These are the currently available options:
Failover
- Quick, easy to understand.
- Requires manual intervention to switch between servers.
- Involves short downtimes to switch between servers.
Failover, with recovery (this article)
- Requires manual intervention to switch between servers.
- Does not require any downtime to switch between servers.
- More technical, requires an understanding of DNS.
Active-Active
- The most complex configuration, and should only be used by experienced administrators.
- Highest risk of message loss due to configuration errors.
- No downtime involved as both servers are running at all times.
- Requires additional configuration within each MailStore instance.
This article assumes a basic understanding of how to use MailStore Gateway, networking, DNS, and Microsoft 365 (or your e-mail platform).
Configuration
-
DNS
Assuming you have two MailStore Gateway servers, 192.0.2.1 and 192.0.2.55, create these DNS records:

| Name | Type | Record Data |
|---|---|---|
| msgw1.mailarchiveco.example. | A | 192.0.2.1 |
| msgw2.mailarchiveco.example. | A | 192.0.2.55 |
| msgw.mailarchiveco.example. | CNAME | msgw1.mailarchiveco.example. |

To be clear, this configuration has two A records, one pointing to each server. It has a CNAME record pointing to the primary server.
All configuration in MailStore and on mail servers must use only msgw.mailarchiveco.example. Do not reference msgw1.mailarchiveco.example or msgw2.mailarchiveco.example anywhere except in these DNS records.
-
Configure MailStore Gateway using a single hostname for all clients
If your company’s domain is mailarchiveco.example then your gateway’s domain could be msgw.mailarchiveco.example.
If you have multiple MailStore instances then each would have a separate mailbox, e.g. mbx-123...@msgw.mailarchiveco.example and mbx-456...@msgw.mailarchiveco.example.
-
Back up the configuration
Once the primary server is configured, take a backup of your MailStore Gateway configuration as described in the Backup and Restore article from the documentation.
-
Keep the configuration backup up to date
Be sure to update the configuration backup whenever you make any configuration changes to the gateway. This applies to new mailboxes as well as password changes.
Warning: MailStore Gateway stores all messages encrypted at rest and relies on the mailbox password to decrypt them. If either server still holds messages in a mailbox when you change its password, you will likely lose those messages. I recommend never changing the password of a gateway mailbox.
-
Optionally, you can have the gateway installed and ready on the other server. Just be sure to upgrade both servers together.
-
I recommend not keeping the service running on the second server; failovers should be done intentionally.
Technically you can fail over automatically (for example, with a monitoring system updating a DNS record or NAT rule), but a human must still review the mailbox status on both servers to ensure that no messages were left behind.
Let’s Encrypt will only work on one server at a time and will fail on the other server.
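One way to confirm which server the shared hostname currently points at is to resolve it and compare the result with the two known addresses. The following is a minimal Python sketch, not part of MailStore Gateway; the function name active_server and the SERVERS mapping are illustrative, and the resolver is injectable so the logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable

# Hostname and addresses taken from the example DNS records in this article.
ALIAS = "msgw.mailarchiveco.example"
SERVERS = {
    "192.0.2.1": "primary (msgw1)",
    "192.0.2.55": "secondary (msgw2)",
}


def active_server(resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """Return which gateway the shared alias currently resolves to."""
    # The resolver follows the CNAME and returns the IPv4 address behind it.
    address = resolve(ALIAS)
    return SERVERS.get(address, f"unknown ({address})")


if __name__ == "__main__":
    print(active_server())
```

Called with no arguments, active_server() uses the operating system's resolver, so the answer reflects that machine's DNS cache and may lag behind a recent record change.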
Failover process
In the event of a failure that you cannot resolve within 24 hours:
-
Switch the CNAME record for msgw.mailarchiveco.example from msgw1.mailarchiveco.example to msgw2.mailarchiveco.example.
This will cause both inbound messages and the archiving profile to use the secondary server.
-
Start the MailStore Gateway on the secondary server.
-
Verify the certificate is valid and properly configured.
If you use Let’s Encrypt then the certificate may be out of date as the secondary server will not be able to renew the certificate. You can manually renew the certificate on the secondary server, but the gateway should update the certificate on start-up.
In failover mode your DNS records will look like this:
| Name | Type | Record Data |
|---|---|---|
| msgw1.mailarchiveco.example. | A | 192.0.2.1 |
| msgw2.mailarchiveco.example. | A | 192.0.2.55 |
| msgw.mailarchiveco.example. | CNAME | msgw2.mailarchiveco.example. |
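The certificate check in the failover steps can also be scripted. This sketch uses only Python's standard library and assumes the port you pass accepts TLS immediately on connect; it does not handle STARTTLS, and which port that is on your gateway is something you need to confirm yourself. Both function names are illustrative.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp."""
    # cert_time_to_seconds parses strings such as "Jan  5 09:34:43 2018 GMT".
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def fetch_not_after(host: str, port: int, timeout: float = 10.0) -> str:
    """Read notAfter from the certificate presented on an implicit-TLS port."""
    # The default context verifies the chain and hostname, so an expired or
    # mismatched certificate raises ssl.SSLCertVerificationError here.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


if __name__ == "__main__":
    import sys

    # Usage: python check_cert.py msgw.mailarchiveco.example <port>
    not_after = fetch_not_after(sys.argv[1], int(sys.argv[2]))
    remaining = days_until_expiry(not_after, datetime.now(timezone.utc))
    print(f"certificate expires in {remaining:.1f} days")
```

Connecting by the shared hostname rather than by IP address matters here, because the certificate is validated against the name that clients actually use.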
Switching back to the primary server
Recovery mode
Once the primary server is back up and running and you are ready to switch back, there is an additional intermediate step called “recovery mode”.
-
Create a new MX record for msgw2.mailarchiveco.example pointing to msgw1.mailarchiveco.example.
Your DNS records will now look like this:

| Name | Type | Record Data |
|---|---|---|
| msgw1.mailarchiveco.example. | A | 192.0.2.1 |
| msgw2.mailarchiveco.example. | A | 192.0.2.55 |
| msgw2.mailarchiveco.example. | MX | 0 msgw1.mailarchiveco.example. |
| msgw.mailarchiveco.example. | CNAME | msgw2.mailarchiveco.example. |

In this state, inbound messages will be delivered to msgw1.mailarchiveco.example due to the MX record, while MailStore instances will use the CNAME record to connect to msgw2.mailarchiveco.example.
Remember to consider DNS TTLs. If you have a 24-hour TTL then you will have to wait 24 hours before the change takes effect globally.
-
Wait for mailboxes on the secondary server to be empty.
You can check the status of the mailboxes on the secondary server by logging into the MailStore Gateway web interface and checking the status of each mailbox. You can also check the mailbox directories on the secondary server to see if they are empty.
-
Stop the service on the secondary MailStore Gateway.
It is important to stop the service to prevent the secondary MailStore Gateway from receiving messages, otherwise you may have to repeat the process.
-
Revert your DNS to the original configuration.
Change the CNAME record back to msgw1.mailarchiveco.example, and remove the MX record completely.
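The three DNS states described in this article (normal, failover, recovery) can be told apart mechanically. The sketch below is a hypothetical helper, not a MailStore tool: it takes records as (name, type, data) tuples, which you would have to export from your DNS provider yourself, and reports which state they describe.

```python
from typing import Iterable, Tuple

Record = Tuple[str, str, str]  # (name, type, record data)

# Names taken from the example DNS records in this article.
ALIAS = "msgw.mailarchiveco.example."
PRIMARY = "msgw1.mailarchiveco.example."
SECONDARY = "msgw2.mailarchiveco.example."


def classify_mode(records: Iterable[Record]) -> str:
    """Return 'normal', 'failover', 'recovery', or 'unexpected'."""
    records = set(records)
    # Where does the shared alias point?
    alias_targets = {
        data for name, rtype, data in records
        if name == ALIAS and rtype == "CNAME"
    }
    # Recovery mode adds an MX on the secondary name that targets the primary.
    # MX data looks like "0 msgw1.mailarchiveco.example."; the last token is the host.
    has_recovery_mx = any(
        name == SECONDARY and rtype == "MX" and data.split()[-1:] == [PRIMARY]
        for name, rtype, data in records
    )
    if alias_targets == {PRIMARY} and not has_recovery_mx:
        return "normal"
    if alias_targets == {SECONDARY}:
        return "recovery" if has_recovery_mx else "failover"
    # For example, the CNAME was reverted but the MX record was left in place.
    return "unexpected"
```

A result of "unexpected" is the useful case: it flags a half-finished switch, such as reverting the CNAME while forgetting to remove the MX record.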
Failover testing
You can test a failover at any time, or switch permanently to the secondary server if desired. Just be sure not to leave any messages behind on the inactive server.
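If you prefer to script the check that nothing was left behind, something like the following could work. It is only a sketch: the on-disk layout of MailStore Gateway mailboxes is not described in this article, so the directory is a parameter you must supply, and any file found is treated as a possible message even though some may be metadata. The function name leftover_files is illustrative.

```python
from pathlib import Path
from typing import List


def leftover_files(mailbox_root: str) -> List[Path]:
    """List every regular file found below mailbox_root, sorted by path."""
    root = Path(mailbox_root)
    # rglob("*") walks all subdirectories; directories themselves are skipped.
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    import sys

    # Usage: python leftover.py <path to the mailbox directory you want to check>
    remaining = leftover_files(sys.argv[1])
    for path in remaining:
        print(path)
    print(f"{len(remaining)} file(s) remaining")
```

Treat the output as a prompt to look closer rather than as proof; the mailbox status in the MailStore Gateway web interface remains the more reliable check.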
Notes
This process is more complex and potentially less intuitive, so you must have a solid understanding of what is happening. The goal is that in “recovery mode” both servers are running, inbound messages are being routed to the primary server, but MailStore is still pulling messages from the secondary server.
Ideally all MailStore instances will check for messages from the gateway frequently, so it should only take a few minutes for the secondary server to be empty. During this time inbound messages are being queued on msgw1.mailarchiveco.example and will be archived once you return to normal mode.
Remember that messages can sit in the gateway queued for delivery without fear of being lost, so you can take your time with this process.