Setting Up Sophos Firewall High Availability (HA)
High Availability, or HA, connects two Sophos Firewalls into a cluster. The aim is not to make a single firewall indestructible, but to handle a device failure, the loss of individual critical ports, or planned maintenance in a controlled manner.
An HA environment is still no substitute for good network design, clean backups, and documented maintenance processes. A poorly planned cluster can be harder to operate during a failure than a single firewall. This article therefore explains not only where to enable HA, but also how to plan, set up, operate, and verify a Sophos Firewall HA cluster so that it behaves predictably when something fails.
Table of Contents
- Summary
- Video Guides
- What High Availability on the Sophos Firewall Means
- Active-Passive or Active-Active
- Roles, Status, and Architecture
- Prerequisites
- Planning and Design
- Preparing the Setup
- Setting Up Active-Passive with QuickHA
- Setting Up Active-Active with QuickHA
- Manual HA Configuration
- Validating the Cluster
- Operation and Maintenance
- Firmware Updates and Backups
- Troubleshooting
- Best Practice Checklists
Summary
In most production environments, Active-Passive is the better HA option. One firewall processes all traffic, while the second firewall is on standby and takes over in case of failure or maintenance. The design is simpler, licensing is cheaper, and behaviour in case of failure is easier to understand.
Active-Active is only worthwhile if you consciously accept its limits and restrictions. It is not classic symmetrical load balancing in which both firewalls act as equal peers at every point in the network. The Primary Firewall continues to receive traffic and distributes certain connections to the Auxiliary Firewall. Not every service and not every type of traffic is distributed.
| Question | Recommendation |
|---|---|
| Maximum stability and simple operation | Active-Passive |
| Use second firewall without separate protection license | Active-Passive |
| Need more throughput for certain TCP connections | Consider Active-Active |
| Many VPN, Proxy, RED, NDR, or special cases | Prefer Active-Passive |
| Small or medium environment without clear performance problem | Active-Passive |
| Clear performance requirement and appropriate licensing for both appliances | Active-Active after testing |
Video Guides
The following Sophos Techvids show HA on Sophos Firewall visually. The videos do not replace proper planning, but they are useful for understanding QuickHA, roles, status, and the basic behaviour of an HA cluster.
What High Availability on the Sophos Firewall Means
A Sophos Firewall HA cluster consists of two firewalls. The devices exchange heartbeats, device status, connection information, and configuration data via a dedicated HA link. The configuration is synchronised from the Primary Firewall to the Auxiliary Firewall.
HA protects against typical failures:
- Failure of the Primary Firewall
- Power or hardware failure
- Failure of a monitored interface
- Software or service problem that renders a device inoperable
- Planned firmware updates
- Planned role change during maintenance
However, HA does not solve every problem:
- An incorrect firewall rule set remains incorrect even in the cluster.
- A common switch failure can affect both firewalls simultaneously.
- A defective VLAN design or faulty routing concept is not automatically corrected.
- Logs and reports are not fully synchronised between both firewalls.
- A backup remains mandatory.
Anyone planning HA should first be clear on the basics of zones, interfaces, VLANs, LAGs, and bridges. The guide Planning and Configuring Sophos Firewall Zones and Interfaces covers these basics.
Active-Passive or Active-Active
Active-Passive
In Active-Passive, one firewall processes all production traffic. The second firewall is passive and only takes over when the active firewall fails or when a failover is triggered manually or as part of maintenance.
Typical characteristics:
- Primary Firewall processes the traffic.
- Auxiliary Firewall remains on standby.
- Sessions are synchronised as far as the respective service supports it.
- On hardware appliances, only the licensed device requires protection subscriptions.
- The Auxiliary Firewall takes over with the same virtual MAC address in case of failover.
- Network devices usually do not need to relearn their neighbours.
Active-Passive is usually the best choice for classic corporate environments, branches, data centres, and environments where stability is more important than potential performance gain.
Active-Active
In Active-Active, both firewalls process traffic. Nevertheless, the architecture remains asymmetrical: The Primary Firewall receives the traffic and decides whether to process a connection itself or forward it to the Auxiliary Firewall.
Sophos uses the source IP address for distribution, among other things. TCP connections from even source IP addresses are typically processed on the Primary, while odd source IP addresses can be forwarded to the Auxiliary. Non-TCP traffic and certain services are not evenly distributed.
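The parity rule described above can be illustrated with a short Python sketch. It is a deliberately simplified model for estimating where a test connection is likely to land, not Sophos's actual distribution logic. The function name `expected_node`, the return labels, and the assumption that all non-TCP traffic stays on the Primary are illustrative only.

```python
import ipaddress


def expected_node(source_ip: str, protocol: str = "tcp") -> str:
    """Return the node that would typically handle a connection.

    Simplified model (assumption): only TCP is distributed, and only by
    the parity of the source address. Everything else stays on the Primary.
    """
    if protocol.lower() != "tcp":
        return "primary"
    # Interpret the address as an integer and check whether it is even.
    address_value = int(ipaddress.ip_address(source_ip))
    return "primary" if address_value % 2 == 0 else "auxiliary"


for ip in ("192.168.10.20", "192.168.10.21"):
    print(ip, "->", expected_node(ip))
```

A sketch like this is mainly useful for picking test clients: if every relevant source address happens to be even, Active-Active would bring little benefit under this model.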
Typical characteristics:
- Both firewalls can process traffic.
- The Primary Firewall remains the central entry point.
- Both firewalls require appropriate licenses.
- Not all services are distributed.
- Logs and reports are generated on the device that processes the respective traffic.
- Troubleshooting becomes more complex because connections can land on both nodes.
Active-Active is useful if there is a clear performance goal and it has been tested beforehand whether the relevant traffic is actually distributed. For pure high availability, Active-Passive is usually cleaner.
Roles, Status, and Architecture
Roles in the HA Cluster
| Term | Meaning |
|---|---|
| Primary | Device that manages the central cluster configuration. In both HA modes, the Primary receives the traffic. |
| Auxiliary | Second device in the cluster. It synchronises the configuration from the Primary and takes over if necessary. |
| Initial primary | The device that was started as Primary during setup. In Active-Passive, it is usually also the licensed device. |
| Preferred primary | Preferred device that should become Primary again after a failover as soon as it is stably available. |
Status Values in Operation
| Status | Meaning |
|---|---|
| Active | The device processes traffic. |
| Passive | The device is ready but does not process productive traffic in Active-Passive. |
| Standalone | The device does not see the peer or HA is not fully active. Both devices can become standalone in case of HA link problems. |
| Faulty | The device is not healthy enough to participate normally in the cluster. |
Virtual MAC Address
The Sophos Firewall uses virtual MAC addresses for productive interfaces in the HA cluster. Only the Primary responds to ARP requests for the cluster. In case of a failover, the Auxiliary takes over this virtual MAC address. This keeps the reachability for switches, routers, and clients more stable because the IP and MAC assignment does not fundamentally change.
The Cluster ID is important. This ID is used for the virtual MAC address. If multiple HA clusters are operated in the same Layer 2 environment, each cluster must have a unique Cluster ID. Otherwise, MAC conflicts can occur.
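If several clusters are documented in an inventory, the uniqueness requirement can be checked mechanically. The following Python sketch assumes a hypothetical inventory format with the keys `name`, `layer2_domain`, and `cluster_id`; none of these names come from Sophos, and the sample values are invented.

```python
from collections import defaultdict


def find_cluster_id_conflicts(clusters):
    """Return clusters that share a Cluster ID within one Layer 2 domain."""
    seen = defaultdict(list)
    for cluster in clusters:
        key = (cluster["layer2_domain"], cluster["cluster_id"])
        seen[key].append(cluster["name"])
    # Only keys used by more than one cluster are a problem.
    return {key: names for key, names in seen.items() if len(names) > 1}


# Hypothetical inventory: two clusters share ID 0 in the same L2 domain.
inventory = [
    {"name": "hq-fw", "layer2_domain": "core-a", "cluster_id": 0},
    {"name": "lab-fw", "layer2_domain": "core-a", "cluster_id": 0},
    {"name": "branch-fw", "layer2_domain": "branch-1", "cluster_id": 0},
]
print(find_cluster_id_conflicts(inventory))
```

The same ID in a different Layer 2 domain is not reported, because the virtual MAC addresses would never meet on the same segment.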
What is Synchronised
| Area | Synchronisation |
|---|---|
| Firewall rules, policies, objects, routing, CLI configuration | Synchronised from Primary to Auxiliary. |
| Active sessions | Synchronised depending on protocol and service. |
| Secure Storage Master Key and WebAdmin credentials | Synchronised. |
| Dedicated HA link | Not synchronised as a normal productive interface configuration. |
| Peer Admin Port | Treated separately and not synchronised like a normal interface. |
| Logs and reports | Not synchronised between devices. |
Failover Behaviour
A failover can be triggered by various events:
- No more heartbeats over the HA link
- Failure of a monitored port
- Power failure
- Hardware failure
- Software or service problem
- Planned role change
- Firmware update
Heartbeats run over the dedicated HA link. Very short intervals are used by default. If several heartbeats are missing in succession, the peer is considered unreachable. The firewall then checks the status and performs the role change.
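The relationship between keepalive interval, number of attempts, and detection time can be sketched in Python. This is a conceptual model of consecutive missed heartbeats, not the firewall's internal implementation. The class name `HeartbeatMonitor` and the example values are assumptions chosen for illustration and should not be read as Sophos defaults.

```python
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats from the HA peer."""

    def __init__(self, interval_ms: int, attempts: int):
        self.interval_ms = interval_ms
        self.attempts = attempts
        self.missed = 0

    def record(self, heartbeat_received: bool) -> None:
        # Any received heartbeat resets the counter; only consecutive
        # misses count towards declaring the peer unreachable.
        self.missed = 0 if heartbeat_received else self.missed + 1

    @property
    def peer_unreachable(self) -> bool:
        return self.missed >= self.attempts

    @property
    def detection_time_ms(self) -> int:
        # Approximate time until a silent peer is declared unreachable.
        return self.interval_ms * self.attempts


# Illustrative values only.
monitor = HeartbeatMonitor(interval_ms=250, attempts=4)
print(monitor.detection_time_ms, "ms until the peer counts as unreachable")
```

The product of interval and attempts shows why these values should only be changed deliberately: lowering them speeds up detection but makes the cluster more sensitive to short disturbances on the HA link.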
During a failover, many connections are continued or quickly re-established. However, a failover is not completely transparent for every application. Especially stateful TCP connections, web sessions, proxy connections, or certain VPN scenarios may briefly disconnect or need to be re-established.
Load Balancing in Active-Active
Active-Active does not mean that both firewalls sit in the network like two equal routers. The Primary Firewall remains the central entry point and distributes certain connections to the Auxiliary Firewall.
Important points:
- Load balancing is only available in Active-Active.
- The method cannot be freely adjusted.
- Sophos does not treat external load balancers in front of an HA cluster as a standard design for this HA logic.
- Not all traffic is distributed.
- Non-TCP traffic, SD-RED, tunnelled traffic, and some Layer 7 functions may be treated differently.
- Troubleshooting must include both nodes.
In practice, Active-Active should only be used if it is clear beforehand which traffic is causing the bottleneck and whether exactly this traffic is distributed.
Supported and Restricted Services
Sophos HA supports most firewall services. However, some services have special features.
| Service or Function | Note |
|---|---|
| Firewall rules and NAT | Synchronised. In Active-Active, you need to know which node processes a connection. |
| VPN | Many VPN scenarios work with HA, but not every session type fails over without interruption. IPsec can take over stateless UDP/ICMP better than stateful TCP. |
| Web Protection | Works in the cluster. In Active-Active, alerts can come from both nodes. |
| Email Protection | Quarantine and release can be node-specific because each device stores its own data for processed mail traffic. |
| Synchronized Application Control | Not suitable for Active-Active if the function is not supported in the deployed SFOS version. |
| NDR Essentials | Plan only with Active-Passive for HA environments. |
| sFlow | Runs only on the Primary in HA environments. |
| Reports | Local reports are generated per device. Merged reports are more sensible via Sophos Central Firewall Reporting. |
| Cellular WAN | Must be disabled for HA. |
| XGS Wi-Fi Models | HA is not supported on XGS Wi-Fi models. |
If reporting or log retention is important, you should plan early whether an external syslog server or Sophos Central Firewall Reporting will be used. More on this can be found in Enable Central Firewall Reporting.
Prerequisites
Before implementation, you should carefully check the HA prerequisites against your own environment. Matching models, firmware versions, interfaces, Cellular WAN, and virtual platforms in particular are areas where small deviations can have a big impact later.
Hardware and Model Compatibility
| Prerequisite | Requirement |
|---|---|
| Appliance Model | Both firewalls must be the same XGS model, for example, XGS 2100 with XGS 2100. |
| Hardware Revision | Different hardware revisions are possible with the same XGS model. |
| XGS Wi-Fi Models | Not supported. Examples are XGS 126w or XGS 136w. |
| Flexi Port Modules | If expansion modules are used, the number of Flexi Ports must be the same on both devices. |
| Firmware | Both devices must use the same SFOS version, including maintenance release and build. |
| Hardware plus Virtual Appliance | Not possible as an HA pair. |
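The hardware prerequisites in the table can be turned into a simple pre-check. The Python sketch below compares two device descriptions; the `Appliance` fields, the placeholder version strings, and the rule that a model name ending in "w" denotes a Wi-Fi model (derived from the examples XGS 126w and XGS 136w) are assumptions for illustration, not an official compatibility check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    model: str          # e.g. "XGS 2100"
    sfos_version: str   # full version string including build (placeholder format)
    flexi_ports: int    # number of Flexi Ports provided by expansion modules
    is_virtual: bool = False


def ha_compatibility_issues(a: Appliance, b: Appliance) -> list[str]:
    """Return a list of reasons why two devices should not be paired."""
    issues = []
    if a.is_virtual != b.is_virtual:
        issues.append("hardware and virtual appliances cannot form an HA pair")
    if a.model != b.model:
        issues.append(f"model mismatch: {a.model} vs {b.model}")
    if a.sfos_version != b.sfos_version:
        issues.append("SFOS version or build differs")
    if a.flexi_ports != b.flexi_ports:
        issues.append("number of Flexi Ports differs")
    for device in (a, b):
        # Assumption: Wi-Fi models carry a trailing "w" in the model name.
        if device.model.lower().endswith("w"):
            issues.append(f"{device.model} is a Wi-Fi model (HA not supported)")
    return issues


fw01 = Appliance("XGS 2100", "v1-build100", flexi_ports=8)
fw02 = Appliance("XGS 2100", "v1-build101", flexi_ports=8)
print(ha_compatibility_issues(fw01, fw02))
```

Hardware revision is intentionally not compared, because different revisions of the same model are permitted.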
Virtual and Software Appliances
Virtual or software appliances must also match each other very closely.
| Prerequisite | Requirement |
|---|---|
| Platform | Same appliance type and SFOS platform. |
| Hypervisor | Same hypervisor type. |
| Resources | Same CPU cores, comparable resources, and same number of network interfaces. |
| Firmware | Same SFOS version, including build. |
| MAC Addresses | In virtual environments, the option for hypervisor-assigned MAC addresses may be relevant so that promiscuous mode is not required. A change, however, causes downtime. |
Cloud Deployments
In cloud environments, additional platform requirements apply. Routing, virtual interfaces, IP addresses, security groups, UDRs, or cloud-specific failover mechanisms must fit the respective cloud design. The normal appliance HA approach cannot be transferred unchecked to Azure, AWS, or other cloud environments.
If a Sophos Firewall is operated in the cloud, you should check the current Sophos documentation for the respective platform and cloud network architecture before planning HA.
Licensing and Registration
HA licensing varies depending on the platform and HA mode. Three questions are crucial: Is it hardware or virtual/software? Is Active-Passive or Active-Active used? And which device is the Initial Primary, i.e., the device that holds the license for the cluster? These points should be reconciled with license status, serial numbers, and target mode before implementation.
| Scenario | License Requirement |
|---|---|
| Base Firewall | A Base Firewall license is required for HA. Hardware appliances have this license by default. Virtual/software appliances must be appropriately licensed. |
| Active-Passive Hardware | Only the Initial Primary device needs the productive subscriptions. The Auxiliary Firewall receives a copy of the subscriptions and can process traffic after a failover. |
| Active-Active Hardware | Both firewalls need their own appropriate licenses. The license types must match, expiration dates may differ. |
| Active-Passive Virtual/Software | Only the Primary needs the necessary licenses, including Base Firewall. |
| Active-Active Virtual/Software | Both devices need their own Base Firewall license and appropriate additional protection licenses. |
| Hardware Registration | Both hardware devices must be known in Sophos Central or the Sophos Licensing Portal and be able to synchronise their licenses. |
| Virtual/Software Registration Active-Passive | According to Sophos, only the Primary is claimed in Active-Passive Virtual/Software. |
| Sophos Central Management | License synchronisation and claiming do not automatically mean that the firewalls are also managed via Sophos Central Firewall Management. A suitable additional subscription is required for this. |
| RMA and Support | For hardware replacement and advance replacement, the support status is important. In Active-Passive hardware, Sophos names Enhanced Plus Support on the Primary device as a relevant prerequisite for advance hardware replacement. In Active-Active, the support status must match on both devices. |
Important: In Active-Passive, the Initial Primary is particularly important because this device holds the license for the cluster. The HA view indicates which device this is. In case of doubt, the device should be clearly documented in the operations manual.
If licenses in Active-Active do not match, load balancing initially stops according to Sophos. If the discrepancy persists, HA can be deactivated. During an initial configuration, Active-Active HA is not cleanly activated with non-matching licenses. A license check therefore belongs in the operational routine.
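A routine license check for Active-Active can be as simple as comparing the license types on both devices while ignoring expiration dates. The following Python sketch assumes the license data has already been collected into dictionaries; the function name, the dictionary layout, and the sample entries are hypothetical.

```python
def license_type_mismatch(primary: dict[str, str], auxiliary: dict[str, str]) -> set[str]:
    """Return license types that exist on only one of the two devices.

    Each dictionary maps a license type to its expiration date. The dates
    are ignored on purpose, because they are allowed to differ.
    """
    return set(primary) ^ set(auxiliary)


# Invented sample data for illustration.
fw01_licenses = {"Network Protection": "2026-03-31", "Web Protection": "2026-03-31"}
fw02_licenses = {"Network Protection": "2025-12-31"}
print(license_type_mismatch(fw01_licenses, fw02_licenses))
```

An empty result means the license types match; anything else should be resolved before Active-Active is enabled or renewed.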
In virtual/software appliances, the Base Firewall license is particularly critical. If it is deactivated or the Initial Primary cannot synchronise the license for a long time, HA can be deactivated, and other protection functions become inactive. Relevant here is synchronisation with the license server at least once within 90 days. For productive environments, this means: The HA cluster must not only function technically but also regularly reach the license server.
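The 90-day requirement lends itself to a small monitoring check. The Python sketch below classifies the age of the last license synchronisation; the 90-day window comes from the text above, while the 14-day warning margin, the function name, and the status labels are assumptions.

```python
from datetime import date, timedelta

SYNC_WINDOW = timedelta(days=90)   # window named in the text above
WARN_MARGIN = timedelta(days=14)   # hypothetical early-warning margin


def license_sync_state(last_sync: date, today: date) -> str:
    """Classify how urgent the next license synchronisation is."""
    age = today - last_sync
    if age > SYNC_WINDOW:
        return "overdue"
    if age > SYNC_WINDOW - WARN_MARGIN:
        return "warning"
    return "ok"


last_sync = date(2025, 1, 1)
print(license_sync_state(last_sync, last_sync + timedelta(days=80)))
```

The date of the last synchronisation would have to come from your own documentation or monitoring; the sketch does not query the firewall.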
For operation, you should at least document:
- which device is the Initial Primary
- which serial numbers or appliance IDs belong to the cluster
- which licenses are active on which device
- when the licenses were last synchronised
- which support level is available for RMA or advance replacement
- who approves license changes, renewals, and RMA processes
Network Prerequisites
| Area | Recommendation |
|---|---|
| HA Link | Dedicated connection between both firewalls, ideally directly with an Ethernet cable. |
| HA Link Zone | DMZ zone with SSH enabled for the zone. |
| HA Link IP Addresses | Two different static IP addresses in the same subnet. |
| HA Link Quality | High bandwidth, low latency, no packet loss. |
| Switches | Enable RSTP on switches connected to firewall ports. |
| Monitored Ports | Only monitor ports that are truly connected and critical. |
| Cellular WAN | Disable for HA. |
| Admin Port | Plan separately so that the Auxiliary Firewall remains accessible. |
The HA link does not process normal client or server traffic. It is only relevant for heartbeats, status, session synchronisation, configuration synchronisation, and Active-Active distribution. Nevertheless, it is extremely critical. If the HA link fails, both firewalls can believe they are Primary. This split-brain scenario must be avoided.
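The addressing rule for the HA link (same subnet, different addresses) can be verified before the link is configured. This Python sketch uses the standard `ipaddress` module; the function name and the example addresses are chosen for illustration only.

```python
import ipaddress


def check_ha_link(primary_if: str, auxiliary_if: str) -> list[str]:
    """Validate two interface addresses in CIDR form, e.g. '10.255.255.1/30'."""
    first = ipaddress.ip_interface(primary_if)
    second = ipaddress.ip_interface(auxiliary_if)
    problems = []
    if first.network != second.network:
        problems.append("addresses are not in the same subnet")
    if first.ip == second.ip:
        problems.append("both nodes use the same address")
    return problems


# Example addresses, not a recommendation for a specific range.
print(check_ha_link("10.255.255.1/30", "10.255.255.2/30"))
```

An empty list means the pair satisfies both conditions.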
Ports and Interfaces
| Port Type | Purpose | Recommendation |
|---|---|---|
| Dedicated HA link | Heartbeat, status, configuration, and session synchronisation | Connect directly or via a very reliable switch. Do not use for productive traffic. |
| Monitored ports | Monitoring critical productive links | Monitor WAN, important DMZ, or core uplinks. Do not select unused ports. |
| Peer Admin Port | Access to Auxiliary WebAdmin | Plan and document separately. Client must be in the appropriate subnet. |
| Production Interfaces | LAN, WAN, DMZ, VLANs, LAGs | Connect identically on both firewalls and design equivalently. |
Physical interfaces, VLANs, or LAGs are possible as a dedicated HA link. Bridge interfaces and alias IP addresses cannot be used as a dedicated HA link. If a LAG is used as an HA link, the parent interfaces must be set up the same on both appliances.
Planning and Design
Recommended Topology for Active-Passive
A typical Active-Passive design looks like this:
- both firewalls are in the same rack or in racks close to each other
- all productive interfaces are connected the same
- WAN goes to redundant switches or well-documented provider handovers
- LAN/DMZ goes to redundant switch structures
- the HA link is directly connected
- a separate management or admin access is provided
- WAN and core/DMZ ports are defined as monitored ports
The most important point: The second firewall must be able to take over the same network position in case of failure. Therefore, VLANs, trunks, LAGs, switch ports, and provider connections must be consistently planned.
Recommended Topology for Active-Active
Active-Active requires the same physical cleanliness as Active-Passive but also a clear expectation of the distributable traffic.
Before Active-Active, you should answer:
- Which traffic is the bottleneck?
- Is exactly this traffic distributed in Active-Active?
- Are both appliances fully and appropriately licensed?
- Are logs, reports, and troubleshooting processes prepared for both nodes?
- Are there services like VPN, NDR, Synchronized Application Control, or proxy scenarios that speak against Active-Active?
Without a clear answer, Active-Passive is the better choice.
LAN, WAN, DMZ, and HA Link
| Area | Design Note |
|---|---|
| LAN | Separate internal networks cleanly into zones. Plan VLANs with security in mind, not only technical function. |
| WAN | Consider provider connection, failover gateways, and SD-WAN separately from HA. HA does not replace a WAN redundancy concept. |
| DMZ | Keep server networks and external publications separate from client networks. |
| HA Link | Own connection, statically addressed, DMZ zone, SSH allowed for DMZ. Do not conduct user or server communication over it. |
| Management | Restrict admin access, Device Access, and ACLs consciously. |
For securing local firewall services, see Securing Sophos Firewall Access: Configuring Device Access Correctly.
Switch Requirements
Switches are often the invisible risk factor in HA. An HA cluster only works as well as the Layer 2 environment underneath.
Recommendations:
- Enable RSTP on involved switches.
- Configure trunks identically on both firewalls.
- Consistently allow VLANs on all involved ports.
- Build LAGs identically.
- Do not select half-finished or unused monitored ports.
- Connect the HA link as directly as possible.
- If the HA link runs over switches, the path must be stable, low-latency, and packet-loss-free.
VLANs, LAGs, Bridges, and RED
VLANs and LAGs are possible in HA, but they increase planning requirements. The HA cluster should not be the first place to try out a VLAN or LAG design.
| Technology | HA Note |
|---|---|
| VLAN | Plan identically on both devices. Consider parent interface. VLAN as HA link is possible but only with very clean design. |
| LAG | Useful for core uplinks or redundant switch connection. Parent interfaces must be consistent. |
| Bridge | HA in bridge mode is supported but makes troubleshooting more complex. For new designs, gateway mode is usually more transparent. |
| RED | RED and remote scenarios can affect HA, especially when networks are stretched across locations. Check performance, latency, and error patterns beforehand. |
| VPN | VPN failover works differently depending on protocol and session type. Test IPsec and remote access separately. |
Avoiding Split-Brain
Split-brain means that both firewalls believe they are the active node, or both run as standalone and consider themselves responsible for the traffic. This can happen if the HA link fails while both devices remain connected to the production network.
Measures:
- Do not route the HA link over unreliable or unstable switch paths.
- Prefer direct connection for HA link.
- Select monitored ports consciously.
- Configure RSTP cleanly.
- Avoid asymmetric cabling.
- Check HA status after each switch or VLAN change.
- Only adjust keepalive values with reason.
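For monitoring, the status values of both nodes can be combined into a single assessment. The following Python sketch is a hypothetical classification based on the status names used in this article (Active, Passive, Standalone, Faulty); how the two status values are collected, and the exact labels returned, are assumptions.

```python
def classify_cluster(mode: str, state_a: str, state_b: str) -> str:
    """Combine the HA status of both nodes into one assessment.

    mode is "active-passive" or "active-active".
    """
    pair = tuple(sorted(state.lower() for state in (state_a, state_b)))
    if pair == ("standalone", "standalone"):
        # Neither node sees its peer, yet both are still running.
        return "possible split-brain"
    if pair == ("active", "active"):
        # Two active nodes are only expected in Active-Active.
        return "healthy" if mode == "active-active" else "possible split-brain"
    if pair == ("active", "passive") and mode == "active-passive":
        return "healthy"
    return "degraded"


print(classify_cluster("active-passive", "Standalone", "Standalone"))
```

Running such a check after every switch or VLAN change matches the measure listed above and makes an unnoticed split-brain less likely.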
Preparing the Setup
Do not improvise in the production network when setting up HA. Preparing the following points beforehand saves a lot of time later.
Preparing Both Firewalls
- Bring both firewalls to the same SFOS version, including build.
- Check license and registration status.
- Disable Cellular WAN.
- Ensure models are compatible.
- Check Flexi Port equipment.
- Document interfaces and switch ports.
- Determine HA link port.
- Connect HA link directly or check switch path.
- Plan DMZ zone and SSH access for the HA link.
- Document admin access to both devices.
- Create a backup of the existing configuration.
Backups are not optional with HA. A backup should be available before setup, before firmware updates, and before major interface changes. The basics are in Creating or Restoring a Sophos Firewall Backup.
QuickHA or Manual Configuration
| Method | When Useful |
|---|---|
| QuickHA | Standard case. Fast, robust, and sufficient for most Active-Passive and Active-Active setups. |
| Manual Configuration | When admin ports, HA link, cluster ID, peer addresses, and detail values need to be specified consciously. |
QuickHA detects the peer via the selected HA link interfaces. Once the devices have found each other, the cluster is built. The passphrase is used to establish the encrypted SSH tunnel and is not treated as a reusable password afterward. If a device is replaced, HA must be deactivated and reconfigured.
The steps described here are based on the official Sophos HA configuration but are deliberately formulated as a practical admin checklist. For productive changes, you should not just click through the wizard but document roles, HA link, admin access, backup, license status, and rollback beforehand.
Setting Up Active-Passive with QuickHA
1. Prepare Primary Firewall
- Log in to the future Primary Firewall in WebAdmin.
- Go to System services > High availability.
- Select Primary (active-passive) mode.
- Use QuickHA.
- Optionally assign a node name, e.g., FW01.
- Set an HA passphrase.
- Securely store the passphrase as it will be needed on the Auxiliary Firewall.
- Select the dedicated HA link.
- Click Initiate HA.
Notes:
- The HA link must not have productive dependencies.
- If QuickHA uses an unbound interface, Sophos assigns the DMZ zone to this interface and sets HA-specific settings.
- The HA link uses static IP addresses in the link-local range if QuickHA uses the default values.
- SSH must be allowed for the HA link zone because the HA tunnel is established over it.
2. Prepare Auxiliary Firewall
- Log in to the future Auxiliary Firewall.
- Go to System services > High availability.
- Select Auxiliary role.
- Use QuickHA.
- Optionally assign a node name, e.g., FW02.
- Enter the same HA passphrase.
- Select the same HA link port as on the Primary side.
- Click Initiate HA.
After setup, the Primary Firewall synchronises the configuration to the Auxiliary Firewall. Many local settings on the Auxiliary Firewall are overwritten. Therefore, the Auxiliary should not be configured as an independent productive firewall before HA setup.
3. Check Advanced Settings
After building the cluster, do not simply move on; check the following points:
- HA status of both nodes
- Role and status at the top right in WebAdmin
- Dedicated HA link
- Monitored ports
- Peer Admin Port
- Preferred primary
- Keepalive interval and attempts
- License holder in Active-Passive
- Sophos Central registration, if used
4. Set Monitored Ports
Monitored ports determine whether an interface failure triggers a failover. Typical candidates:
- WAN uplink
- Core LAN uplink
- Important DMZ or server uplinks
Do not monitor ports that are sometimes deliberately offline, not connected, or only used for optional scenarios. A wrongly set monitored port is a common cause of unexpected failovers or a non-starting cluster.
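The selection rule for monitored ports can be written down as a filter over a port inventory. This Python sketch assumes a hypothetical `Port` record with the flags `connected`, `critical`, and `sometimes_offline`; the port names are examples and the flags would have to be maintained in your own documentation.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    connected: bool
    critical: bool
    sometimes_offline: bool = False  # e.g. deliberately shut down at times


def monitored_port_candidates(ports: list[Port]) -> list[str]:
    """Return only ports that are connected, critical, and always online."""
    return [
        port.name
        for port in ports
        if port.connected and port.critical and not port.sometimes_offline
    ]


# Hypothetical inventory.
ports = [
    Port("Port2", connected=True, critical=True),             # WAN uplink
    Port("Port3", connected=False, critical=True),            # planned, not cabled
    Port("Port5", connected=True, critical=False),            # optional scenario
    Port("Port6", connected=True, critical=True, sometimes_offline=True),
]
print(monitored_port_candidates(ports))
```

Only ports that pass all three conditions should end up in the monitored ports list.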
Setting Up Active-Active with QuickHA
Active-Active is set up similarly but with a different goal. Both firewalls must be appropriately licensed beforehand.
1. Check in Advance
- Both firewalls have appropriate licenses.
- The license types match.
- Both devices are registered.
- Both devices run on the same SFOS version, including build.
- The relevant traffic actually benefits from Active-Active.
- Services with Active-Active restrictions have been checked.
- Monitoring and troubleshooting are designed for both nodes.
2. Configure Primary Firewall
- Go to System services > High availability.
- Select Primary (active-active).
- Use QuickHA.
- Assign a node name.
- Set HA passphrase.
- Select dedicated HA link.
- Start HA.
3. Configure Auxiliary Firewall
- On the second device, go to System services > High availability.
- Select Auxiliary.
- Use QuickHA.
- Enter the same passphrase.
- Select the same HA link port.
- Start HA.
4. Test After Setup
In Active-Active, more needs to be tested than just the HA status:
- Are connections distributed across both nodes?
- Are logs visible on both devices?
- Do VPN connections work after a role change?
- Do Web Protection, IPS, Application Control, and relevant security features work?
- Are there applications that stand out due to asymmetric behaviour?
- Are reports and alerts generated as expected?
Manual HA Configuration
Manual HA configuration is useful when the automatic QuickHA logic does not provide enough control.
Typical reasons:
- Fixed HA link IP addresses should be used
- Peer Admin Port must be precisely defined
- Cluster ID should be consciously set
- Multiple HA clusters exist in the same Layer 2 environment
- Virtual appliances should be operated with specific MAC options
- A very controlled rollout is necessary
In manual setup, the Auxiliary Firewall is prepared first, and then the Primary Firewall is configured. The Primary must know the HA link IP address of the peer.
Important fields:
| Field | Meaning |
|---|---|
| Operation mode | Active-Passive or Active-Active. |
| Initial device role | Primary or Auxiliary. |
| Dedicated HA link | Interface for heartbeat, status, and synchronisation. |
| Peer HA link IPv4 | Address of the HA link interface on the second device. |
| Cluster ID | Basis for virtual MAC addresses. Set uniquely for multiple clusters. |
| Monitored ports | Critical ports whose failure should trigger a failover. |
| Peer administration settings | Access to the Auxiliary Firewall. |
| Preferred primary | Device that should become Primary again after failover. |
| Keepalive interval / Attempts | Sensitivity of HA detection. Only adjust consciously. |
Validating the Cluster
After setup, the HA cluster should be systematically checked.
WebAdmin Check
- Log in to the Primary Firewall.
- Check the HA status at the top right.
- Go to System services > High availability.
- Check roles, status, serial numbers, and mode.
- Ensure the cluster is synchronised.
- In Active-Passive, check which device holds the license for the cluster.
CLI Check
In the Device Console, you can display the HA status:
system ha show details
If it is unclear which device is the licensed Initial Primary, this value can help in the Advanced Shell:
nvram get "#li.master"
YES stands for the initial Primary device, NO for the Auxiliary device. Such commands should be documented and only used in clear maintenance or diagnostic cases.
If shell access is not yet prepared, the guide Connecting to Sophos Firewall via SSH helps.
Functional Test
- Test client from LAN to the internet.
- Test access to internal servers.
- Test VPN.
- Test DNAT or WAF scenarios.
- Test DNS and DHCP if the firewall provides these services.
- Check logs in the Log Viewer.
- Test HA role change in a maintenance window.
- Document roles, status, and session behaviour afterward.
Operation and Maintenance
Ongoing Monitoring
An HA cluster should be actively monitored. Just because two firewalls are in the rack does not mean the environment is automatically highly available.
The following should be monitored:
- HA status
- Role status Primary/Auxiliary
- HA link
- Monitored ports
- License and subscription status
- Firmware versions
- Resources like CPU, RAM, Disk
- Central services like IPS, Web, VPN, DNS, DHCP
- Logs in the Log Viewer
- Alerts in Sophos Central or via email
For disk and hardware topics, both nodes should be considered separately. Local reports, log files, and SSD status can differ. The articles Checking Sophos Firewall Storage and Managing Reports and Checking Sophos Firewall SSD Health via SMART cover these topics.
Logs and Reports
Logs and reports are not simply merged into a common local data set. Each device writes logs for the traffic it processes. In Active-Active, this is particularly important because traffic can appear on both nodes.
Sophos Central Firewall Reporting is helpful in HA environments because it can centrally evaluate data across multiple firewalls. Local troubleshooting logs can be found on the firewall itself. The article Sophos Firewall Troubleshooting: Services and Logs explains which services and log files are relevant for analyses.
Runbook for HA Operation
An HA cluster needs a short operational runbook. It should state which device is the Initial Primary, which ports are monitored, how to reach the Auxiliary, who approves firmware updates, and when HA is disabled for replacement or reimage.
At least document:
- Serial number, location, rack position, and role of both appliances.
- Dedicated HA link, Peer Admin Port, Cluster ID, and monitored ports.
- Preferred primary and expected behaviour after failover.
- License holder, support status, and RMA contact path.
- Procedure for firmware update, backup, reimage, and hardware replacement.
- Responsible person for logs, Central Reporting, syslog, and support cases.
If only the WebAdmin on one node hangs, this does not mean the entire HA cluster is defective. In that case, first check whether restarting the WebAdmin GUI or a controlled service restart is sufficient before triggering a failover or reboot.
Changes to the Cluster
Configuration changes are made on the Primary Firewall. The Auxiliary Firewall is not the place for normal rule, interface, or policy changes.
Before major changes:
- Create a backup.
- Define a maintenance window.
- Check HA status.
- Update documentation.
- Define rollback path.
- Test synchronisation and traffic after the change.
This applies especially to:
- Interfaces
- VLANs
- LAGs
- Zones
- Routing
- NAT
- VPN
- Device Access
- SD-WAN
Firmware Updates and Backups
Firmware Updates in HA Environments
Firmware updates are started on the Primary Firewall. The devices are updated one after the other, and the cluster can switch roles during this time.
Typical procedure:
- Start update on the Primary Firewall.
- Auxiliary Firewall is updated.
- Auxiliary Firewall restarts and temporarily takes over.
- Previous Primary is updated.
- Previous Primary restarts.
- If Preferred primary is active, a switch back to the preferred device can occur.
Nevertheless, firmware updates belong in a maintenance window. Even though the process is designed for minimal downtime, individual sessions, VPN connections, or special applications may be briefly interrupted.
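The typical procedure can be written down as a simplified timeline to show which node handles traffic at each step. The following Python sketch is a teaching model under stated assumptions: node "A" starts as Primary, node "B" as Auxiliary, and "A" is the preferred device. It does not query a firewall, and real timing and behaviour can differ.

```python
from typing import List, Tuple


def update_timeline(preferred_primary: bool) -> List[Tuple[str, str]]:
    """Return (step, node handling traffic) pairs for a rolling update.

    Assumption: node "A" starts as Primary and is the preferred device.
    """
    timeline = [
        ("update started on Primary", "A"),
        ("Auxiliary updated and restarted", "A"),
        ("Auxiliary temporarily takes over", "B"),
        ("previous Primary updated and restarted", "B"),
    ]
    if preferred_primary:
        # With Preferred primary set, a switch back can occur.
        timeline.append(("switch back to preferred device", "A"))
    return timeline
```

The model makes one point explicit: without Preferred primary, the roles may remain swapped after the update, which should be reflected in the runbook and in monitoring.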
For preparation, see Sophos Firewall Firmware Update - Preparation and Best Practices.
Pattern Updates
Pattern updates are installed on the Primary Firewall and synchronised to the Auxiliary Firewall. This also applies to environments where updates are controlled or installed offline.
If an HA cluster is operated in an isolated environment, it should be clear before each manual pattern or license update which node is the Initial Primary and which node is currently Primary. The air-gap procedure is described in Operating Sophos Firewall Air-Gap Licensing and Pattern Updates.
Backup and Restore
Backup and restore have some particularities in HA environments:
- Backups should be created regularly and before any major change.
- Restore is performed on the current Primary Firewall.
- After restore, both firewalls are deregistered from Sophos Central and must be registered again.
- Restore causes restart and downtime, not a normal failover.
- If a backup without HA configuration is restored to an HA cluster, HA is deactivated and must be rebuilt.
Replacing a Cluster Node
When replacing or reimaging a node, HA should not simply continue with the old peer.
Important points:
- Deactivate HA cleanly before reimage or replacement.
- Document firmware version and build.
- Create a backup.
- Identify license holder.
- Remove device from Sophos Central Management if it is returned or replaced.
- Register new device.
- Reconfigure HA because the old HA passphrase is not reused for a new device.
For the technical reinstallation, see Reimaging Sophos Firewall OS with USB Stick. If a hardware defect or an RMA process is involved, also consult Preparing for Sophos Hardware Defect and RMA.
Troubleshooting
For complex fault patterns, work in a structured way: first check the HA status, HA link, monitored ports, firmware status, and license status, and only then consider special cases or vendor notes. This keeps it clear whether there is a real HA problem or whether licensing, an interface, the firmware status, or monitoring is causing the error.
Important Logs and Diagnostic Locations
| Area | Check |
|---|---|
| HA Status | System services > High availability |
| Event Logs | Log viewer > System |
| Troubleshooting Logs | Diagnostics > Tools or via SSH under /log |
| HA Details CLI | system ha show details |
| Interface Problems | show network interfaces, ifconfig, dmesg in appropriate diagnostic cases |
| License Holder | WebAdmin HA view or nvram get "#li.master" |
For deeper analyses, the article Sophos Firewall CLI Troubleshooting: Important Commands often helps.
Troubleshooting Table
| Error Pattern | Possible Cause | Check | Solution |
|---|---|---|---|
| Cluster not formed | Different firmware version or build | Check version on both devices | Bring both devices to identical SFOS version |
| Cluster not formed | Incorrect model or incompatible appliance | Check model and serial number | Use only compatible same XGS models |
| HA could not be enabled | Dedicated HA link not connected or peer not reachable | Check port status, cable, switch, ping on HA link IP | Correct cabling, connect HA link stably |
| HA Link down | Faulty cable, switch port, VLAN, or LAG | Check interface status, speed/duplex, VLAN trunk, LAG members | Replace cable/switch port, correct VLAN/LAG |
| Both devices become standalone | HA link failed, risk of split-brain | Check physical HA link connection and switch path | Shut down one device in a controlled manner, repair HA link, then restart |
| Validation failed for HA interface IP | Admin ports or HA link addresses not in expected subnet | Check IP addresses and /log/syslog.log | Correct addressing |
| Failover happens unexpectedly | Monitored port failed or wrongly selected | Check monitored ports and switch status | Only monitor truly critical and stable ports |
| Failover does not happen | Relevant port is not monitored | Check monitored ports | Enter critical WAN/core/DMZ ports as monitored ports |
| Active-Active does not distribute as expected | Traffic type is not load-balanced | Check connection type, protocol, and logs of both nodes | Use Active-Passive or adjust design |
| Logs appear to be missing | Traffic was processed by the other node or logging is not active | Check on both nodes and in the Log Viewer | Activate logging in rules, use Central Reporting/Syslog |
| Reports differ | Local reports are node-specific | Compare reports of both devices | Use Sophos Central Firewall Reporting |
| License problem in Active-Active | License types do not match | Check licensing on both firewalls | Align and synchronise licenses |
| Problems after firmware update | One node not updated cleanly or cluster not synchronised | Check HA status, firmware versions, logs | Use maintenance window, synchronise cluster, involve Sophos Support |
| Flexi-Port HA link does not work | Speed/duplex or auto-negotiation does not match | Check interface advanced settings on both devices | Configure both sides the same or use fixed port |
| Auxiliary WebAdmin not reachable | Peer Admin Port not reachable or client not in subnet | Check IP/subnet and route | Plan admin access correctly |
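Several rows in the table come down to the selection of monitored ports. The following Python sketch compares that selection with the ports you consider critical and with the current link state. The port names and the three input collections are assumptions that you would fill in from your own documentation. Nothing here is read from the firewall.

```python
from typing import Dict, List, Set


def audit_monitored_ports(monitored: Set[str], critical: Set[str],
                          link_up: Dict[str, bool]) -> Dict[str, List[str]]:
    """Compare the monitored-port selection with the critical ports."""
    return {
        # Critical ports whose failure would not trigger a failover.
        "critical_not_monitored": sorted(critical - monitored),
        # Monitored ports without link; unknown ports count as down.
        "monitored_without_link": sorted(
            port for port in monitored if not link_up.get(port, False)),
        # Monitored ports that nobody listed as critical.
        "monitored_not_critical": sorted(monitored - critical),
    }
```

For example, with `monitored={"Port1", "Port5"}`, `critical={"Port1", "Port2"}`, and link state known only for `Port1` and `Port2`, the audit reports `Port2` as critical but not monitored and `Port5` as monitored without link.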
Procedure for HA Link Failure
If the dedicated HA link fails, caution is needed, because the two firewalls can no longer see each other. In the worst case, both devices send ARP/GARP and try to claim the cluster MAC for themselves.
Safe procedure:
- Stabilise network condition.
- Decide which device should remain active.
- Shut down the other device in a controlled manner or disconnect it from the production network.
- Repair HA link cable, switch port, VLAN, or LAG.
- Restart or reconnect the second device.
- Check HA status.
- Check logs and roles.
Relevant CLI Commands
system ha show details
Shows HA details in the Device Console.
show network interfaces
Helps with interface status and link information.
nvram get "#li.master"
Shows in the Advanced Shell whether the device is the licensed Initial Primary.
dmesg | grep PortE
Can provide hints on link flaps in certain hardware/interface problems.
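If the output of the `dmesg` command above has been saved to a text file, a short script can show which interface appears most often. The following Python sketch counts the lines that mention each `PortE` interface. It makes no assumption about the wording of the kernel messages, so a high count is only a hint to look more closely and not proof of link flaps.

```python
import re
from collections import Counter

# Matches interface names such as PortE1; adjust for other naming schemes.
PORT_PATTERN = re.compile(r"PortE\d+")


def mentions_per_port(dmesg_text: str) -> Counter:
    """Count how many kernel log lines mention each PortE interface."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        # Count each port once per line, even if its name repeats.
        for port in set(PORT_PATTERN.findall(line)):
            counts[port] += 1
    return counts
```

Calling `mentions_per_port(text).most_common(3)` lists the three interfaces with the most log lines, which is a reasonable starting point for checking cables and switch ports.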
Not every command belongs in every environment. For production systems, the rule is: document the state first, then run targeted checks.
Best Practice Checklists
Planning
- Check Active-Passive as the standard variant.
- Use Active-Active only with a clear performance goal.
- Check both devices for model, firmware, Flexi Ports, and licensing.
- Connect the HA link over a dedicated path and as directly as possible.
- Select monitored ports consciously.
- Plan management access to Primary and Auxiliary.
- Document Cluster ID uniquely.
- Document switch design, VLANs, and LAGs.
- Plan RSTP on relevant switches.
- Consider split-brain scenario.
- Document backup and restore process.
Implementation
- Create a backup beforehand.
- Bring both firewalls to the same SFOS version.
- Disable Cellular WAN.
- Prepare HA link with static addressing and DMZ zone.
- Allow SSH for the HA link zone.
- Name Primary and Auxiliary cleanly.
- Use passphrase only for setup and do not treat it as a permanent admin password afterward.
- Check advanced settings after QuickHA.
- Set monitored ports only when it is clear which links are stable and critical.
- Test failover in a maintenance window.
Operation
- Regularly check HA status.
- Monitor licenses and synchronisation.
- Perform firmware updates only after preparation and within a maintenance window.
- Regularly test backups.
- Make configuration changes only on the Primary.
- Check HA status after switch, VLAN, or interface changes.
- Include logs of both nodes.
- Use Sophos Central Firewall Reporting or Syslog for longer analyses.
- Perform replacement or reimage of a node only with a planned HA disable and reconfiguration.
Don’ts
- Do not cluster different models.
- Do not plan Wi-Fi XGS models for HA.
- Do not leave Cellular WAN active in HA.
- Do not set unconnected ports as monitored ports.
- Do not use unstable VLAN or switch path as HA link.
- Do not run production services over the HA link.
- Do not consider Active-Active as a simple doubling of performance.
- Do not make or plan configuration changes on the Auxiliary Firewall.
- Do not perform restore without a maintenance window.