Avanet

Setting Up Sophos Firewall High Availability (HA)

High Availability, or HA, connects two Sophos Firewalls into a cluster. The aim is not to make a single firewall indestructible, but to manage the failure of a device, individual critical ports, or planned maintenance in a controlled manner.

An HA environment is still no substitute for good network design, clean backups, and documented maintenance processes. A poorly planned cluster can be more difficult to operate in the event of a failure than a single firewall. This article therefore explains not only where to enable HA but also how to plan, set up, and operate a Sophos Firewall HA cluster, and how to check it when a failure occurs.

Summary

In most production environments, Active-Passive is the better HA option. One firewall processes all traffic, while the second firewall is on standby and takes over in case of failure or maintenance. The design is simpler, licensing is cheaper, and behaviour in case of failure is easier to understand.

Active-Active is only worthwhile if you consciously accept the limits and restrictions. It is not classic symmetrical load balancing, where both firewalls stand equally at all points in the network. The Primary Firewall continues to receive traffic and distributes certain connections to the Auxiliary Firewall. Not every service and not every type of traffic is distributed.

Question | Recommendation
Maximum stability and simple operation | Active-Passive
Use second firewall without separate protection license | Active-Passive
Need more throughput for certain TCP connections | Consider Active-Active
Many VPN, Proxy, RED, NDR, or special cases | Prefer Active-Passive
Small or medium environment without clear performance problem | Active-Passive
Clear performance requirement and appropriate licensing for both appliances | Active-Active after testing

Video Guides

The following Sophos Techvids show HA on Sophos Firewall visually. The videos do not replace proper planning, but they are useful for understanding QuickHA, roles, status, and the basic behaviour of an HA cluster.

Video guide for HA setup and Active-Passive cluster behaviour.
Video guide for HA setup and Active-Active cluster behaviour.

What High Availability on the Sophos Firewall Means

A Sophos Firewall HA cluster consists of two firewalls. The devices exchange heartbeats, device status, connection information, and configuration data via a dedicated HA link. The configuration is synchronised from the Primary Firewall to the Auxiliary Firewall.

HA protects against typical failures:

  • Failure of the Primary Firewall
  • Power or hardware failure
  • Failure of a monitored interface
  • Software or service problem that renders a device inoperable
  • Planned firmware updates
  • Planned role change during maintenance

However, HA does not solve every problem:

  • An incorrect firewall rule set remains incorrect even in the cluster.
  • A common switch failure can affect both firewalls simultaneously.
  • A defective VLAN design or faulty routing concept is not automatically corrected.
  • Logs and reports are not fully synchronised between both firewalls.
  • A backup remains mandatory.

Anyone planning HA should first have clarified the basics of zones, interfaces, VLANs, LAGs, and bridges. The guide Planning and Configuring Sophos Firewall Zones and Interfaces covers this groundwork.

Active-Passive or Active-Active

Active-Passive

In Active-Passive, one firewall processes all productive traffic. The second firewall is passive and only takes over when the active firewall fails or when a failover is triggered manually or as part of maintenance.

Typical characteristics:

  • Primary Firewall processes the traffic.
  • Auxiliary Firewall remains on standby.
  • Sessions are synchronised as far as the respective service supports it.
  • On hardware appliances, only the licensed device requires protection subscriptions.
  • The Auxiliary Firewall takes over with the same virtual MAC address in case of failover.
  • Network devices usually do not need to relearn their neighbours.

Active-Passive is usually the best choice for classic corporate environments, branches, data centres, and environments where stability is more important than potential performance gain.

Active-Active

In Active-Active, both firewalls process traffic. Nevertheless, the architecture remains asymmetrical: The Primary Firewall receives the traffic and decides whether to process a connection itself or forward it to the Auxiliary Firewall.

Sophos uses the source IP address for distribution, among other things. TCP connections from even source IP addresses are typically processed on the Primary, while odd source IP addresses can be forwarded to the Auxiliary. Non-TCP traffic and certain services are not evenly distributed.
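
To get a feeling for which clients would land on which node, the even/odd rule can be sketched in a few lines of Python. This is only a simplified model of the behaviour described above, not Sophos' actual algorithm: the function name `likely_node` is made up for illustration, "even" is interpreted as the numeric value of the address being even, and non-TCP traffic is assumed to stay on the Primary.

```python
import ipaddress


def likely_node(source_ip: str, protocol: str = "tcp") -> str:
    """Guess which node processes a connection under the simplified even/odd rule."""
    # Assumption: traffic that is not TCP is not distributed and stays on the Primary.
    if protocol.lower() != "tcp":
        return "primary"
    # Interpret "even source IP" as: the numeric value of the address is even.
    is_even = int(ipaddress.ip_address(source_ip)) % 2 == 0
    return "primary" if is_even else "auxiliary"


for client in ("192.168.1.10", "192.168.1.11"):
    print(client, "->", likely_node(client))
```

If most of the heavy clients in a network happen to share the same parity, this model already shows why Active-Active may bring less relief than expected.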

Typical characteristics:

  • Both firewalls can process traffic.
  • The Primary Firewall remains the central entry point.
  • Both firewalls require appropriate licenses.
  • Not all services are distributed.
  • Logs and reports are generated on the device that processes the respective traffic.
  • Troubleshooting becomes more complex because connections can land on both nodes.

Active-Active is useful if there is a clear performance goal and it has been tested beforehand whether the relevant traffic is actually distributed. For pure high availability, Active-Passive is usually cleaner.

Roles, Status, and Architecture

Roles in the HA Cluster

Term | Meaning
Primary | Device that manages the central cluster configuration. In both HA modes, the Primary receives the traffic.
Auxiliary | Second device in the cluster. It synchronises the configuration from the Primary and takes over if necessary.
Initial primary | The device that was started as Primary during setup. In Active-Passive, it is usually also the licensed device.
Preferred primary | Preferred device that should become Primary again after a failover as soon as it is stably available.

Status Values in Operation

Status | Meaning
Active | The device processes traffic.
Passive | The device is ready but does not process productive traffic in Active-Passive.
Standalone | The device does not see the peer or HA is not fully active. Both devices can become standalone in case of HA link problems.
Faulty | The device is not healthy enough to participate normally in the cluster.

Virtual MAC Address

The Sophos Firewall uses virtual MAC addresses for productive interfaces in the HA cluster. Only the Primary responds to ARP requests for the cluster. In case of a failover, the Auxiliary takes over this virtual MAC address. This keeps the reachability for switches, routers, and clients more stable because the IP and MAC assignment does not fundamentally change.

The Cluster ID is important. This ID is used for the virtual MAC address. If multiple HA clusters are operated in the same Layer 2 environment, each cluster must have a unique Cluster ID. Otherwise, MAC conflicts can occur.
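
If you maintain an inventory of your clusters, duplicate Cluster IDs within one Layer 2 segment can be found with a small check. The following Python sketch assumes a hypothetical inventory as a list of `(cluster name, Layer 2 segment, Cluster ID)` tuples; none of these names come from Sophos.

```python
from collections import defaultdict


def find_cluster_id_conflicts(clusters):
    """Return {(segment, cluster_id): [cluster names]} for IDs used more than once per segment."""
    seen = defaultdict(list)
    for name, segment, cluster_id in clusters:
        seen[(segment, cluster_id)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


# Hypothetical inventory: two clusters share Cluster ID 0 in the same segment.
inventory = [
    ("hq-cluster", "vlan10", 0),
    ("lab-cluster", "vlan10", 0),
    ("branch-cluster", "vlan20", 0),
]
print(find_cluster_id_conflicts(inventory))
```

The same Cluster ID in a different Layer 2 segment is not reported, because the virtual MAC addresses never meet there.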

What is Synchronised

Area | Synchronisation
Firewall rules, policies, objects, routing, CLI configuration | Synchronised from Primary to Auxiliary.
Active sessions | Synchronised depending on protocol and service.
Secure Storage Master Key and WebAdmin credentials | Synchronised.
Dedicated HA link | Not synchronised as a normal productive interface configuration.
Peer Admin Port | Treated separately and not synchronised like a normal interface.
Logs and reports | Not synchronised between devices.

Failover Behaviour

A failover can be triggered by various events:

  • No more heartbeats over the HA link
  • Failure of a monitored port
  • Power failure
  • Hardware failure
  • Software or service problem
  • Planned role change
  • Firmware update

Heartbeats run over the dedicated HA link. Very short intervals are used by default. If several heartbeats are missing in succession, the peer is considered unreachable. The firewall then checks the status and performs the role change.
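
How quickly a missing peer is noticed follows from the keepalive interval multiplied by the number of attempts. A minimal Python sketch of this arithmetic, with purely illustrative values (check the values configured on your own cluster), could look like this. The result is only a rough lower bound, because the status check and the role change itself take additional time.

```python
def detection_time_ms(keepalive_interval_ms: int, attempts: int) -> int:
    """Time until the peer is considered unreachable: interval x missed heartbeats."""
    if keepalive_interval_ms <= 0 or attempts <= 0:
        raise ValueError("interval and attempts must be positive")
    return keepalive_interval_ms * attempts


# Illustrative values only, not taken from a specific SFOS version.
print(detection_time_ms(250, 16), "ms")
```

Lowering either value makes the cluster react faster but also more sensitive to short disturbances on the HA link.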

During a failover, many connections are continued or quickly re-established. However, a failover is not completely transparent for every application. Especially stateful TCP connections, web sessions, proxy connections, or certain VPN scenarios may briefly disconnect or need to be re-established.

Load Balancing in Active-Active

Active-Active does not mean that both firewalls stand like two equal routers in the network. The Primary Firewall remains the central entry point and distributes certain connections to the Auxiliary Firewall.

Important points:

  • Load balancing is only available in Active-Active.
  • The method cannot be freely adjusted.
  • Sophos does not treat an external load balancer in front of the HA cluster as a standard design for this HA logic.
  • Not all traffic is distributed.
  • Non-TCP traffic, SD-RED, tunnelled traffic, and some Layer 7 functions may be treated differently.
  • Troubleshooting must include both nodes.

In practice, Active-Active should only be used if it is clear beforehand which traffic is causing the bottleneck and whether exactly this traffic is distributed.

Supported and Restricted Services

Sophos HA supports most firewall services. However, some services have special features.

Service or Function | Note
Firewall rules and NAT | Synchronised. In Active-Active, you need to know which node processes a connection.
VPN | Many VPN scenarios work with HA, but not every session type fails over without interruption. IPsec can take over stateless UDP/ICMP better than stateful TCP.
Web Protection | Works in the cluster. In Active-Active, alerts can come from both nodes.
Email Protection | Quarantine and release can be node-specific because each device stores its own data for processed mail traffic.
Synchronized Application Control | Not suitable for Active-Active if the function is not supported in the deployed SFOS version.
NDR Essentials | Plan only with Active-Passive for HA environments.
sFlow | Runs only on the Primary in HA environments.
Reports | Local reports are generated per device. Merged reports are more sensible via Sophos Central Firewall Reporting.
Cellular WAN | Must be disabled for HA.
XGS Wi-Fi Models | HA is not supported on XGS Wi-Fi models.

If reporting or log retention is important, you should plan early whether an external syslog server or Sophos Central Firewall Reporting will be used. More on this can be found in Enable Central Firewall Reporting.

Prerequisites

Before implementation, you should carefully check the HA prerequisites against your own environment. Matching models, firmware versions, interfaces, Cellular WAN, and virtual platforms in particular are points where small deviations can have a big impact later.

Hardware and Model Compatibility

Prerequisite | Requirement
Appliance Model | Both firewalls must be the same XGS model, for example, XGS 2100 with XGS 2100.
Hardware Revision | Different hardware revisions are possible with the same XGS model.
XGS Wi-Fi Models | Not supported. Examples are XGS 126w or XGS 136w.
Flexi Port Modules | If expansion modules are used, the number of Flexi Ports must be the same on both devices.
Firmware | Both devices must use the same SFOS version, including maintenance release and build.
Hardware plus Virtual Appliance | Not possible as an HA pair.
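
The hardware prerequisites above lend themselves to a simple pre-check before you touch the HA wizard. The following Python sketch compares two devices from a hand-maintained inventory. The `Appliance` class, its fields, and the version strings are hypothetical, and treating a trailing "w" in the model name as the marker for Wi-Fi models is an assumption derived from the examples XGS 126w and XGS 136w.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    model: str            # e.g. "XGS 2100"
    sfos_version: str     # full version string including build
    flexi_ports: int      # number of Flexi Ports from expansion modules
    is_virtual: bool = False


def ha_pair_problems(a: Appliance, b: Appliance) -> list:
    """Return a list of findings that speak against pairing the two devices."""
    problems = []
    if a.is_virtual != b.is_virtual:
        problems.append("hardware and virtual appliance cannot form an HA pair")
    if a.model != b.model:
        problems.append("models differ")
    for node in (a, b):
        # Assumption: a trailing "w" in the model name marks a Wi-Fi model.
        if node.model.lower().endswith("w"):
            problems.append(f"{node.model} is a Wi-Fi model")
    if a.sfos_version != b.sfos_version:
        problems.append("SFOS versions differ")
    if a.flexi_ports != b.flexi_ports:
        problems.append("Flexi Port counts differ")
    return problems


fw01 = Appliance("XGS 2100", "21.0.1.277", 8)   # hypothetical version string
fw02 = Appliance("XGS 2100", "21.0.0.169", 8)   # hypothetical version string
print(ha_pair_problems(fw01, fw02))
```

An empty list does not prove that HA will work, but a non-empty list is a good reason to stop before the maintenance window.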

Virtual and Software Appliances

Virtual or software appliances must also match each other very closely.

Prerequisite | Requirement
Platform | Same appliance type and SFOS platform.
Hypervisor | Same hypervisor type.
Resources | Same CPU cores, comparable resources, and same number of network interfaces.
Firmware | Same SFOS version, including build.
MAC Addresses | In virtual environments, the option for hypervisor-assigned MAC addresses may be relevant so that promiscuous mode is not required. A change, however, causes downtime.

Cloud Deployments

In cloud environments, additional platform requirements apply. Routing, virtual interfaces, IP addresses, security groups, UDRs, or cloud-specific failover mechanisms must fit the respective cloud design. The normal appliance HA approach cannot be transferred unchecked to Azure, AWS, or other cloud environments.

If a Sophos Firewall is operated in the cloud, you should check the current Sophos documentation for the respective platform and cloud network architecture before planning HA.

Licensing and Registration

HA licensing varies depending on the platform and HA mode. Three questions are crucial: Is it hardware or virtual/software? Is Active-Passive or Active-Active used? And which device is the Initial Primary, i.e., the device that holds the license for the cluster? These points should be reconciled with license status, serial numbers, and target mode before implementation.

Scenario | License Requirement
Base Firewall | A Base Firewall license is required for HA. Hardware appliances have this license by default. Virtual/software appliances must be appropriately licensed.
Active-Passive Hardware | Only the Initial Primary device needs the productive subscriptions. The Auxiliary Firewall receives a copy of the subscriptions and can process traffic after a failover.
Active-Active Hardware | Both firewalls need their own appropriate licenses. The license types must match, expiration dates may differ.
Active-Passive Virtual/Software | Only the Primary needs the necessary licenses, including Base Firewall.
Active-Active Virtual/Software | Both devices need their own Base Firewall license and appropriate additional protection licenses.
Hardware Registration | Both hardware devices must be known in Sophos Central or the Sophos Licensing Portal and be able to synchronise their licenses.
Virtual/Software Registration Active-Passive | According to Sophos, only the Primary is claimed in Active-Passive Virtual/Software.
Sophos Central Management | License synchronisation and claiming do not automatically mean that the firewalls are also managed via Sophos Central Firewall Management. A suitable additional subscription is required for this.
RMA and Support | For hardware replacement and advance replacement, the support status is important. In Active-Passive hardware, Sophos names Enhanced Plus Support on the Primary device as a relevant prerequisite for advance hardware replacement. In Active-Active, the support status must match on both devices.

Important: In Active-Passive, the Initial Primary is particularly important because this device holds the license for the cluster. In the HA view, the corresponding device is indicated as holding the license for the cluster. In case of doubt, the device should be clearly documented in the operations manual.

If licenses in Active-Active do not match, load balancing initially stops according to Sophos. If the discrepancy persists, HA can be deactivated. During initial configuration, Active-Active HA cannot be activated cleanly if the licenses do not match. A license check therefore belongs in the operational routine.

In virtual/software appliances, the Base Firewall license is particularly critical. If it is deactivated or the Initial Primary cannot synchronise the license for a long time, HA can be deactivated, and other protection functions become inactive. Relevant here is synchronisation with the license server at least once within 90 days. For productive environments, this means: The HA cluster must not only function technically but also regularly reach the license server.
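
The 90-day window can be monitored with a very small script if you record the last successful license synchronisation somewhere. The following Python sketch is only an illustration: where the date comes from is up to you, and the 14-day warning margin is an arbitrary assumption.

```python
from datetime import date

SYNC_LIMIT_DAYS = 90    # window named in the text above
WARN_MARGIN_DAYS = 14   # hypothetical early-warning margin


def license_sync_state(last_sync: date, today: date) -> str:
    """Classify the age of the last license synchronisation."""
    age_days = (today - last_sync).days
    if age_days > SYNC_LIMIT_DAYS:
        return "overdue"
    if age_days > SYNC_LIMIT_DAYS - WARN_MARGIN_DAYS:
        return "warning"
    return "ok"


# Example with fixed dates so the result is reproducible.
print(license_sync_state(date(2025, 1, 1), date(2025, 3, 22)))
```

A check like this fits well into existing monitoring, especially for isolated or air-gapped clusters where the license server is not reached automatically.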

For operation, you should at least document:

  • which device is the Initial Primary
  • which serial numbers or appliance IDs belong to the cluster
  • which licenses are active on which device
  • when the licenses were last synchronised
  • which support level is available for RMA or advance replacement
  • who approves license changes, renewals, and RMA processes

Network Prerequisites

Area | Recommendation
HA Link | Dedicated connection between both firewalls, ideally directly with an Ethernet cable.
HA Link Zone | DMZ zone with SSH enabled for the zone.
HA Link IP Addresses | Static IP addresses in the same subnet but different addresses.
HA Link Quality | High bandwidth, low latency, no packet loss.
Switches | Enable RSTP on switches connected to firewall ports.
Monitored Ports | Only monitor ports that are truly connected and critical.
Cellular WAN | Disable for HA.
Admin Port | Plan separately so that the Auxiliary Firewall remains accessible.

The HA link does not process normal client or server traffic. It is only relevant for heartbeats, status, session synchronisation, configuration synchronisation, and Active-Active distribution. Nevertheless, it is extremely critical. If the HA link fails, both firewalls can believe they are Primary. This split-brain scenario must be avoided.
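
For a manually addressed HA link, the rule "same subnet, different addresses" can be verified with Python's standard `ipaddress` module before you enter the values in WebAdmin. The function name and the example addresses below are hypothetical.

```python
import ipaddress


def check_ha_link_addresses(primary: str, auxiliary: str) -> list:
    """Check two HA link addresses given in CIDR notation, e.g. '10.255.255.1/30'."""
    p = ipaddress.ip_interface(primary)
    a = ipaddress.ip_interface(auxiliary)
    problems = []
    if p.network != a.network:
        problems.append("HA link addresses are in different subnets")
    if p.ip == a.ip:
        problems.append("HA link addresses are identical")
    return problems


# Hypothetical addressing for a directly connected HA link.
print(check_ha_link_addresses("10.255.255.1/30", "10.255.255.2/30"))
```

An empty list means the addressing is formally consistent. It says nothing about cabling or link quality.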

Ports and Interfaces

Port Type | Purpose | Recommendation
Dedicated HA link | Heartbeat, status, configuration, and session synchronisation | Connect directly or via a very reliable switch. Do not use for productive traffic.
Monitored ports | Monitoring critical productive links | Monitor WAN, important DMZ, or core uplinks. Do not select unused ports.
Peer Admin Port | Access to Auxiliary WebAdmin | Plan and document separately. Client must be in the appropriate subnet.
Production Interfaces | LAN, WAN, DMZ, VLANs, LAGs | Connect identically on both firewalls and design equivalently.

Physical interfaces, VLANs, or LAGs are possible as a dedicated HA link. Bridge interfaces and alias IP addresses cannot be used as a dedicated HA link. If a LAG is used as an HA link, the parent interfaces must be set up the same on both appliances.

Planning and Design

A typical Active-Passive design looks like this:

  • both firewalls are in the same rack or in racks close to each other
  • all productive interfaces are connected the same
  • WAN goes to redundant switches or well-documented provider handovers
  • LAN/DMZ goes to redundant switch structures
  • the HA link is directly connected
  • a separate management or admin access is provided
  • WAN and core/DMZ ports are defined as monitored ports

The most important point: The second firewall must be able to take over the same network position in case of failure. Therefore, VLANs, trunks, LAGs, switch ports, and provider connections must be consistently planned.

Active-Active requires the same physical cleanliness as Active-Passive but also a clear expectation of the distributable traffic.

Before Active-Active, you should answer:

  • Which traffic is the bottleneck?
  • Is exactly this traffic distributed in Active-Active?
  • Are both appliances fully and appropriately licensed?
  • Are logs, reports, and troubleshooting processes prepared for both nodes?
  • Are there services like VPN, NDR, Synchronized Application Control, or proxy scenarios that speak against Active-Active?

Without a clear answer, Active-Passive is the better choice.

Area | Design Note
LAN | Separate internal networks into clean zones. Plan VLANs with security in mind, not only technical function.
WAN | Consider provider connection, failover gateways, and SD-WAN separately from HA. HA does not replace a WAN redundancy concept.
DMZ | Keep server networks and external publications separate from client networks.
HA Link | Own connection, statically addressed, DMZ zone, SSH allowed for DMZ. Do not conduct user or server communication over it.
Management | Restrict admin access, Device Access, and ACLs consciously.

For securing local firewall services, see Securing Sophos Firewall Access: Configuring Device Access Correctly.

Switch Requirements

Switches are often the invisible risk factor in HA. An HA cluster only works as well as the Layer 2 environment underneath.

Recommendations:

  • Enable RSTP on involved switches.
  • Configure trunks identically on both firewalls.
  • Consistently allow VLANs on all involved ports.
  • Build LAGs identically.
  • Do not select half-finished or unused monitored ports.
  • Connect the HA link as directly as possible.
  • If the HA link runs over switches, the path must be stable, low-latency, and packet-loss-free.

VLANs, LAGs, Bridges, and RED

VLANs and LAGs are possible in HA, but they increase planning requirements. The HA cluster should not be the first place to try out a VLAN or LAG design.

Technology | HA Note
VLAN | Plan identically on both devices. Consider parent interface. VLAN as HA link is possible but only with very clean design.
LAG | Useful for core uplinks or redundant switch connection. Parent interfaces must be consistent.
Bridge | HA in bridge mode is supported but makes troubleshooting more complex. For new designs, gateway mode is usually more transparent.
RED | RED and remote scenarios can affect HA, especially when networks are stretched across locations. Check performance, latency, and error patterns beforehand.
VPN | VPN failover works differently depending on protocol and session type. Test IPsec and remote access separately.

Avoiding Split-Brain

Split-brain means that both firewalls consider themselves responsible at the same time, either as the active node or as a standalone device. This can happen if the HA link fails while both devices remain connected to the production network.

Measures:

  • Do not route the HA link over insecure or unstable switch paths.
  • Prefer direct connection for HA link.
  • Select monitored ports consciously.
  • Configure RSTP cleanly.
  • No asymmetric cabling.
  • Check HA status after each switch or VLAN change.
  • Only adjust keepalive values with reason.
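
If your monitoring collects the role or status that each node reports about itself, a suspected split-brain can be flagged automatically. The following Python sketch works on two plain strings. The labels `primary` and `standalone` are hypothetical values from such a monitoring system, not the literal output of a Sophos command.

```python
def looks_like_split_brain(node_a: str, node_b: str) -> bool:
    """Return True if both nodes claim to be responsible at the same time."""
    # Hypothetical labels: a node that is "primary" or "standalone" handles traffic on its own.
    responsible = {"primary", "standalone"}
    return node_a.strip().lower() in responsible and node_b.strip().lower() in responsible


print(looks_like_split_brain("primary", "auxiliary"))     # healthy pair
print(looks_like_split_brain("standalone", "standalone")) # both nodes act alone
```

Such a check should raise an alert rather than trigger an automatic action, because the cause is usually found in the HA link or the switch path.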

Preparing the Setup

Do not improvise an HA setup in the production network. Working through the following preparation saves a lot of time later.

Preparing Both Firewalls

  • Bring both firewalls to the same SFOS version, including build.
  • Check license and registration status.
  • Disable Cellular WAN.
  • Ensure models are compatible.
  • Check Flexi Port equipment.
  • Document interfaces and switch ports.
  • Determine HA link port.
  • Connect HA link directly or check switch path.
  • Plan DMZ zone and SSH access for the HA link.
  • Document admin access to both devices.
  • Create a backup of the existing configuration.

Backups are not optional with HA. A backup should be available before setup, before firmware updates, and before major interface changes. The basics are in Creating or Restoring a Sophos Firewall Backup.

QuickHA or Manual Configuration

Method | When Useful
QuickHA | Standard case. Fast, robust, and sufficient for most Active-Passive and Active-Active setups.
Manual Configuration | When admin ports, HA link, cluster ID, peer addresses, and detail values need to be specified consciously.

QuickHA detects the peer via the selected HA link interfaces. Once the devices have found each other, the cluster is built. The passphrase is used to establish the encrypted SSH tunnel and is not treated as a reusable password afterward. If a device is replaced, HA must be deactivated and reconfigured.

The steps described here are based on the official Sophos HA configuration but are deliberately formulated as a practical admin checklist. For productive changes, you should not just click through the wizard but document roles, HA link, admin access, backup, license status, and rollback beforehand.

Setting Up Active-Passive with QuickHA

1. Prepare Primary Firewall

  1. Log in to the future Primary Firewall in WebAdmin.
  2. Go to System services > High availability.
  3. Select Primary (active-passive) mode.
  4. Use QuickHA.
  5. Optionally assign a node name, e.g., FW01.
  6. Set an HA passphrase.
  7. Securely store the passphrase as it will be needed on the Auxiliary Firewall.
  8. Select the dedicated HA link.
  9. Click Initiate HA.

Notes:

  • The HA link must not have productive dependencies.
  • If QuickHA uses an unbound interface, Sophos assigns the DMZ zone to this interface and sets HA-specific settings.
  • The HA link uses static IP addresses in the link-local range if QuickHA uses the default values.
  • SSH must be allowed for the HA link zone because the HA tunnel is established over it.

2. Prepare Auxiliary Firewall

  1. Log in to the future Auxiliary Firewall.
  2. Go to System services > High availability.
  3. Select Auxiliary role.
  4. Use QuickHA.
  5. Optionally assign a node name, e.g., FW02.
  6. Enter the same HA passphrase.
  7. Select the same HA link port as on the Primary side.
  8. Click Initiate HA.

After setup, the Primary Firewall synchronises the configuration to the Auxiliary Firewall. Many local settings on the Auxiliary Firewall are overwritten. Therefore, the Auxiliary should not be configured as an independent productive firewall before HA setup.

3. Check Advanced Settings

After building the cluster, do not simply close the dialog. Check the following points:

  • HA status of both nodes
  • Role and status at the top right in WebAdmin
  • Dedicated HA link
  • Monitored ports
  • Peer Admin Port
  • Preferred primary
  • Keepalive interval and attempts
  • License holder in Active-Passive
  • Sophos Central registration, if used

4. Set Monitored Ports

Monitored ports determine whether an interface failure triggers a failover. Typical candidates:

  • WAN uplink
  • Core LAN uplink
  • Important DMZ or server uplinks

Do not monitor ports that are sometimes deliberately offline, not connected, or only used for optional scenarios. A wrongly set monitored port is a common cause of unexpected failovers or a non-starting cluster.
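
The selection rule can be expressed as a simple filter over a port inventory. The following Python sketch assumes a hypothetical `Port` record that you fill from your own documentation. The port names are examples only.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    link_up: bool                    # cable connected and link established
    critical: bool                   # failure should justify a failover
    sometimes_offline: bool = False  # deliberately down at times, e.g. optional uplink


def monitored_port_candidates(ports):
    """Return the names of ports that are connected, critical, and permanently online."""
    return [
        p.name
        for p in ports
        if p.link_up and p.critical and not p.sometimes_offline
    ]


inventory = [
    Port("WAN1", link_up=True, critical=True),
    Port("LAN-Core", link_up=True, critical=True),
    Port("Lab", link_up=True, critical=False),
    Port("Spare", link_up=False, critical=False),
    Port("BackupWAN", link_up=True, critical=True, sometimes_offline=True),
]
print(monitored_port_candidates(inventory))
```

Ports that fail the filter can still be important. They are just poor failover triggers.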

Setting Up Active-Active with QuickHA

Active-Active is set up similarly but with a different goal. Both firewalls must be appropriately licensed beforehand.

1. Check in Advance

  • Both firewalls have appropriate licenses.
  • The license types match.
  • Both devices are registered.
  • Both devices run on the same SFOS version, including build.
  • The relevant traffic actually benefits from Active-Active.
  • Services with Active-Active restrictions have been checked.
  • Monitoring and troubleshooting are designed for both nodes.

2. Configure Primary Firewall

  1. Go to System services > High availability.
  2. Select Primary (active-active).
  3. Use QuickHA.
  4. Assign a node name.
  5. Set HA passphrase.
  6. Select dedicated HA link.
  7. Start HA.

3. Configure Auxiliary Firewall

  1. On the second device, go to System services > High availability.
  2. Select Auxiliary.
  3. Use QuickHA.
  4. Enter the same passphrase.
  5. Select the same HA link port.
  6. Start HA.

4. Test After Setup

In Active-Active, more needs to be tested than just the HA status:

  • Are connections distributed across both nodes?
  • Are logs visible on both devices?
  • Do VPN connections work after a role change?
  • Do Web Protection, IPS, Application Control, and relevant security features work?
  • Are there applications that stand out due to asymmetric behaviour?
  • Are reports and alerts generated as expected?

Manual HA Configuration

Manual HA configuration is useful when the automatic QuickHA logic does not provide enough control.

Typical reasons:

  • Fixed HA link IP addresses should be used
  • Peer Admin Port must be precisely defined
  • Cluster ID should be consciously set
  • Multiple HA clusters exist in the same Layer 2 environment
  • Virtual appliances should be operated with specific MAC options
  • A very controlled rollout is necessary

In manual setup, the Auxiliary Firewall is prepared first, and then the Primary Firewall is configured. The Primary must know the HA link IP address of the peer.

Important fields:

Field | Meaning
Operation mode | Active-Passive or Active-Active.
Initial device role | Primary or Auxiliary.
Dedicated HA link | Interface for heartbeat, status, and synchronisation.
Peer HA link IPv4 | Address of the HA link interface on the second device.
Cluster ID | Basis for virtual MAC addresses. Set uniquely for multiple clusters.
Monitored ports | Critical ports whose failure should trigger a failover.
Peer administration settings | Access to the Auxiliary Firewall.
Preferred primary | Device that should become Primary again after failover.
Keepalive interval / Attempts | Sensitivity of HA detection. Only adjust consciously.

Validating the Cluster

After setup, the HA cluster should be systematically checked.

WebAdmin Check

  1. Log in to the Primary Firewall.
  2. Check the HA status at the top right.
  3. Go to System services > High availability.
  4. Check roles, status, serial numbers, and mode.
  5. Ensure the cluster is synchronised.
  6. In Active-Passive, check which device holds the license for the cluster.

CLI Check

In the Device Console, you can display the HA status:

system ha show details

If it is unclear which device is the licensed Initial Primary, this value can help in the Advanced Shell:

nvram get "#li.master"

YES stands for the initial Primary device, NO for the Auxiliary device. Such commands should be documented and only used in clear maintenance or diagnostic cases.
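
If you collect this value in a script, for example for documentation, the interpretation can be kept in one small helper. The following Python sketch only maps the two values described above and deliberately fails on anything else. The function name is made up, and how the command output reaches the script is left open.

```python
def is_initial_primary(nvram_output: str) -> bool:
    """Map the returned value to a boolean: YES = Initial Primary, NO = Auxiliary."""
    value = nvram_output.strip().strip('"').upper()
    if value == "YES":
        return True
    if value == "NO":
        return False
    # Anything else is unexpected and should be looked at manually.
    raise ValueError(f"unexpected value: {nvram_output!r}")


print(is_initial_primary("YES\n"))
```

Failing loudly on unknown values is intentional, so that a changed output format does not silently produce wrong documentation.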

If shell access is not yet set up, see the guide Connecting to Sophos Firewall via SSH.

Functional Test

  • Test client from LAN to the internet.
  • Test access to internal servers.
  • Test VPN.
  • Test DNAT or WAF scenarios.
  • Test DNS and DHCP if the firewall provides these services.
  • Check logs in the Log Viewer.
  • Test HA role change in a maintenance window.
  • Document roles, status, and session behaviour afterward.

Operation and Maintenance

Ongoing Monitoring

An HA cluster should be actively monitored. Just because two firewalls are in the rack does not mean the environment is automatically highly available.

The following should be monitored:

  • HA status
  • Role status Primary/Auxiliary
  • HA link
  • Monitored ports
  • License and subscription status
  • Firmware versions
  • Resources like CPU, RAM, Disk
  • Central services like IPS, Web, VPN, DNS, DHCP
  • Logs in the Log Viewer
  • Alerts in Sophos Central or via email

For disk and hardware topics, both nodes should be considered separately. Local reports, log files, and SSD status can differ. The articles Checking Sophos Firewall Storage and Managing Reports and Checking Sophos Firewall SSD Health via SMART cover these topics.

Logs and Reports

Logs and reports are not simply merged into a common local data set. Each device writes logs for the traffic it processes. In Active-Active, this is particularly important because traffic can appear on both nodes.

Sophos Central Firewall Reporting is helpful in HA environments because it can centrally evaluate data across multiple firewalls. Local troubleshooting logs can be found on the firewall itself. The article Sophos Firewall Troubleshooting: Services and Logs explains which services and log files are relevant for analyses.

Runbook for HA Operation

An HA cluster needs a short operational runbook. It should state which device is the Initial Primary, which ports are monitored, how to reach the Auxiliary, who approves firmware updates, and when HA is disabled for replacement or reimage.

At least document:

  • Serial number, location, rack position, and role of both appliances.
  • Dedicated HA link, Peer Admin Port, Cluster ID, and monitored ports.
  • Preferred primary and expected behaviour after failover.
  • License holder, support status, and RMA contact path.
  • Procedure for firmware update, backup, reimage, and hardware replacement.
  • Responsible person for logs, Central Reporting, syslog, and support cases.
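
Whether a runbook is complete can be checked mechanically if it is kept in a structured form. The following Python sketch assumes the runbook is available as a dictionary. The key names are hypothetical and simply mirror the list above.

```python
REQUIRED_FIELDS = (
    "serial_numbers",
    "location",
    "roles",
    "ha_link",
    "peer_admin_port",
    "cluster_id",
    "monitored_ports",
    "preferred_primary",
    "license_holder",
    "support_status",
    "rma_contact",
    "update_procedure",
    "responsible_person",
)


def missing_runbook_fields(runbook: dict) -> list:
    """Return all required fields that are absent or empty in the runbook."""
    return [field for field in REQUIRED_FIELDS if not runbook.get(field)]


# Hypothetical, deliberately incomplete runbook entry.
draft = {"serial_numbers": ["X1", "X2"], "cluster_id": 1, "ha_link": "Port8"}
print(missing_runbook_fields(draft))
```

Note that a value of 0, for example Cluster ID 0, counts as empty in this simple check. If that is a valid value in your environment, compare against `None` instead.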

If only the WebAdmin on one node hangs, the HA cluster as a whole is not necessarily defective. In that case, first check whether restarting the WebAdmin GUI or a controlled service restart is sufficient before triggering a failover or reboot.

Changes to the Cluster

Configuration changes are made on the Primary Firewall. The Auxiliary Firewall is not the place for normal rule, interface, or policy changes.

Before major changes:

  • Create a backup.
  • Define a maintenance window.
  • Check HA status.
  • Update documentation.
  • Define rollback path.
  • Test synchronisation and traffic after the change.

This applies especially to:

  • Interfaces
  • VLANs
  • LAGs
  • Zones
  • Routing
  • NAT
  • VPN
  • Device Access
  • SD-WAN

Firmware Updates and Backups

Firmware Updates in HA Environments

Firmware updates are started on the Primary Firewall. The devices are updated one after the other, and the cluster can switch roles during this time.

Typical procedure:

  1. Start update on the Primary Firewall.
  2. Auxiliary Firewall is updated.
  3. Auxiliary Firewall restarts and temporarily takes over.
  4. Previous Primary is updated.
  5. Previous Primary restarts.
  6. If Preferred primary is enabled, the cluster can switch back to the preferred device.
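The sequence above can be modelled as a small role simulation to make the role changes easier to follow. This is a simplified sketch, not Sophos logic: the node names fw-a and fw-b and the function simulate_ha_update are hypothetical, and the switch back is treated as always happening when a preferred device is set, whereas in practice it only can occur.

```python
def simulate_ha_update(primary, auxiliary, preferred_primary=None):
    """Walk through the update order described above.

    Returns a tuple (log, final_primary). Simplified model only.
    """
    log = [f"update started on {primary}"]
    log.append(f"{auxiliary} is updated")
    log.append(f"{auxiliary} restarts and takes over")
    # Role change: the former Auxiliary now carries the traffic.
    primary, auxiliary = auxiliary, primary
    log.append(f"{auxiliary} is updated")
    log.append(f"{auxiliary} restarts")
    # Assumption: a configured preferred device always takes the role back.
    if preferred_primary == auxiliary:
        primary, auxiliary = auxiliary, primary
        log.append(f"switch back to preferred device {primary}")
    return log, primary


steps, final_primary = simulate_ha_update("fw-a", "fw-b", preferred_primary="fw-a")
for step in steps:
    print(step)
print("primary after update:", final_primary)
```

Without a preferred device, the model ends with the former Auxiliary as Primary, which is worth noting in the runbook before the maintenance window.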

Nevertheless, firmware updates belong in a maintenance window. Even if the process is designed for minimal downtime, individual sessions, VPN connections, or special applications may be briefly interrupted.

For preparation, see the article Sophos Firewall Firmware Update - Preparation and Best Practices.

Pattern Updates

Pattern updates are installed on the Primary Firewall and synchronised to the Auxiliary Firewall. This also applies to environments where updates are controlled or installed offline.

If an HA cluster is operated in an isolated environment, it should be clear before each manual pattern or license update which node is the Initial Primary and which node is currently Primary. The air-gap procedure is described in Operating Sophos Firewall Air-Gap Licensing and Pattern Updates.

Backup and Restore

Backup and restore behave differently in HA environments in several respects:

  • Backups should be created regularly and before any major change.
  • Restore is performed on the current Primary Firewall.
  • After restore, both firewalls are deregistered from Sophos Central and must be registered again.
  • A restore causes a restart and downtime; it is not handled like a normal failover.
  • If a backup without HA configuration is restored to an HA cluster, HA is deactivated and must be rebuilt.

Replacing a Cluster Node

When replacing or reimaging a node, HA should not simply continue with the old peer.

Important points:

  • Deactivate HA cleanly before reimage or replacement.
  • Document firmware version and build.
  • Create a backup.
  • Identify license holder.
  • Remove device from Sophos Central Management if it is returned or replaced.
  • Register new device.
  • Reconfigure HA because the old HA passphrase is not reused for a new device.

For the technical reinstallation, see Reimaging Sophos Firewall OS with USB Stick. If a hardware defect or RMA process is involved, also consult Preparing for Sophos Hardware Defect and RMA.

Troubleshooting

With complex error patterns, work in a structured way: first check HA status, HA link, monitored ports, firmware status, and license status, and only then consider special cases or vendor notes. This way, it remains clear whether there is a real HA problem or whether license, interface, firmware status, or monitoring is causing the error.

Important Logs and Diagnostic Locations

Area | Check
HA Status | System services > High availability
Event Logs | Log viewer > System
Troubleshooting Logs | Diagnostics > Tools or via SSH under /log
HA Details CLI | system ha show details
Interface Problems | show network interfaces, ifconfig, dmesg in appropriate diagnostic cases
License Holder | WebAdmin HA view or nvram get "#li.master"

For deeper analyses, the article Sophos Firewall CLI Troubleshooting: Important Commands often helps.

Troubleshooting Table

Error Pattern | Possible Cause | Check | Solution
Cluster not formed | Different firmware version or build | Check version on both devices | Bring both devices to identical SFOS version
Cluster not formed | Incorrect model or incompatible appliance | Check model and serial number | Use only compatible same XGS models
HA could not be enabled | Dedicated HA link not connected or peer not reachable | Check port status, cable, switch, ping on HA link IP | Correct cabling, connect HA link stably
HA link down | Faulty cable, switch port, VLAN, or LAG | Check interface status, speed/duplex, VLAN trunk, LAG members | Replace cable/switch port, correct VLAN/LAG
Both devices become standalone | HA link failed, split-brain danger | Check physical HA link connection and switch path | Shut down one device in a controlled manner, repair HA link, then restart
Validation failed for HA interface IP | Admin ports or HA link addresses not in expected subnet | Check IP addresses and /log/syslog.log | Correct addressing
Failover happens unexpectedly | Monitored port failed or wrongly selected | Check monitored ports and switch status | Only monitor truly critical and stable ports
Failover does not happen | Relevant port is not monitored | Check monitored ports | Enter critical WAN/core/DMZ ports as monitored ports
Active-Active does not distribute as expected | Traffic type is not load-balanced | Check connection type, protocol, and logs of both nodes | Use Active-Passive or adjust design
Logs seemingly missing | Traffic was processed by the other node or logging is not active | Check on both nodes and in the Log Viewer | Activate logging in rules, use Central Reporting/Syslog
Reports differ | Local reports are node-specific | Compare reports of both devices | Use Sophos Central Firewall Reporting
License problem in Active-Active | License types do not match | Check licensing on both firewalls | Align and synchronise licenses
Problems after firmware update | One node not updated cleanly or cluster not synchronised | Check HA status, firmware versions, logs | Use maintenance window, synchronise cluster, involve Sophos Support
Flexi-Port HA link does not work | Speed/duplex or auto-negotiation does not match | Check interface advanced settings on both devices | Configure both sides the same or use fixed port
Auxiliary WebAdmin not reachable | Peer Admin Port not reachable or client not in subnet | Check IP/subnet and route | Plan admin access correctly
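For the addressing errors in the table (HA interface IP validation, Auxiliary WebAdmin not reachable), it can help to verify offline that two addresses really share a subnet before entering them. The following is a minimal sketch using Python's standard ipaddress module. The function name same_subnet and all addresses and prefix lengths are hypothetical examples, and the check does not reproduce the validation that SFOS itself performs.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall into the same network."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Hypothetical HA link addresses on both nodes.
print(same_subnet("10.10.10.1", "10.10.10.2", 30))
# Hypothetical mismatch: the second address lies in the next /30 block.
print(same_subnet("10.10.10.1", "10.10.10.5", 30))
```

The same helper can be used to check whether an admin client and the Peer Admin Port are in one subnet or whether a route is required.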

If the dedicated HA link fails, caution is needed, because the two firewalls can no longer see each other. In the worst case, both devices send ARP/GARP and try to claim the cluster MAC for themselves.

Safe procedure:

  1. Stabilise network condition.
  2. Decide which device should remain active.
  3. Shut down the other device in a controlled manner or disconnect it from the production network.
  4. Repair HA link cable, switch port, VLAN, or LAG.
  5. Restart the device that was taken offline.
  6. Check HA status.
  7. Check logs and roles.

Relevant CLI Commands

system ha show details

Shows HA details in the Device Console.

show network interfaces

Helps with interface status and link information.

nvram get "#li.master"

Shows in the Advanced Shell whether the device is the licensed Initial Primary.

dmesg | grep PortE

Can provide hints on link flaps in certain hardware/interface problems.

Not every command belongs in every environment. For production systems, the rule is: document the state first, then run targeted checks.
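If the output of dmesg | grep PortE has been saved to a file, a short script can summarise which ports appear most often. The following is a minimal sketch under stated assumptions: the function count_port_mentions is hypothetical, the sample lines are invented placeholders, and real kernel messages will look different. A high count is only a hint at where to look, not proof of a link flap.

```python
import re
from collections import Counter


def count_port_mentions(kernel_log: str) -> Counter:
    """Count how often each PortE<n> name appears in the given log text."""
    return Counter(re.findall(r"PortE\d+", kernel_log))


# Hypothetical sample text; real dmesg lines will look different.
sample = """\
[  100.0] PortE1: example link message
[  101.5] PortE1: example link message
[  250.2] PortE3: example link message
"""

for port, hits in count_port_mentions(sample).most_common():
    print(port, hits)
```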

Best Practice Checklists

Planning

  • Consider Active-Passive as the default variant.
  • Use Active-Active only with a clear performance goal.
  • Check both devices for model, firmware, Flexi Ports, and licensing.
  • Connect the HA link on a dedicated port and as directly as possible.
  • Select monitored ports deliberately.
  • Plan management access to Primary and Auxiliary.
  • Document Cluster ID uniquely.
  • Document switch design, VLANs, and LAGs.
  • Plan RSTP on relevant switches.
  • Consider split-brain scenario.
  • Document backup and restore process.
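The compatibility check from the planning list (model, firmware, hardware versus virtual) can be sketched as a simple comparison of two device records. This is an illustration only: the class Appliance, the function ha_blockers, and the firmware strings are hypothetical, and the check covers just the rules named in this article, not Flexi Ports or licensing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    """Hypothetical device record for planning purposes."""
    model: str
    firmware: str  # version and build as a single string
    is_virtual: bool = False


def ha_blockers(a: Appliance, b: Appliance) -> list:
    """Return a list of reasons why the two devices should not be paired."""
    problems = []
    if a.is_virtual != b.is_virtual:
        problems.append("hardware and virtual appliances cannot be paired")
    if a.model != b.model:
        problems.append(f"model mismatch: {a.model} vs {b.model}")
    if a.firmware != b.firmware:
        problems.append(f"firmware mismatch: {a.firmware} vs {b.firmware}")
    return problems


# Placeholder firmware strings; use the values shown on the real devices.
node_a = Appliance(model="XGS 2100", firmware="version-X build-1")
node_b = Appliance(model="XGS 2300", firmware="version-X build-1")
print(ha_blockers(node_a, node_b))
```

An empty list means that none of the checked rules is violated. The remaining planning points still have to be verified manually.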

Implementation

  • Create a backup beforehand.
  • Bring both firewalls to the same SFOS version.
  • Disable Cellular WAN.
  • Prepare HA link with static addressing and DMZ zone.
  • Allow SSH for the HA link zone.
  • Name Primary and Auxiliary clearly.
  • Use passphrase only for setup and do not treat it as a permanent admin password afterward.
  • Check advanced settings after QuickHA.
  • Set monitored ports only when it is clear which links are stable and critical.
  • Test failover in a maintenance window.

Operation

  • Regularly check HA status.
  • Monitor licenses and synchronisation.
  • Perform firmware updates only after preparation and within a maintenance window.
  • Regularly test backups.
  • Make configuration changes only on the Primary.
  • Check HA status after switch, VLAN, or interface changes.
  • Include the logs of both nodes in every analysis.
  • Use Sophos Central Firewall Reporting or Syslog for longer analyses.
  • Replace or reimage a node only with a planned HA disable and subsequent reconfiguration.

Don’ts

  • Do not cluster different models.
  • Do not plan Wi-Fi XGS models for HA.
  • Do not leave Cellular WAN active in HA.
  • Do not set unconnected ports as monitored ports.
  • Do not use unstable VLAN or switch path as HA link.
  • Do not run production services over the HA link.
  • Do not consider Active-Active as a simple doubling of performance.
  • Do not plan or make configuration changes on the Auxiliary Firewall.
  • Do not perform restore without a maintenance window.

FAQ

Do you need two full licenses for Active-Passive?

For hardware appliances, the Initial Primary device usually requires the protection subscriptions in Active-Passive. The Auxiliary Firewall can use the subscription copy in failover. For virtual or software appliances, at least the Primary must be appropriately licensed. Active-Active requires appropriate licenses on both devices.

Can you cluster two different XGS models?

No. The devices must be the same XGS model. XGS 2100 with XGS 2100 is suitable, XGS 2100 with XGS 2300 is not.

Are different hardware revisions allowed?

Different hardware revisions may be possible with the same XGS model. It is crucial that model, platform, and firmware requirements are met.

Can you operate a hardware appliance with a virtual appliance as HA?

No. Hardware and virtual appliances cannot form a normal Sophos Firewall HA pair together.

Do you need to activate Preferred primary?

It is recommended, especially in Active-Passive. This makes it clear which device should become Primary again after a failover. Additionally, the licensed Initial Primary is easier to assign.

Are logs synchronised between both firewalls?

No. Logs and reports are not simply synchronised between both devices. For central evaluation, you should use Sophos Central Firewall Reporting or Syslog.

Is a firmware update on an HA cluster without interruption?

The HA update process is designed around role changes and keeps interruptions as short as possible. In practice, however, a maintenance window should be planned because individual sessions or applications may be briefly interrupted.

When should you disable HA?

Before a reimage, hardware replacement, RMA process, or major restore, HA should be disabled in a planned manner and cleanly rebuilt afterwards.