In industrial automation systems, online equipment status monitoring is a crucial link in ensuring production safety and system reliability. The CANopen protocol, through its Heartbeat mechanism, establishes a complete "vital sign" monitoring system for each device in the network. This mechanism not only allows the NMT master station to monitor the operating status of all slave stations in real time, but also quickly triggers fault handling procedures when equipment malfunctions, making it an indispensable technical means for building highly reliable industrial communication networks.

The previous article explained the NMT network management application of the CANopen protocol stack in detail. This article analyzes the CANopen heartbeat mechanism's producer-consumer model, protocol frame structure, status coding rules, and practical application scenarios, to help engineers understand and correctly implement this core function.

I. Overview of the CANopen Protocol Heartbeat Mechanism

The CANopen heartbeat mechanism adopts the classic producer-consumer design pattern. In this architecture, each CANopen device acts as a heartbeat producer, periodically broadcasting its own NMT status to the network; while the NMT master station or other nodes that need to monitor equipment status act as heartbeat consumers, responsible for receiving and processing these heartbeat messages, maintaining the health of the entire network. This design enables distributed processing of status monitoring, reducing the polling burden on the master station while ensuring the real-time nature and reliability of status information.

The Heartbeat mechanism is a periodic message used in CANopen to monitor the online status and NMT status of devices.

II. Overview of the Producer Mechanism

Each CANopen device (producer) periodically broadcasts its current NMT status, and its broadcast heartbeat message is identified using a specific COB-ID.

• COB-ID Calculation Formula: 0x700 + Node-ID
0x700 is the function code base value for the NMT error control message range, and the Node-ID (1 to 127) identifies the specific device sending the heartbeat message.

• COB-ID Message Example:

◦ When a device with Node-ID 0x10 sends a heartbeat, its COB-ID is 0x710; a device with Node-ID 0x05 uses 0x705 as its heartbeat COB-ID. This allocation method ensures that each device's heartbeat message in the network has a unique identifier, facilitating accurate message source identification for consumers.

• Data Content: 1 byte carrying the NMT status

◦ This byte represents the sending device's current NMT status code, for example:

▪ 0x00: Bootup (startup/reset), sent once after the device powers on or resets;

▪ 0x04: Stopped;

▪ 0x05: Operational;

▪ 0x7F: Pre-operational;

• Periodic Sending: The producer sends heartbeats at the interval configured in object dictionary entry 0x1017:00 (Producer Heartbeat Time, in milliseconds). If this value is 0, the device does not send heartbeats.

Example: When the device with Node-ID 0x10 is in the Operational state, it periodically sends the following frame:

• COB-ID: 0x710;

• Data: [0x05] (indicating Operational state).
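The producer rules above can be sketched in a few lines of Python. This is a minimal illustration, not a protocol stack: the names `NmtState`, `build_heartbeat`, and `parse_heartbeat` are hypothetical, and the sketch assumes that bit 7 of the data byte is reserved in heartbeat frames and can be masked off.

```python
from enum import IntEnum

HEARTBEAT_BASE = 0x700  # function code base for NMT error control


class NmtState(IntEnum):
    """NMT status codes carried in the single heartbeat data byte."""
    BOOTUP = 0x00
    STOPPED = 0x04
    OPERATIONAL = 0x05
    PRE_OPERATIONAL = 0x7F


def heartbeat_cob_id(node_id: int) -> int:
    """Return 0x700 + Node-ID for a valid Node-ID (1..127)."""
    if not 1 <= node_id <= 127:
        raise ValueError("Node-ID must be in 1..127")
    return HEARTBEAT_BASE + node_id


def build_heartbeat(node_id: int, state: NmtState) -> tuple[int, bytes]:
    """Build the (COB-ID, data) pair a producer would transmit."""
    return heartbeat_cob_id(node_id), bytes([state])


def parse_heartbeat(cob_id: int, data: bytes) -> tuple[int, NmtState]:
    """Recover (Node-ID, NMT state) from a received heartbeat frame."""
    node_id = cob_id - HEARTBEAT_BASE
    if not 1 <= node_id <= 127 or len(data) != 1:
        raise ValueError("not a heartbeat frame")
    # Assumption: bit 7 is reserved in heartbeat frames, so mask it off.
    return node_id, NmtState(data[0] & 0x7F)
```

For the example above, `build_heartbeat(0x10, NmtState.OPERATIONAL)` yields COB-ID 0x710 with data `[0x05]`.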

III. Consumer Mechanism (Heartbeat Consumer)

The NMT master station or other nodes (consumers) that need to monitor device status listen for and process heartbeat messages.

1. Monitoring Device Online Status and Maintaining Timeout Timers

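• The consumer maintains an independent timeout timer for each monitored Node-ID. The timeout is configured in object dictionary entry 0x1016 (Consumer Heartbeat Time): each sub-index holds one 32-bit value that combines the monitored Node-ID (bits 16 to 23) with the heartbeat timeout in milliseconds (bits 0 to 15). Objects 0x100C (Guard Time) and 0x100D (Life Time Factor) belong to the older node guarding protocol, not to the heartbeat mechanism.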

• Reset Timer Upon Receiving a Heartbeat: Whenever the consumer receives a heartbeat frame from a monitored device, it immediately resets the timeout timer corresponding to that device, indicating that the device is still online and working normally.

• Report Fault Upon Timeout: If a device's timer fails to reset within the set timeout period, the consumer determines that the device is offline, faulty, or has experienced a communication interruption, triggering a fault report. This timeout-based fault detection method is simple and effective, enabling timely detection of abnormal nodes in the network.
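The per-node timer logic described above might look like the following sketch. The class name `HeartbeatConsumer` and its methods are hypothetical, and the sketch simplifies one point: it starts timing when a node is registered, whereas a real stack may start monitoring only after the first heartbeat arrives.

```python
import time


class HeartbeatConsumer:
    """Tracks one timeout timer per monitored Node-ID."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, useful for testing
        self._timeout_s = {}     # node_id -> timeout in seconds
        self._last_seen = {}     # node_id -> time of the last heartbeat

    def monitor(self, node_id, timeout_ms):
        """Register a node and start its timer (simplifying assumption)."""
        self._timeout_s[node_id] = timeout_ms / 1000.0
        self._last_seen[node_id] = self._clock()

    def on_heartbeat(self, node_id):
        """Reset the timer of a monitored node when its heartbeat arrives."""
        if node_id in self._timeout_s:
            self._last_seen[node_id] = self._clock()

    def timed_out(self):
        """Return the sorted Node-IDs whose timers have expired."""
        now = self._clock()
        return sorted(
            node_id
            for node_id, timeout in self._timeout_s.items()
            if now - self._last_seen[node_id] > timeout
        )
```

An application would call `on_heartbeat()` from its CAN receive path and poll `timed_out()` periodically to raise fault reports.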

2. Detecting a Device Reset from the Bootup Status (0x00)

• Special meaning: When a consumer receives a frame with a single data byte of 0x00 and a COB-ID matching 0x700 + Node-ID, the device with that Node-ID has just completed a reset (power-on, software reset, or watchdog reset).

• Detailed prompts and handling:

◦ Instant identification: The consumer immediately recognizes this as a device reset event, not a simple status change or fault.

◦ Detailed logs and user notifications: The consumer logs a message such as "Device [Node-ID] reported a Bootup status", which tells the user that the device has just completed a reset and that the system will re-establish communication and configuration with it.

◦ Subsequent process: After sending the Bootup message, the device enters the Pre-operational state on its own. The NMT master station typically restarts the configuration process for that device based on the Bootup message: it configures the device parameters via SDO and then sends the NMT start command to bring the device into the Operational state.

◦ Reset counter: In advanced systems, a reset counter may also be maintained for each device, incrementing each time a Bootup message is received. This counter is used to statistically analyze device stability and failure frequency.
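The Bootup handling and the reset counter can be sketched as follows. `BootupTracker` and its `reconfigure` callback are hypothetical names; the callback stands in for whatever SDO configuration and NMT start sequence the master actually runs.

```python
from collections import Counter

BOOTUP = 0x00  # NMT status code sent once after power-on or reset


class BootupTracker:
    """Counts resets per node and triggers reconfiguration on Bootup."""

    def __init__(self, reconfigure):
        # Hypothetical callback: reconfigure(node_id) re-runs device setup.
        self._reconfigure = reconfigure
        self.reset_count = Counter()

    def on_heartbeat(self, node_id, state):
        """Return True if the frame was a Bootup message."""
        if state != BOOTUP:
            return False
        self.reset_count[node_id] += 1
        print(f"Device 0x{node_id:02X} reported a Bootup status "
              f"(reset #{self.reset_count[node_id]})")
        self._reconfigure(node_id)
        return True
```

Keeping the counter separate from the timeout timers makes it easy to export reset statistics without touching the fault detection path.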

IV. Practical Application Value of the Heartbeat Mechanism

1. Real-time Fault Detection and Location

The heartbeat mechanism enables the NMT master station to detect an offline device or a communication failure within the configured consumer timeout, typically a few hundred milliseconds, without having to poll each node in turn. By analyzing which Node-ID's heartbeat timed out, engineers can quickly locate the faulty device, significantly reducing troubleshooting time. In large-scale automation systems, this rapid location capability is crucial for minimizing downtime losses.

2. Device Status Synchronization and Coordination

By parsing the NMT status code in the heartbeat message, the master station can monitor the operating mode of all devices in the network in real time. This is particularly important in scenarios such as multi-axis motion control, where the master station can ensure that all servo drives have entered the Operational state before initiating synchronized motion commands, avoiding control failures caused by inconsistent device states.
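As a small illustration of this gating check, the sketch below reports which required nodes are not yet Operational. The function name and the `last_state` dictionary (Node-ID mapped to the most recently received status code) are assumptions made for this example.

```python
OPERATIONAL = 0x05  # NMT status code for the Operational state


def nodes_not_operational(last_state, required_nodes):
    """Return the sorted Node-IDs that are not yet Operational.

    last_state maps Node-ID -> last NMT status code received in a
    heartbeat; nodes that were never heard from count as not ready.
    """
    return sorted(
        node_id
        for node_id in required_nodes
        if last_state.get(node_id) != OPERATIONAL
    )


# Example: axis 3 is still Pre-operational and axis 4 has never reported.
pending = nodes_not_operational({1: 0x05, 2: 0x05, 3: 0x7F}, [1, 2, 3, 4])
```

A master would start the synchronized motion only once the returned list is empty.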

3. Automatic Recovery and Redundancy Design

Combined with Bootup status detection, the heartbeat mechanism supports automatic device recovery processes. When a device restarts due to a fault, the master station can automatically identify the reset event and re-execute the configuration process, achieving unattended fault recovery. In redundant system design, the heartbeat mechanism can also be used for master-slave switchover decisions; when the master device's heartbeat is abnormal, the standby device can automatically take over control.

V. Key Points for Heartbeat Mechanism Configuration and Debugging

1. Reasonable Configuration of Heartbeat Cycle

The configuration of the heartbeat production time needs to strike a balance between real-time performance and bus load. For critical safety devices, it is recommended to set a shorter heartbeat cycle, such as 100 milliseconds, to ensure that faults can be detected quickly; for ordinary I/O devices, a longer cycle, such as 500 milliseconds or 1000 milliseconds, can be used to reduce bus occupancy. At the same time, the consumer's timeout should generally be set to 2 to 3 times the production time to avoid false alarms caused by network jitter.
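The trade-off can be estimated with a rough calculation. The sketch below is an approximation under stated assumptions: a heartbeat frame with an 11-bit identifier and one data byte occupies about 55 bit times including the interframe space, stuff bits are ignored (so the real load is somewhat higher), and the default factor of 2.5 is simply a value inside the 2 to 3 times range suggested above.

```python
def consumer_timeout_ms(producer_ms, factor=2.5):
    """Derive a consumer timeout from the producer heartbeat time."""
    return int(producer_ms * factor)


def heartbeat_bus_load(producer_ms_by_node, bitrate, bits_per_frame=55):
    """Estimate the fraction of bus bandwidth used by heartbeats.

    producer_ms_by_node maps Node-ID -> producer heartbeat time in ms;
    a value of 0 means the heartbeat is disabled for that node.
    bits_per_frame=55 is an approximation without stuff bits.
    """
    frames_per_second = sum(
        1000.0 / period_ms
        for period_ms in producer_ms_by_node.values()
        if period_ms > 0
    )
    return frames_per_second * bits_per_frame / bitrate
```

For example, one node at 100 ms and one at 500 ms on a 500 kbit/s bus produce 12 frames per second, which is well under 1% of the bus capacity.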

2. Object Dictionary Parameter Settings

Correctly configuring the relevant parameters in the object dictionary is the foundation for implementing the heartbeat mechanism. Producers need to set object 0x1017 (Producer Heartbeat Time) to a non-zero value to enable the heartbeat function; consumers need to add one entry per monitored node to object 0x1016 (Consumer Heartbeat Time) to establish a monitoring relationship. Both times are specified in milliseconds, so check that any values entered in seconds or in tool-specific units have been converted before they are written.
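In CiA 301, each sub-index of object 0x1016 (Consumer Heartbeat Time) is a 32-bit value with the monitored Node-ID in bits 16 to 23 and the timeout in milliseconds in bits 0 to 15. The helper names below are hypothetical; the sketch only shows how such an entry could be packed and unpacked before it is written via SDO.

```python
def encode_consumer_entry(node_id, timeout_ms):
    """Pack Node-ID and timeout (ms) into one 0x1016 sub-index value."""
    if not 1 <= node_id <= 127:
        raise ValueError("Node-ID must be in 1..127")
    if not 0 <= timeout_ms <= 0xFFFF:
        raise ValueError("timeout must fit in 16 bits")
    return (node_id << 16) | timeout_ms


def decode_consumer_entry(value):
    """Unpack a 0x1016 sub-index value into (Node-ID, timeout in ms)."""
    return (value >> 16) & 0xFF, value & 0xFFFF


# Monitor Node-ID 0x10 with a 300 ms timeout.
entry = encode_consumer_entry(0x10, 300)
# SDO transfers use little-endian byte order for the 4-byte payload.
payload = entry.to_bytes(4, "little")
```

Here `entry` is 0x0010012C, and the bytes placed on the wire are `2C 01 10 00`.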

3. Debugging Tool Usage Recommendations

When capturing heartbeat messages using a CAN analyzer, it is recommended to set a filter that only displays messages with COB-IDs in the range 0x701 to 0x77F. This allows you to focus on NMT error control messages. Observing the periodicity of heartbeat messages and the changes in status codes verifies that device state transitions are correct. Additionally, intentionally disconnecting a device from the CAN bus and observing whether the consumer reports a fault within the expected time is an effective way to verify that timeout detection works.
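When a capture is post-processed in software instead of in the analyzer, the same filter can be expressed with a mask. This is a minimal sketch; the function name and the `(cob_id, data)` tuple layout of the captured frames are assumptions for illustration.

```python
def is_error_control_cob_id(cob_id):
    """True for COB-IDs 0x701..0x77F (NMT error control range)."""
    # The mask 0x780 keeps the 4-bit function code; 0x700 selects error
    # control. The low 7 bits are the Node-ID, which must not be 0.
    return (cob_id & 0x780) == 0x700 and (cob_id & 0x7F) != 0


def filter_heartbeats(frames):
    """Keep only (cob_id, data) tuples in the error control range."""
    return [frame for frame in frames if is_error_control_cob_id(frame[0])]


captured = [(0x181, b"\x01\x02"), (0x710, b"\x05"), (0x000, b"\x01\x10"),
            (0x705, b"\x7f"), (0x700, b"\x00")]
heartbeats = filter_heartbeats(captured)
```

The mask form is convenient because many CAN interfaces accept an identifier and mask pair as a hardware acceptance filter.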

VI. Summary of the CANopen Protocol Heartbeat Mechanism

The CANopen protocol's Heartbeat mechanism, through its simple and efficient design, provides reliable online device monitoring capabilities for industrial networks. Producers periodically broadcast their NMT (Network Management) status, while consumers detect faults based on timeouts. This clearly defined architecture ensures both real-time status information and rapid fault location. The special handling of the Bootup status further supports automatic device discovery and recovery, laying the foundation for building highly available industrial systems.

In practical engineering applications, properly configuring the heartbeat cycle, correctly handling the various status codes, and establishing a comprehensive fault handling process are key to maximizing the value of the heartbeat mechanism. With the development of Industry 4.0 and smart manufacturing, the heartbeat mechanism will continue to play an important role in equipment health management and predictive maintenance, as a foundational component of the Industrial Internet of Things (IIoT).