OTA Firmware Updates: End-to-End

This is where industrial IoT deployments most often go wrong. A poorly designed OTA system can brick thousands of devices simultaneously. Every element below reflects lessons learned from real incidents.

12.1 OTA System Architecture

graph TB
    subgraph CI["CI/CD Pipeline"]
        BUILD[Firmware Build<br/>CMake / PlatformIO]
        SIGN[Code Signing<br/>HSM / Vault]
        STORE[Artifact Storage<br/>S3 / Azure Blob]
        META[Firmware Metadata<br/>DB Record]
    end

    subgraph OTA_SVC["OTA Service"]
        CAMP[Campaign Manager<br/>Rollout Scheduler]
        GATE[Canary Gate<br/>Health Monitor]
        NOTIFY[Notification Publisher<br/>MQTT]
        STAT[Status Tracker]
    end

    subgraph DEVICE["Device / Gateway"]
        OTA_AGENT[OTA Agent]
        VERIFY[Verify:<br/>1. Checksum SHA-256<br/>2. Signature ECDSA<br/>3. Version constraint]
        APPLY[Apply:<br/>Write to staging partition]
        BOOT[Bootloader:<br/>A/B swap + watchdog]
        ROLLBACK[Rollback:<br/>Revert to previous]
    end

    BUILD --> SIGN --> STORE
    SIGN --> META
    META --> CAMP
    CAMP -->|select cohort| GATE
    GATE -->|if healthy| NOTIFY
    NOTIFY -->|MQTT: ota/notification| OTA_AGENT
    OTA_AGENT -->|HTTPS GET signed URL| STORE
    STORE -->|firmware binary| OTA_AGENT
    OTA_AGENT --> VERIFY
    VERIFY -->|valid| APPLY
    VERIFY -->|invalid| OTA_AGENT
    APPLY --> BOOT
    BOOT -->|boot OK, MQTT: ota/status| STAT
    BOOT -->|boot fail| ROLLBACK
    ROLLBACK -->|MQTT: ota/status| STAT
    STAT -->|cohort health| GATE
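The Verify box's third check, the version constraint, is easy to get subtly wrong. A plain string comparison sorts "2.10.0" before "2.9.9". Below is a minimal sketch of a numeric check. It assumes versions are plain MAJOR.MINOR.PATCH strings, like the manifest's `version` field in 12.2. It also assumes an anti-downgrade policy that accepts only strictly newer firmware. The function names are hypothetical.

One caveat applies. In the signing scheme of 12.2, the signature covers the binary and not the manifest. The version compared here should therefore come from an authenticated source, such as a version field inside the signed image. Otherwise an attacker could replay an older build that still carries a valid signature.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned major, minor, patch; } fw_version_t;

/* Parse "MAJOR.MINOR.PATCH"; anything else (including trailing text) fails. */
bool parse_version(const char *s, fw_version_t *v) {
    int consumed = 0;
    if (sscanf(s, "%u.%u.%u%n", &v->major, &v->minor, &v->patch, &consumed) != 3)
        return false;
    return s[consumed] == '\0';
}

/* Numeric, field-by-field comparison: <0, 0, or >0, like strcmp. */
int compare_versions(fw_version_t a, fw_version_t b) {
    if (a.major != b.major) return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor) return a.minor < b.minor ? -1 : 1;
    if (a.patch != b.patch) return a.patch < b.patch ? -1 : 1;
    return 0;
}

/* Assumed policy: accept only firmware strictly newer than what is running. */
bool version_allowed(const char *running, const char *candidate) {
    fw_version_t r, c;
    if (!parse_version(running, &r) || !parse_version(candidate, &c))
        return false;
    return compare_versions(c, r) > 0;
}
```

This check applies to downloaded images only. The bootloader's own A/B revert to the previous partition never passes through it.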

12.2 Firmware Signing — Non-Negotiable in Industrial IoT

Firmware signing is the security control that makes OTA safe to operate at fleet scale. Without it, a compromised OTA service or a man-in-the-middle (MITM) attack can push arbitrary code to every device on your platform simultaneously. That is a single point of failure with catastrophic physical consequences in an industrial deployment.

ECDSA P-256 is the recommended algorithm for constrained devices. Its public keys and signatures are far smaller than RSA-2048's: a raw P-256 signature is 64 bytes versus 256 bytes for RSA-2048. Compact verifiers also fit comfortably on MCUs without hardware crypto acceleration. RSA verification with the usual small public exponent is itself fast, so ECDSA's advantage lies in size and footprint rather than verification speed.

The signing key must live in an HSM, never on a CI/CD server. Treat a signing key compromise as a Tier 1 security incident requiring full fleet re-provisioning.

Threat: attacker pushes malicious firmware to 10,000 devices.
Without signing: impossible to detect until after deployment.
With signing: firmware rejected at device before any execution.

Signing process (use ECDSA P-256; its compact keys and signatures suit constrained devices):

1. Build produces: firmware.bin (raw binary)

2. Sign:
   # Using OpenSSL with a local key file (illustration only; in production the private key never leaves the HSM)
   openssl dgst -sha256 -sign firmware_signing.key \
     -out firmware.sig firmware.bin

   # Verify locally before publishing
   openssl dgst -sha256 -verify firmware_signing.pub \
     -signature firmware.sig firmware.bin

3. Package (OTA manifest):
{
  "firmware_id": "fw-pump-monitor-2.4.0",
  "version": "2.4.0",
  "device_type": "pump_monitor_v2",
  "min_hw_revision": "Rev-B",
  "binary_url": "https://ota.acme.com/fw/pump-monitor-2.4.0.bin",
  "binary_size_bytes": 524288,
  "checksum_sha256": "a3b4c5d6...",
  "signature_ecdsa": "3046022100...",
  "signing_cert_id": "fw-signing-cert-2026-01",
  "release_notes_url": "https://...",
  "rollback_version": "2.3.1",
  "published_at": "2026-03-19T10:00:00Z"
}

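4. Device verification (C sketch; sha256(), ecdsa_verify() and LOG_ERROR are placeholders for your platform's crypto and logging):
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Platform hooks: hypothetical names, wire them to your crypto library. */
   void sha256(const uint8_t* data, size_t len, uint8_t out[32]);
   bool ecdsa_verify(const uint8_t* pubkey, const uint8_t hash[32],
                     const uint8_t* sig, size_t sig_len);

   static const uint8_t fw_signing_pubkey[] = { /* baked into firmware */ };

   bool verify_firmware(const uint8_t* fw_data, size_t fw_size,
                        const uint8_t expected_hash[32],  /* manifest checksum_sha256 */
                        const uint8_t* signature, size_t sig_size) {
       uint8_t actual_hash[32];
       sha256(fw_data, fw_size, actual_hash);

       // Step 1: checksum catches corrupted or truncated downloads
       if (memcmp(actual_hash, expected_hash, sizeof actual_hash) != 0) {
           LOG_ERROR("Firmware checksum mismatch");
           return false;
       }
       // Step 2: ECDSA over the same SHA-256 digest that `openssl dgst -sha256 -sign` signed
       if (!ecdsa_verify(fw_signing_pubkey, actual_hash, signature, sig_size)) {
           LOG_ERROR("Firmware signature invalid");
           return false;
       }
       return true;
   }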
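The manifest's `signature_ecdsa` value is hex text, and its prefix `3046022100` is DER-encoded:

- `30` opens an ASN.1 SEQUENCE of length `0x46` (70 bytes).
- `02 21` starts a 33-byte INTEGER r.
- The `00` is sign padding, present because r's top bit is set.

That is the encoding `openssl dgst -sha256 -sign` emits for an EC key. Some embedded verifiers accept DER directly (mbedTLS's `mbedtls_ecdsa_read_signature()`, for instance). Others expect a raw 64-byte r||s (micro-ecc's `uECC_verify()` is one). The same hex text must also be decoded before use, as must `checksum_sha256`.

Below is a minimal sketch of both steps for P-256, with hypothetical function names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into bytes. Returns the byte count, or -1 on
   odd length, a non-hex character, or insufficient output space. */
int hex_decode(const char *hex, uint8_t *out, size_t out_cap) {
    size_t n = strlen(hex);
    if (n % 2 != 0 || n / 2 > out_cap) return -1;
    for (size_t i = 0; i < n / 2; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return (int)(n / 2);
}

/* Read one DER INTEGER at *p into a fixed 32-byte big-endian field. */
static bool der_int_to_32(const uint8_t **p, const uint8_t *end, uint8_t out[32]) {
    if (end - *p < 2 || (*p)[0] != 0x02) return false;
    size_t len = (*p)[1];
    const uint8_t *v = *p + 2;
    if (len == 0 || (size_t)(end - v) < len) return false;
    while (len > 32 && *v == 0x00) { v++; len--; }  /* drop sign padding */
    if (len > 32) return false;
    memset(out, 0, 32);
    memcpy(out + (32 - len), v, len);               /* left-pad short values */
    *p = v + len;
    return true;
}

/* DER ECDSA-Sig-Value (SEQUENCE { r INTEGER, s INTEGER }) -> raw r||s.
   Only short-form lengths are handled; a P-256 signature is at most 72 bytes. */
bool der_sig_to_raw(const uint8_t *der, size_t der_len, uint8_t raw[64]) {
    if (der_len < 2 || der[0] != 0x30 || (size_t)der[1] != der_len - 2) return false;
    const uint8_t *p = der + 2, *end = der + der_len;
    if (!der_int_to_32(&p, end, raw)) return false;
    if (!der_int_to_32(&p, end, raw + 32)) return false;
    return p == end;  /* reject trailing bytes */
}
```

Rejecting trailing bytes and inconsistent lengths matters here. The parser runs on attacker-reachable input before the signature is checked.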

12.3 A/B Partition — The Only Safe OTA for Industrial

A/B partition (dual-bank) firmware is the only OTA approach that is safe for unattended industrial devices. Without it, a power failure during a firmware write leaves the device with corrupted firmware and no recovery path; the only fix is a field visit. With A/B partitioning, the active firmware keeps running from partition A while the new firmware downloads to partition B. The bootloader switches to B only after the download completes and passes verification. If the new firmware fails to boot healthy, the bootloader automatically reverts to the known-good partition. This makes OTA failures self-healing at the device level, which is what enables fleet-scale rollouts without field technicians on standby.

Flash layout (embedded Linux / RTOS):

graph TB
    subgraph FLASH["Flash Storage Layout (total: ~16 MB example)"]
        BL["Bootloader — 64 KB<br/>Read-only, NEVER updated via OTA<br/>Manages A/B swap + watchdog"]
        BC["Boot Config — 4 KB<br/>Active partition pointer<br/>rollback_on_fail flag"]
        PA["Partition A — 4 MB<br/>Active firmware (currently running v2.3)<br/>Verified good — do not modify"]
        PB["Partition B — 4 MB<br/>Staging partition (OTA download target)<br/>Write new v2.4 here while A runs"]
        DP["Data Partition — 8 MB<br/>Config, certs, local SQLite DB<br/>Survives OTA — never erased"]
    end

    BL --> BC
    BC -->|"boot config points here during normal ops"| PA
    BC -.->|"after OTA: boot config switches pointer here"| PB
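The example layout's arithmetic can be checked at compile time. This sketch makes two assumptions. First, the regions are packed back-to-back from flash offset 0 in the order listed. Second, the flash erases in 4 KiB sectors. The macro and function names are hypothetical.

Under those assumptions the five regions total 16 MiB + 68 KiB (16,846,848 bytes), hence the diagram's "~16 MB". That is 68 KiB more than a 16 MiB part holds. On such a part one region, most likely the data partition, would have to shrink slightly. The manifest's 512 KiB image (`binary_size_bytes: 524288`) fits easily in a 4 MiB slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB(n) ((uint32_t)(n) * 1024u)
#define MIB(n) ((uint32_t)(n) * 1024u * 1024u)

/* Region sizes from the example layout; offsets assume back-to-back packing. */
#define BOOTLOADER_OFF  0u
#define BOOTLOADER_SIZE KIB(64)
#define BOOTCFG_OFF     (BOOTLOADER_OFF + BOOTLOADER_SIZE)
#define BOOTCFG_SIZE    KIB(4)
#define PART_SIZE       MIB(4)                     /* each of A and B */
#define PART_A_OFF      (BOOTCFG_OFF + BOOTCFG_SIZE)
#define PART_B_OFF      (PART_A_OFF + PART_SIZE)
#define DATA_OFF        (PART_B_OFF + PART_SIZE)
#define DATA_SIZE       MIB(8)
#define FLASH_USED      (DATA_OFF + DATA_SIZE)

/* The total really is "~16 MB", and both slots start on a 4 KiB boundary
   (assumed erase-sector size), so erasing one never touches the other. */
static_assert(FLASH_USED == MIB(16) + KIB(68), "layout totals 16 MiB + 68 KiB");
static_assert(PART_A_OFF % KIB(4) == 0 && PART_B_OFF % KIB(4) == 0,
              "A/B slots must be erase-sector aligned");

/* Flash offset of an A/B slot: 0 = A, 1 = B. */
uint32_t partition_offset(int slot) {
    return slot == 0 ? PART_A_OFF : PART_B_OFF;
}

/* Does an image (e.g. the manifest's binary_size_bytes) fit in one slot? */
bool image_fits(uint32_t size_bytes) {
    return size_bytes <= PART_SIZE;
}
```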

Update sequence:
1. Download v2.4 to Partition B (Partition A still running)
2. Verify checksum + signature of Partition B
3. Set boot config: next_boot = B, rollback_on_fail = true
4. Set watchdog timer: 120s (if new fw doesn't check in, watchdog reboots)
5. Reboot
6. Bootloader reads boot config → boots Partition B (v2.4)
7. New firmware starts, runs health checks
8. If healthy: call confirm_update() → set boot config: active = B, permanent
9. If unhealthy: watchdog fires OR firmware calls rollback()
   → bootloader boots Partition A (v2.3)
   → device publishes: ota/status {status: rolled_back, reason: "health check failed"}
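The bootloader's side of steps 3, 6, 8 and 9 fits in one small decision function. This sketch assumes a boot-config record with two flags the text does not name:

- `trial_started` lets the bootloader tell a first trial boot apart from a second boot that never confirmed.
- `rolled_back` tells the old firmware to publish the rolled_back status.

Apart from `confirm_update()`, the struct and function names are hypothetical. Writing the record to flash atomically, for example as two copies with a CRC, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PART_A = 0, PART_B = 1 } partition_t;

/* Hypothetical boot-config record (lives in the 4 KB Boot Config sector). */
typedef struct {
    partition_t active;     /* last confirmed-good partition */
    partition_t next_boot;  /* partition to try on the next boot (step 3) */
    bool rollback_on_fail;  /* step 3: revert if the new image is not confirmed */
    bool trial_started;     /* assumed: set once the bootloader has tried next_boot */
    bool rolled_back;       /* assumed: tells the old firmware to report rolled_back */
} boot_config_t;

/* Bootloader decision (steps 6 and 9). The caller must persist *cfg
   before jumping to the returned partition. */
partition_t bootloader_select(boot_config_t *cfg) {
    assert(cfg != NULL);
    if (cfg->next_boot == cfg->active || !cfg->rollback_on_fail)
        return cfg->active;                /* no armed trial: boot known-good */
    if (!cfg->trial_started) {
        cfg->trial_started = true;         /* first attempt at the new image */
        return cfg->next_boot;
    }
    /* Back in the bootloader without confirm_update(): the watchdog fired
       or the firmware requested a rollback. Revert to the known-good image. */
    cfg->next_boot = cfg->active;
    cfg->rollback_on_fail = false;
    cfg->trial_started = false;
    cfg->rolled_back = true;
    return cfg->active;
}

/* Called by the new firmware once its health checks pass (step 8). */
void confirm_update(boot_config_t *cfg) {
    assert(cfg != NULL);
    cfg->active = cfg->next_boot;          /* make the new partition permanent */
    cfg->rollback_on_fail = false;
    cfg->trial_started = false;
}
```

With this design, rollback() only needs to reboot. Any return to the bootloader during an unconfirmed trial reverts to the known-good partition, whether the watchdog fired or the firmware reset deliberately.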

What "healthy" means. The device must validate that it:
- Connects to MQTT broker within 30s
- All required
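A minimal sketch of the post-boot health decision encodes the 30 s MQTT criterion and folds the other required checks into one placeholder flag (`other_checks_ok`, hypothetical). It returns PENDING while the outcome is undecided. The 120 s watchdog from step 4 bounds how long that state can last. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MQTT_CONNECT_DEADLINE_MS 30000u   /* "connects to MQTT broker within 30s" */

typedef enum { HEALTH_PENDING, HEALTH_OK, HEALTH_FAILED } health_t;

typedef struct {
    bool mqtt_connected;            /* broker connection established? */
    uint32_t mqtt_connected_at_ms;  /* ms since boot; valid when mqtt_connected */
    bool other_checks_ok;           /* hypothetical placeholder for the remaining checks */
} health_inputs_t;

/* Decide health at now_ms (ms since boot). PENDING means "keep waiting". */
health_t evaluate_health(uint32_t now_ms, const health_inputs_t *in) {
    if (!in->mqtt_connected)
        return now_ms > MQTT_CONNECT_DEADLINE_MS ? HEALTH_FAILED : HEALTH_PENDING;
    if (in->mqtt_connected_at_ms > MQTT_CONNECT_DEADLINE_MS)
        return HEALTH_FAILED;                  /* connected, but too late */
    return in->other_checks_ok ? HEALTH_OK : HEALTH_PENDING;
}
```

The new firmware would poll this after boot, call confirm_update() on HEALTH_OK, and call rollback() on HEALTH_FAILED.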