IoT devices that last: The role of device reliability engineering

François Baldassari is the founder and CEO of Memfault, the first IoT reliability platform. An embedded software engineer by trade, Baldassari's passion for tooling and automation in software engineering drove him to start Memfault. Before Memfault, François led the firmware team at Oculus and built the OS at Pebble. Baldassari has a B.S. in electrical engineering from Brown University.


From ocean-based sensors to collars for tracking wild animals to smart baby monitors, advances in connectivity and hardware have led to IoT adoption in nearly every corner of our lives. But along with growth come risks, some avoidable and some not, that threaten to impact development, disrupt device operations, and impair regular maintenance.

When connected devices were a novelty, these impacts could be considered minor. But now, given IoT ubiquity, developers looking to build a solid product and develop a loyal customer base need to consider how each product will function from the moment it’s in a customer’s hands—and for the long term. 

Prioritising reliability

Two trends are pushing IoT development to focus as much on firmware architecture as it does on product design.

  1. Expectations: End-user expectations for IoT devices continue to rise as the market grows rapidly. It’s estimated that there were 14.4B connected devices at the end of 2022, and the market is projected to grow to $525B by 2027, with 27B connected devices by 2025. News about hacks and bugs continues to cause consumer concern about device use, but with no slowing of adoption, the pressure is on developers to simply deliver better, more reliable products. Consumer wearables or cheap sensors may have a low price point, but if millions of users experience a bug, a company’s reputation and bottom line are at risk. Industrial IoT applications may not scale in quantity, but disrupting operations in infrastructure could impact millions. In healthcare, a failure in devices trusted with sensitive data, or with critical care, could be catastrophic.
  2. Security demands: With user expectations skyrocketing, developers are feeling rising pressure for more robust device security from users and regulatory bodies alike. There are simply too many vulnerabilities across operating systems, microcontrollers, and connectivity stacks. Consumers will continue to seek out devices they can trust to protect their data and work as intended. Because of that pressure, governments will eventually need to create stronger compliance and regulatory requirements. The US has the IoT Cybersecurity Improvement Act of 2020, which requires any IoT devices used by the federal government to meet specific cybersecurity standards and guidelines. The EU Cybersecurity Act classifies IoT businesses under a common set of certification standards based on their level of security. France requires IoT device manufacturers to disclose how repairable devices are. These moves indicate momentum toward more regulation. Developers would be smart to pay attention and accommodate basic security standards now instead of scrambling to catch up once compliance is demanded.

New hardware development approaches can move device makers toward these priorities. Software engineers have long had software reliability engineering (SRE) tools to manage systems, solve problems, and automate tasks. Historically, hardware engineers lacked access to similar tools that help automate, accelerate, and optimise development, but that’s changed. Device reliability engineering (DRE) tools are now available to help developers deliver better products that keep customers happy. DRE equips hardware engineers with individual device and fleet-level data to accelerate IoT and edge device delivery while simultaneously minimising risk. 

Implementing DRE for IoT Devices

Just as software developers use SRE tools to accelerate development without impacting software quality or performance, device developers can use DRE to get to market quicker with a quality device that can be updated regularly for premium performance. Below are a few tips for adopting DRE in your development team.

  • Tip 1: Build with bugs in mind. Device developers know it’s impossible to anticipate every end user’s input and every operating environment a device will be subjected to. Rather than waiting after release day for frustrated users to report bugs on Reddit threads, developers should adopt a more proactive approach, anticipating that bugs will surface and planning for regular fixes and updates as part of the device lifecycle. In practice, that means ensuring that devices can be reset to factory conditions, rolled back to an older firmware version, or recovered through a minimal fallback firmware path. This approach also allows teams to follow a Day-0 workflow, in which firmware is frozen in a bare-minimum state and teams ship products with the full knowledge and expectation that algorithms will be continuously improved and devices will be updated after they’ve shipped.
  • Tip 2: Actively account for security. When shipping products, developers must have a plan for how to update them when breaches occur or known vulnerabilities need to be patched. Other essential security steps include signing firmware updates, validating firmware on the device, adding anti-rollback mechanisms, and ensuring secure delivery, with updates encrypted in transit. Another security must: ensuring that third-party libraries remain up to date. Because third-party code is common and often responsible for critical functionality like connectivity or cryptography, developers need to have insight into third-party code, including its license(s) and available support.
  • Tip 3: Be prepared to monitor. It’s not enough to just get a product to market. Today’s devices must work well, provide unique features with regular updates, and function flawlessly with other applications, platforms, and devices, all while keeping customer data safe. Remote monitoring of devices in the field is critical for visibility into a fleet’s health, as well as for meeting the rising demands of users. The development cycle essentially extends into post-production to meet the need for device monitoring. Remote, over-the-air (OTA) collection of metrics (e.g., battery life, Bluetooth connectivity, crash-free hours, mean time between failures) allows issues to be detected and repaired with little disruption to users, often before they’re even aware of a problem.
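
To make the fleet metrics in Tip 3 concrete, here is a minimal Python sketch of how crash-free hours and mean time between failures might be rolled up from per-device reports. The `DeviceReport` schema, the function names, and the 2.5-crashes-per-100-hours threshold are all illustrative assumptions, not any particular platform’s API; a real fleet would define its own reporting window and thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceReport:
    """Summary one device uploads for a reporting window (hypothetical schema)."""
    device_id: str
    operating_hours: int   # whole hours the device was running
    crash_free_hours: int  # whole hours in which no crash occurred
    crash_count: int       # unexpected resets seen in the window


def fleet_mtbf_hours(reports: List[DeviceReport]) -> Optional[float]:
    """Mean time between failures: total operating hours / total crashes."""
    total_crashes = sum(r.crash_count for r in reports)
    if total_crashes == 0:
        return None  # no failures observed, so MTBF is undefined here
    return sum(r.operating_hours for r in reports) / total_crashes


def crash_free_ratio(reports: List[DeviceReport]) -> Optional[float]:
    """Share of all operating hours that passed without a crash."""
    total_hours = sum(r.operating_hours for r in reports)
    if total_hours == 0:
        return None  # nothing reported yet
    return sum(r.crash_free_hours for r in reports) / total_hours


def devices_to_investigate(
    reports: List[DeviceReport], max_crashes_per_100h: float = 2.5
) -> List[str]:
    """Return IDs of devices whose crash rate exceeds the (assumed) threshold."""
    flagged = []
    for r in reports:
        if r.operating_hours == 0:
            continue  # skip devices with no uptime in this window
        rate = 100 * r.crash_count / r.operating_hours
        if rate > max_crashes_per_100h:
            flagged.append(r.device_id)
    return flagged


# Made-up sample data for three devices.
fleet = [
    DeviceReport("A", operating_hours=100, crash_free_hours=100, crash_count=0),
    DeviceReport("B", operating_hours=100, crash_free_hours=98, crash_count=2),
    DeviceReport("C", operating_hours=50, crash_free_hours=47, crash_count=3),
]

print(fleet_mtbf_hours(fleet))        # 50.0
print(crash_free_ratio(fleet))        # 0.98
print(devices_to_investigate(fleet))  # ['C']
```

Aggregating across the fleet, rather than averaging per-device figures, keeps devices with little uptime from skewing the result, while the per-device pass still surfaces outliers such as device C.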

Software teams know well that reliability engineering tools lead to better overall products. Through device reliability engineering, teams can adopt a dynamic, observable, and highly efficient approach. The result: getting—and keeping—lasting products in the hands of loyal customers. 
