The Offline-First Problem: Why Most Institutional Software Fails in the Field

Ken Ruto

The most common reason infrastructure software fails in Africa has nothing to do with the software itself. The requirements were wrong. The software was built assuming a condition — persistent, reliable internet connectivity — that does not exist where it is supposed to operate. When the network drops, the software stops working. When the software stops working, the field team reverts to paper. When they revert to paper, the institution reverts to operating blind.

We see this pattern everywhere. A water utility deploys a billing system. The field team is supposed to use it for meter reading, fault reporting, and reconnection orders. The system works well in the operations centre, where there is reliable Wi-Fi. It works badly in Kayole, where GPRS signal fluctuates between one bar and nothing. It does not work at all in Mathare North, where the field technician has to climb a water tower to get signal, and cannot do that while also replacing a burst fitting. After three months, the field team has drifted back to paper. The system is used only by the billing office, which is the least important part of the operation.

This is not a story about bad software. It is a story about a failure to understand the operating environment before writing a line of code.

What connectivity looks like in the field

The mental model that most software engineers carry about connectivity is shaped by where they work. If you write software in Westlands or Kilimani, your connectivity experience is fast fibre, reliable 4G, and background syncing that happens so fast you forget it is happening. You never think about offline state because you are never offline.

The field technician servicing water connections in the periurban settlements of Nairobi has a different experience. Mobile data coverage in these areas is technically 3G or 4G on the map. In practice, it is intermittent. Coverage maps show predicted outdoor coverage, not real-world signal penetration through concrete walls, corrugated iron, and dense settlement geometry. A phone that shows three bars in an open street shows zero bars in a ground-floor room thirty metres away.

Zone	Advertised network	Observed usable data	Dead zones (% of locations)
Kayole	3G / 4G	Intermittent 2G equivalent	34%
Mathare North	3G	Intermittent, <1 Mbps	51%
Kibera (inner)	4G	Intermittent 3G	28%
Ruai	4G	Reliable 4G	8%
Industrial Area	4G / Fibre	Reliable 4G+	2%

Connectivity conditions: AccessWASH deployment zones, Nairobi, 2022–2024

The data above comes from signal-quality logs recorded by AccessWASH field tablets during normal operations. A "dead zone" is a location where a data operation (syncing a fault report, submitting a meter reading, confirming a reconnection) failed at least once because of connectivity loss. The percentage is the share of locations in each zone that meet that test. In our densest deployment zones, Kayole and Mathare North, more than a third of the locations where field staff worked were dead zones.

Software that requires connectivity to function will fail that field technician at more than a third of the places where they need it. That is not an acceptable failure rate for an operations system.
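A single failed operation is enough to make a location a dead zone. That means the table's percentages describe places, not the share of attempts that failed. Below is a minimal sketch of that calculation. It assumes a hypothetical log of `(zone, location_id, ok)` tuples, because the real format of the AccessWASH tablet logs is not described here, and the sample rows are illustrative:

```python
from collections import defaultdict

def dead_zone_share(log):
    """Percentage of distinct locations per zone with at least one failed operation.

    `log` yields (zone, location_id, ok) tuples, a hypothetical stand-in
    for the tablets' signal-quality logs.
    """
    seen = defaultdict(set)    # zone -> locations with any logged operation
    failed = defaultdict(set)  # zone -> locations with one or more connectivity failures
    for zone, location_id, ok in log:
        seen[zone].add(location_id)
        if not ok:
            failed[zone].add(location_id)
    return {zone: round(100.0 * len(failed[zone]) / len(locations), 1)
            for zone, locations in seen.items()}

# Illustrative log: location A fails once out of three attempts.
sample = [
    ("Kayole", "A", True), ("Kayole", "A", False), ("Kayole", "A", True),
    ("Kayole", "B", True), ("Kayole", "C", True),
]
print(dead_zone_share(sample))  # {'Kayole': 33.3}
```

Location A counts as a dead zone even though most of its operations succeeded. So a zone can have a high dead-zone share while many individual syncs still go through.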

The cost of the wrong architecture

The cost is not just inconvenience. Connectivity-dependent systems create specific failure modes that compound over time.

Data integrity breaks down. When a field technician cannot submit a form because there is no signal, they have two options: wait (which means standing in a client's compound for an indeterminate time) or write it down and enter it later. The second option, which is what everyone actually does, creates a gap between when the event happened and when it entered the system. By the time the meter reading is entered back at the office, the timestamps are wrong. Aggregated data starts reflecting when things were entered, not when they happened. The operations picture the system shows is not the operations picture that exists.

Trust in the system erodes. Field teams are rational actors. When a system fails them repeatedly — when they have to re-enter data, when syncing loses work, when the system is unavailable when they need it — they make a reasonable judgment: the paper backup is more reliable. They use the digital system when convenient and paper when not. The institution then has two data streams, neither complete, and no clear way to reconcile them.

The institution stops learning. The whole point of operational software is to give the institution a feedback loop — to let it see the relationship between inputs and outputs, between actions and outcomes. If the data is incomplete or lagged, the feedback loop is broken. The institution cannot learn from what it cannot see accurately.

This is why offline-first is not a nice-to-have feature. It is a prerequisite for data integrity. A system that loses data when offline produces worse operational intelligence than a system that never existed at all — because it creates the illusion of data quality while quietly degrading it.

What offline-first actually means

"Offline-first" is sometimes misunderstood to mean "the app still looks like it works when offline." This is cosmetic offline support, and it is useless. Real offline-first means the device is the source of truth, not the server. Reads and writes happen against a local database. The network sync is background infrastructure, not a dependency.

In practice, this means:

Every operation is local first. A fault report is written to local storage the moment the field technician submits it. It does not wait for a server response. It is available immediately on the device. The sync to the server happens whenever connectivity permits — in the background, without blocking the user.
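Below is a minimal sketch of a local-first write using Python's built-in `sqlite3` module. The table names, the columns, and the `open_store` and `submit_fault_report` helpers are hypothetical. The report and an outbox entry for the sync process are written in one local transaction. The call therefore returns as soon as the device holds the data, with or without signal:

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

def open_store(path=":memory:"):
    """Open the on-device database (hypothetical schema for illustration)."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS fault_reports (
            id          TEXT PRIMARY KEY,
            asset_id    TEXT NOT NULL,
            description TEXT NOT NULL,
            occurred_at TEXT NOT NULL  -- device clock, UTC, at submission
        );
        CREATE TABLE IF NOT EXISTS outbox (
            op_id   TEXT PRIMARY KEY,  -- can double as an idempotency key for the server
            payload TEXT NOT NULL,
            status  TEXT NOT NULL DEFAULT 'pending'
        );
    """)
    return db

def submit_fault_report(db, asset_id, description):
    """Save a fault report locally and queue it for sync. Never touches the network."""
    report = {
        "id": str(uuid.uuid4()),
        "asset_id": asset_id,
        "description": description,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }
    with db:  # one transaction: the record and its outbox entry commit together
        db.execute(
            "INSERT INTO fault_reports (id, asset_id, description, occurred_at) "
            "VALUES (:id, :asset_id, :description, :occurred_at)", report)
        db.execute("INSERT INTO outbox (op_id, payload) VALUES (?, ?)",
                   (str(uuid.uuid4()), json.dumps({"type": "fault_report", **report})))
    return report["id"]

db = open_store()
report_id = submit_fault_report(db, "meter-0042", "Burst fitting at connection")
# The report is readable immediately, before any sync has happened.
print(db.execute("SELECT description FROM fault_reports WHERE id = ?", (report_id,)).fetchone())
# ('Burst fitting at connection',)
```

The outbox sits in the same database as the records. A crash or flat battery therefore cannot leave a saved report that was never queued, or a queued report that was never saved.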

Conflicts are handled deliberately. When multiple technicians update the same record (a shared asset, a shared client account), and those updates were made offline, the system needs a merge strategy. This is not a problem you can defer. You have to design it. The simplest strategies (last-write-wins, server-wins) are often wrong for field operations; a technician who worked on a fault for two hours in a dead zone should not have their work overwritten by a supervisory edit made while they were working.
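One alternative to last-write-wins is a field-by-field three-way merge. Each side is compared against the version both last synced from. A change made on only one side is accepted, and a field both sides changed differently is flagged. The sketch below shows one possible policy, not AccessWASH's merge logic, and the function name and record shape are hypothetical. On a true conflict it keeps the technician's value and returns both values for a supervisor to review, so neither edit is discarded silently:

```python
from typing import Any, Dict, Tuple

Record = Dict[str, Any]

def three_way_merge(base: Record, field: Record, server: Record
                    ) -> Tuple[Record, Dict[str, Tuple[Any, Any]]]:
    """Merge two versions of a record that diverged while one side was offline.

    base   - the version both sides last agreed on (the last successful sync)
    field  - the technician's version, edited offline
    server - the version now on the server, e.g. after a supervisor's edit
    Returns the merged record and, for each field both sides changed
    differently, the pair (field_value, server_value) for human review.
    Missing keys are treated as None.
    """
    merged: Record = {}
    conflicts: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(base.keys() | field.keys() | server.keys()):
        b, f, s = base.get(key), field.get(key), server.get(key)
        if f == s:        # both sides agree, or neither changed the field
            merged[key] = f
        elif f == b:      # only the server side changed it
            merged[key] = s
        elif s == b:      # only the technician changed it
            merged[key] = f
        else:             # both changed it: keep the field work, record both values
            merged[key] = f
            conflicts[key] = (f, s)
    return merged, conflicts

base = {"status": "open", "notes": "", "assigned_to": "tech-7"}
field = {"status": "resolved", "notes": "Replaced burst fitting", "assigned_to": "tech-7"}
server = {"status": "escalated", "notes": "", "assigned_to": "tech-9"}
merged, conflicts = three_way_merge(base, field, server)
print(merged)     # {'assigned_to': 'tech-9', 'notes': 'Replaced burst fitting', 'status': 'resolved'}
print(conflicts)  # {'status': ('resolved', 'escalated')}
```

The supervisor's reassignment and the technician's notes both survive. Only `status`, which both sides changed, goes to review.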

Sync is resilient. The background sync process needs to handle partial connectivity — uploads that start, pause, and resume without corrupting the data. It needs to queue failed operations and retry them with exponential backoff. It needs to report sync status clearly, so the field team knows which of their records have reached the server and which are still local.
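Below is a minimal sketch of such a sync loop over an outbox table. The `send` transport, the `Offline` exception, and the helper names are hypothetical. Entries go out oldest-first and are marked synced only after `send` returns, so an upload interrupted mid-way is simply sent again. This assumes the server treats `op_id` as an idempotency key and ignores duplicates. Failed rounds are retried with exponential backoff and full jitter: each delay is random, up to a bound that doubles each round and is capped. `sleep` is passed in so that a background thread or OS scheduler can decide how to wait:

```python
import random
import sqlite3

class Offline(Exception):
    """Raised by the transport when there is no usable connection."""

def make_outbox(db):
    db.execute("""CREATE TABLE IF NOT EXISTS outbox (
        op_id   TEXT PRIMARY KEY,
        payload TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending')""")

def backoff_delay(attempt, base=2.0, cap=600.0, rng=random.random):
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * 2 ** attempt)

def drain_outbox(db, send):
    """Send pending operations oldest-first; stop at the first network failure.

    Returns True once nothing is pending. An entry is marked 'synced' only
    after send() returns, so an interrupted upload stays pending and is resent.
    """
    rows = db.execute("SELECT op_id, payload FROM outbox "
                      "WHERE status = 'pending' ORDER BY rowid").fetchall()
    for op_id, payload in rows:
        try:
            send(op_id, payload)
        except Offline:
            return False  # this entry and everything after it stay queued, in order
        with db:
            db.execute("UPDATE outbox SET status = 'synced' WHERE op_id = ?", (op_id,))
    return True

def run_sync(db, send, sleep, max_rounds=10):
    """Retry drain_outbox with exponential backoff until the queue is empty."""
    for attempt in range(max_rounds):
        if drain_outbox(db, send):
            return True
        sleep(backoff_delay(attempt))
    return False

def sync_status(db):
    """Counts by status, e.g. {'pending': 2, 'synced': 5}, for the field team's screen."""
    return dict(db.execute("SELECT status, COUNT(*) FROM outbox GROUP BY status").fetchall())

# Usage: three queued operations; the second upload attempt hits a dead zone.
db = sqlite3.connect(":memory:")
make_outbox(db)
with db:
    db.executemany("INSERT INTO outbox (op_id, payload) VALUES (?, ?)",
                   [("op-1", "{}"), ("op-2", "{}"), ("op-3", "{}")])
signal = iter([True, False, True, True])
received, delays = [], []

def flaky_send(op_id, payload):
    if not next(signal):
        raise Offline()
    received.append(op_id)

ok = run_sync(db, flaky_send, sleep=delays.append)
print(ok, received, sync_status(db))  # True ['op-1', 'op-2', 'op-3'] {'synced': 3}
```

The loop stops at the first failure instead of skipping ahead. That keeps operations in order, which matters when a later operation (closing a fault) depends on an earlier one (opening it).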

Before, if the internet was bad, we just couldn't work. Now the app works the same whether there is internet or not. We know it will sync when it can. That is all we needed.

— Field supervisor, AccessWASH deployment, Kayole

AccessWASH is built entirely on this architecture. The field tablet writes to a local SQLite database. A background sync process — built on a custom queue with retry logic — handles all server communication. Field staff can complete a full day's work in a connectivity dead zone and sync all of it the next morning when they return to an area with signal. Timestamps reflect when events happened. Data integrity is preserved.
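Preserving when events happened involves three steps. Capture the timestamp on the device at the moment of the event, store it separately from the time the server received it, and aggregate on the device timestamp. Below is a minimal sketch with a hypothetical `readings` table; the rows are illustrative, not AccessWASH data. Three readings taken on one day in a dead zone and synced the next morning land on different days depending on which column the report groups by:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE readings (
    meter_id    TEXT,
    value       REAL,
    occurred_at TEXT,  -- UTC, stamped by the tablet when the reading was taken
    received_at TEXT   -- UTC, stamped by the server when the sync arrived
)""")
# Illustrative rows: read during a day in a dead zone, synced next morning.
db.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", [
    ("m-101", 1520.0, "2024-03-04 07:40:00", "2024-03-05 05:02:11"),
    ("m-102",  883.5, "2024-03-04 09:15:00", "2024-03-05 05:02:12"),
    ("m-103", 2210.0, "2024-03-04 13:55:00", "2024-03-05 05:02:12"),
])

def readings_per_day(db, column):
    """Count readings per calendar day (UTC) using the given timestamp column."""
    if column not in ("occurred_at", "received_at"):  # identifiers cannot be bound as SQL parameters
        raise ValueError(column)
    sql = f"SELECT date({column}), COUNT(*) FROM readings GROUP BY 1 ORDER BY 1"
    return dict(db.execute(sql).fetchall())

print(readings_per_day(db, "occurred_at"))  # {'2024-03-04': 3}
print(readings_per_day(db, "received_at"))  # {'2024-03-05': 3}
```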

This is not technically complicated. It is architecturally deliberate. You have to decide, before you build anything else, that the device is the source of truth. That decision shapes everything downstream: the data model, the sync protocol, the conflict resolution logic, the test suite. It is much harder to add offline-first to an existing connected architecture than to build offline-first from the start.
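In the test suite, this decision shows up as tests that begin with the network off. Below is a minimal sketch using Python's standard `unittest` module. `FakeServer` and `FieldClient` are hypothetical in-memory stand-ins for a sync endpoint and a tablet client, not AccessWASH components:

```python
import unittest

class FakeServer:
    """In-memory stand-in for the sync endpoint; ignores duplicate op ids."""
    def __init__(self):
        self.records = {}
        self.online = False  # tests start in a dead zone by default

    def receive(self, op_id, payload):
        if not self.online:
            raise ConnectionError("no signal")
        self.records.setdefault(op_id, payload)  # idempotent: a resend changes nothing

class FieldClient:
    """Hypothetical client: writes locally first, syncs when asked."""
    def __init__(self, server):
        self.server = server
        self.local = {}    # op_id -> payload; the device's source of truth
        self.pending = []  # op_ids not yet acknowledged by the server

    def submit(self, op_id, payload):
        self.local[op_id] = payload
        self.pending.append(op_id)

    def sync(self):
        while self.pending:
            op_id = self.pending[0]
            try:
                self.server.receive(op_id, self.local[op_id])
            except ConnectionError:
                return  # keep everything queued; try again later
            self.pending.pop(0)

class DeadZoneTests(unittest.TestCase):
    def test_full_day_offline_then_sync(self):
        server = FakeServer()
        client = FieldClient(server)
        for i in range(20):
            client.submit(f"op-{i}", {"reading": i})
        client.sync()  # still offline: nothing may be lost
        self.assertEqual(len(client.local), 20)
        self.assertEqual(len(client.pending), 20)
        self.assertEqual(server.records, {})

        server.online = True  # back in coverage the next morning
        client.sync()
        self.assertEqual(client.pending, [])
        self.assertEqual(len(server.records), 20)

    def test_resend_after_lost_ack_is_harmless(self):
        server = FakeServer()
        server.online = True
        server.receive("op-1", {"reading": 1})  # upload arrived, but the ack was lost
        client = FieldClient(server)
        client.submit("op-1", {"reading": 1})
        client.sync()  # the client sends the same operation again
        self.assertEqual(server.records, {"op-1": {"reading": 1}})
        self.assertEqual(client.pending, [])
```

Saved as a module, this runs with `python -m unittest`. The second test covers an upload that reached the server but whose acknowledgement was lost, so the client sends the same operation again.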

The same problem in constituency offices

The connectivity problem is not unique to field operations. It appears in a different form in constituency offices.

Kenya's constituency offices are not in Westlands. Many are in district towns and sub-county headquarters that have 3G coverage on a good day. Power outages — which correlate with connectivity outages, since most rural base stations are grid-powered — can knock a constituency office offline for hours. A system that requires connectivity for caseworkers to update constituent records, process bursary applications, or log CDF transactions will break at exactly the moments the office is busiest and the connectivity is worst.

BungeConnect has the same offline-first architecture as AccessWASH. Caseworkers write to local storage. Records sync when connectivity permits. The system treats connectivity as a transport mechanism, not a dependency.

The result: in offices where we have tracked usage, caseworker engagement with the system is higher during and after connectivity outages than before we deployed. They have learned that the system works when the internet does not. That trust, once established, drives adoption in a way that no training programme can.

A design principle for African institutional software

If you are building software for African institutions — for water utilities, constituency offices, community health systems, commodity markets, any organisation operating at the level of communities and periurban settlements — here is the constraint that should shape everything:

Assume no connectivity. Build the system to function fully without a network. Then add sync as background infrastructure.

This is the inversion that most software misses. The default assumption in most software architecture is that connectivity exists and offline is the edge case. In much of the operating environment that matters for African institutions, this is exactly backwards. Connectivity is the edge case. Offline is the default.

Build for the default.

— Ken Ruto
