The Web Was Built for Navigation, Not Comprehension

Ken Ruto

You are reading a paper on watershed governance in sub-Saharan Africa. Forty seconds in, you encounter the phrase "polycentric governance." You know governance. You know poly- means many. But polycentric governance, as a term of art in institutional economics — the specific meaning Elinor Ostrom gave it, the tradition it sits in, the way it differs from federalism — that you don't know. Not quite.

You have two choices. Keep reading and carry the term as an unresolved variable, hoping context will fill it in. Or stop, open a new tab, type the phrase, scan a Wikipedia summary, read half a paragraph, realize you need to understand Ostrom first, open another tab, skim her Nobel lecture, get mildly distracted by something else on the screen, and — three minutes later, maybe five — make your way back to the paper you were reading, having lost the thread.

The second option feels like the right one. It is, in practice, nearly fatal to sustained comprehension.

This is not a self-discipline problem. It is an architecture problem. The web was designed to let you move between documents. It was not designed to help you understand any one of them.

A Link Is an Exit, Not an Explanation

The hyperlink is the defining invention of the web. Tim Berners-Lee imagined it as a way to connect scientific papers — a machine-readable bibliography, embedded in the text rather than stacked at the bottom. The link says: this term connects to another document. Click here and you will go there.

What a link cannot do is bring the other document to you. It cannot surface the relevant sentence from a 40-page paper without asking you to navigate to that paper. It cannot synthesize the three most cited definitions of a term and present them in a single sentence inline with the text you are reading. It cannot distinguish between a reader who needs the definition and one who already has it.

The hyperlink is a navigation primitive. It solves the problem of document connectivity — the problem that mattered most in 1991, when the challenge was getting scientific knowledge to link up at all. It does not solve the problem of in-document comprehension, because that problem didn't motivate the design. The architecture optimized for what it was asked to optimize for.

The result is a medium that is extraordinary for discovery and almost hostile to depth.

This is not a criticism of Berners-Lee or the early web. The early web solved exactly the problem it set out to solve. But architectural choices made for good reasons in one era become invisible constraints in the next — and the constraint is now costing knowledge workers an enormous, mostly unexamined amount of time.

What We Lost When We Gave Up Marginalia

Before the web, serious readers had a technology for comprehension that worked remarkably well: marginalia. Annotations, glosses, cross-references, interlinear notes — the scholarly tradition of writing on and between the text rather than separate from it.

A medieval manuscript might carry three layers of commentary: the base text, a gloss running in a narrow column alongside each paragraph explaining difficult terms, and an interlinear commentary filling the white space between lines with micro-explanations of specific phrases. The reader moved through all three simultaneously. The explanation was co-located with the confusion.

The print revolution disrupted this, not deliberately but structurally. Printed books are harder to annotate than manuscripts. The economics of the printing press favored clean, unmarked pages. Footnotes and endnotes emerged as compromises: explanations that preserved the visual cleanliness of the page while exiling the commentary to the foot of the page or the back of the book. The further the note sat from the text, the more context was lost by the time you arrived there.

The digital era promised to reverse this. Hypertext theorists in the 1980s imagined linked, annotated documents as a return to the richness of manuscript culture. What we got instead was documents that were even harder to annotate than printed pages — smooth, rendered HTML with no natural place for reader-generated commentary, surrounded by interfaces designed to move you through content rather than linger in it.

Reading on a screen is navigation. The scroll gesture, the back button, the address bar, the tab — every affordance of the browser is a navigation affordance. Comprehension affordances — pause, annotate, ask, expand — have to be bolted on after the fact, against the grain of the medium.

The Context-Switch Tax

In 2005, Gloria Mark and her colleagues published a study of fragmented work that has become one of the most cited in research on workplace interruption. They observed knowledge workers through the working day and recorded what happened each time a worker was interrupted or switched contexts. The figure most often quoted from that line of research is the recovery time, the time required to get back to the interrupted task at the same depth of engagement: on average, about 23 minutes.

Twenty-three minutes to recover from a single context switch.

The tab-opening pattern most readers use for unfamiliar terms is a context switch. A mild one, perhaps — you return from the Wikipedia page to the paper you were reading, not from a Slack message to a codebase. But the essential structure is the same: an interruption, a departure from the cognitive state required to understand what you were reading, and a recovery period that is longer than the interruption itself.

Reading behavior	Cognitive cost	Average recovery time	Typical outcome
Open new tab for unfamiliar term	Context switch	3–23 min (Mark 2005)	Return to text with lost thread ~60% of the time
Skip the term and keep reading	Unresolved variable in working memory	Accumulates per term	Comprehension degrades over multi-page documents
Inline glossary (hover tooltip)	Micro-interruption	~5 seconds	Term resolved, reading continues
Inline AI explanation (AnswerTab model)	Sub-interruption	~10–15 seconds	Term resolved with context-specific explanation, reading continues

The numbers in that table are rough. Research on reading comprehension is methodologically tricky, and real-world variance is high. But the directional claim is solid: the cost of a tab-based context switch, repeated over the course of a serious reading session, is not negligible. It is plausibly a major reason people abandon the papers, reports, and long-form articles they start.

The Scrollbar Became the Progress Metric

There is a second, subtler problem with the web as a reading medium, separate from the context-switch tax. The dominant metaphor for reading progress online is physical position in the document — the scrollbar.

The scrollbar is a navigation metaphor. It tells you where you are in the document, not whether you understood what you just read. It measures exposure to words, not comprehension of them. And because the scrollbar is visible and moving feels like progress, the medium implicitly encourages a reading behavior that privileges completion over understanding: skim to the bottom, see the scrollbar reach 100%, close the tab.

This isn't the reader's fault. It's a design affordance. The medium rewards coverage. The marginalia tradition rewarded depth.

Reading is learning from an absent teacher. The author cannot see your confusion. The book cannot ask whether you understood. The reader must bring the interrogation.

— Mortimer Adler, How to Read a Book

Adler wrote that in 1940, about printed books. The same gap exists on the web — but the web adds a wrinkle: the interface actively discourages the interrogation. A book sits still while you think about it. A web page is surrounded by notifications, related links, autoplay videos, and a toolbar full of escape routes. The architecture of the browser is the architecture of distraction.

The Comprehension Layer That Doesn't Exist

If you sketch what the web is missing, it looks like this:

A layer that sits between the reader and the document. That watches what you're reading, not to surveil you but to be ready. That activates when you select a phrase that you don't fully understand, or when you reach for a new tab. That surfaces the explanation in context — not as a link to another document, but as a few sentences, generated for this reader, in relation to this passage, right now.

Not a summary (summaries are still navigation — they move you above the document). Not a link (links are exits). An annotation. The gloss that the web never had.

The technical components to build this have existed in crude form for several years and in sophisticated form since late 2022. Large language models can explain concepts. Browser extension APIs can observe selected text. The marginal cost of a well-targeted explanation is close to zero.
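How little machinery the core step needs can be shown with a minimal sketch. It assumes the extension has already captured the selected phrase and the passage around it; the function names, the prompt wording, and the sample passage are all hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def surrounding_sentence(passage: str, selection: str) -> str:
    """Return the sentence of `passage` that contains `selection` ('' if absent)."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", passage.strip()):
        if selection in sentence:
            return sentence
    return ""


def build_explanation_prompt(passage: str, selection: str, max_sentences: int = 2) -> str:
    """Assemble a request for a brief gloss tied to this passage, not a generic definition."""
    context = surrounding_sentence(passage, selection)
    return (
        f'Explain the term "{selection}" in at most {max_sentences} sentences, '
        f'as it is used in this passage: "{context}"'
    )


# Invented sample passage, for illustration only.
passage = (
    "Watershed management rarely fits one jurisdiction. "
    "The authors argue that polycentric governance explains the outcome. "
    "Earlier studies disagree."
)
prompt = build_explanation_prompt(passage, "polycentric governance")
print(prompt)
```

The prompt carries the sentence the reader was actually stuck on, which is what separates a gloss from a dictionary lookup. The sentence cap keeps the answer brief by default.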

What hasn't existed is an implementation that gets the interaction design right. The explanation has to appear at the point of confusion, not require a mode switch. It has to be brief by default and expandable by choice — not a chatbox, not a sidebar, not a new window. It has to work with the grain of the reading experience rather than adding friction to it.

The failure mode of most earlier attempts to add "AI reading assistance" has been interaction design, not model quality. A sidebar that requires you to context-switch into it defeats the purpose. A chat interface that asks you to type a question interrupts the reading mode. The right interface is the one that feels, when it works, like the explanation was always there.

Bring Your Own Key as the Only Architecturally Correct Response

There is a separate problem with building a comprehension layer for serious readers: the intimacy of the data.

What you look up, what you don't know, what confuses you, what questions you ask about the things you're reading — this is not generic behavioral data. It is a map of the edges of your understanding. It reveals your domain, your level of expertise, your research agenda, your intellectual gaps. It is, in the language of data privacy, deeply sensitive.

A comprehension tool that routes your queries through a proprietary platform to be logged, analyzed, and potentially used to train models or target advertising is not a comprehension tool. It is a surveillance system dressed as a reading aid. The users who would benefit most from serious reading assistance — researchers, journalists, policymakers, analysts — are precisely the users who cannot afford to expose their knowledge boundaries to a third party.

Bring Your Own Key (BYOK) is the architecturally correct response to this. Your API key, your data, your queries going directly to the model provider, with no intermediary platform capturing the signal. The provider still sees each query, but under an account and terms you chose yourself. The tool becomes a local shell around a cloud model, not a service that owns your data.
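In code, the BYOK shape is small. The sketch below builds a request, without sending it, whose only destination is the provider and whose only credential is the reader's own key. The endpoint URL, the payload shape, and the bearer-token scheme are assumptions made for illustration; a real provider's API will differ.

```python
import json
import urllib.request

# Hypothetical endpoint: stands in for whichever provider the reader holds a key with.
PROVIDER_URL = "https://api.model-provider.example/v1/explain"


def build_direct_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a request addressed straight to the provider."""
    # Assumed payload shape; real providers define their own schema.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        PROVIDER_URL,
        data=body,
        headers={
            # The key is supplied by the reader and attached only to this request.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key, for illustration only.
request = build_direct_request("sk-user-supplied-key", "Explain 'polycentric governance'.")
print(request.host)
print(request.get_method())
```

Nothing in this path touches a server run by the tool's author, so there is no place for an intermediary to log what the reader asked.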

This is not primarily a privacy-as-virtue argument. It is a product-market-fit argument. The users who will pay for a serious comprehension tool are the users with the highest reading volume and the highest stakes attached to what they're reading. Those users have already thought about where their data goes. They have institutional email addresses and VPNs and strong opinions about which cloud services they trust with which data. A tool that asks for their API key will feel correct to them, not inconvenient.

A tool that routes their reading behavior through another platform will feel like exactly what it is: a compromise they didn't ask for.

What the Comprehension Layer Changes

The case for a comprehension layer is not primarily about convenience. It is about what kind of knowledge work becomes possible when the friction of deep reading drops.

Right now, the effective reading speed of a serious knowledge worker on complex material is throttled not by how fast they read but by how often they stop. A 30-page policy paper with fifteen unfamiliar terms or contested claims takes three hours not because the reading itself takes three hours but because each unfamiliar term spawns a lookup that spawns a distraction. Remove the distraction, compress the lookup to fifteen seconds, and the same paper takes forty-five minutes.
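The arithmetic behind that estimate can be checked with a minimal sketch. It takes the paragraph's own figures as given (three hours, fifteen lookups, fifteen seconds per inline lookup, forty-five minutes). The function names are hypothetical, and the per-lookup cost it derives is implied by those figures, not measured.

```python
def reading_minutes(base_minutes: float, lookups: int, minutes_per_lookup: float) -> float:
    """Total session time: uninterrupted reading plus the cost of every lookup."""
    return base_minutes + lookups * minutes_per_lookup


def implied_lookup_cost(total_minutes: float, base_minutes: float, lookups: int) -> float:
    """Average minutes each lookup must cost (detour plus recovery) to explain the total."""
    return (total_minutes - base_minutes) / lookups


LOOKUPS = 15               # unfamiliar terms in the 30-page paper
INLINE_LOOKUP_MIN = 0.25   # fifteen seconds per inline explanation
INLINE_TOTAL_MIN = 45.0    # the "forty-five minutes" figure
TAB_TOTAL_MIN = 180.0      # the "three hours" figure

# Pure reading time once the fifteen inline lookups are subtracted.
base = INLINE_TOTAL_MIN - LOOKUPS * INLINE_LOOKUP_MIN

# What each tab-based lookup has to cost for the session to reach three hours.
tab_cost = implied_lookup_cost(TAB_TOTAL_MIN, base, LOOKUPS)

print(f"uninterrupted reading: {base:.2f} min")
print(f"implied cost per tab lookup: {tab_cost:.2f} min")
print(f"session with inline lookups: {reading_minutes(base, LOOKUPS, INLINE_LOOKUP_MIN):.0f} min")
```

Under these assumptions each tab-based lookup costs a little over nine minutes once recovery is counted. That sits between the three-to-five-minute detour described at the start and the 23-minute recovery figure.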

Multiply that across a researcher who reads twenty documents a week. Multiply it across an analyst who needs to stay current across three domains simultaneously. The time reclamation is significant — not in the "10% productivity improvement" sense but in the "qualitative change in what is achievable" sense. A researcher who can read twenty complex documents per week instead of eight is not doing the same work faster. They are doing different work.

The harder case is what happens to comprehension quality, not just speed. The tab-opening pattern doesn't just cost time. It systematically punishes depth. Readers learn, implicitly, to avoid texts that require too many lookups — which means avoiding texts that challenge them. The comprehension layer doesn't just remove friction. It removes the friction that currently selects against difficulty.

The human mind does not operate that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts.

— Vannevar Bush, As We May Think

Bush was describing the memex, a hypothetical device that would augment human memory and associative thinking, built around the idea that knowledge work is fundamentally associative. He was writing in 1945. What he described has been technically feasible, at least in crude form, for years. It hasn't been built because the incentive structure of the attention economy runs exactly counter to it.

The attention economy is built on navigation, on keeping you moving through content rather than going deeper into it. The major content platforms optimize for time-on-platform, which rewards breadth over depth. The comprehension layer is anti-attention-economy by construction. It helps you read one thing well rather than many things superficially. It is not compatible with a business model that monetizes distraction.

This is why the comprehension layer will not come from the platforms. It will come from a tool that sits outside them — extension-based, BYOK, with no interest in your attention.

The web gave us extraordinary access to human knowledge. It made every library available from every location. It made the sum of human writing searchable. What it did not do was help you understand any of it. The navigation layer is complete. The comprehension layer hasn't been built yet.

That is the gap AnswerTab is built to close.