Search built for models, not people

Traditional web search optimizes for a human who will glance at a page of results, recognize the one they want, and click. That interface assumes a person in the loop doing the final filtering.

An LLM has a different shape of need. It can’t skim; it ingests everything it is handed. It wants results that are precise, complete, and structured, because whatever comes back goes straight into a context window and gets reasoned over. Noise isn’t a minor annoyance; it’s tokens spent and attention diluted.

A tiny sketch of the shape of the problem:

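from typing import Callable

def answer(question: str, search: Callable, llm: Callable[[str], str]) -> str:
    # `search` and `llm` are passed in so the sketch runs with any retriever or model.
    # The quality of the answer is bounded by the quality of what we retrieve.
    docs = search(question, k=5)            # retrieval is the bottleneck
    context = "\n\n".join(d.text for d in docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")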

The answer is only as good as the search behind it. If retrieval returns the wrong five documents, no amount of clever prompting saves the answer. That’s why I find this layer so interesting: it’s upstream of everything else.
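To make that concrete, here is a toy illustration, a sketch under loud assumptions rather than a real system. `Doc`, `good_search`, `bad_search`, and `stub_llm` are all hypothetical stand-ins. The stub "model" can only repeat a fact that is present in its prompt, which is the limiting case of the argument. The `answer` function repeats the sketch above with the model passed in explicitly, so the block runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Doc:
    text: str


def good_search(query: str, k: int = 5) -> List[Doc]:
    # Hypothetical retriever that happens to return the right document.
    return [Doc("Paris is the capital of France.")][:k]


def bad_search(query: str, k: int = 5) -> List[Doc]:
    # Hypothetical retriever that returns a plausible but wrong document.
    return [Doc("Berlin is the capital of Germany.")][:k]


def stub_llm(prompt: str) -> str:
    # Stand-in for a model: it answers only if the fact is in its prompt.
    return "Paris" if "Paris" in prompt else "I don't know"


def answer(question: str, search: Callable, llm: Callable[[str], str]) -> str:
    docs = search(question, k=5)
    context = "\n\n".join(d.text for d in docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


question = "What is the capital of France?"
print(answer(question, good_search, stub_llm))
print(answer(question, bad_search, stub_llm))
```

The model and the prompt template are identical in both calls; only the retriever changes, and the answer changes with it.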

A few open questions I keep coming back to

  • What does “relevance” even mean when the consumer is a model, not a person?
  • How do you evaluate retrieval when the downstream task is open-ended generation?
  • Where should the boundary sit between retrieval and reasoning?
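None of these has a settled answer, but the evaluation question does have a conventional starting point: recall@k against a small hand-labeled set of relevant documents. The sketch below assumes documents are identified by string ids and that someone has labeled which ids are relevant per question; the function name and the example ids are hypothetical. It measures retrieval in isolation, so it does not address the open-ended generation part of the question.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document ids that appear in the top-k retrieved ids."""
    if not relevant:
        # No labeled relevant documents: define recall as 0.0 to avoid dividing by zero.
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranked output of a retriever and hand-labeled relevant ids.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(recall_at_k(retrieved, relevant, k=5))  # 2 of the 3 relevant ids are in the top 5
print(recall_at_k(retrieved, relevant, k=2))  # 1 of the 3 relevant ids is in the top 2
```

A metric like this says whether the right documents showed up, not whether the model used them well, which is part of why the question stays open.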

More to come.