Khlong Trace


Case 10 · Direction III · Variation across runs and systems · unsupported arrival

Which Errors Return Across Repeated Prompt Runs?

An error becomes more informative when its underlying relationship returns across preserved runs, even if the answer changes its wording, order, confidence, or visible citations.

Recorded by Kiet Arunwong · February 18, 2026

When an answer changes every time, the unstable sentences draw attention. The harder question is whether the same mistaken identity, category, location, or unsupported claim keeps returning underneath them.

In one run of a composite scenario, a Bangkok wellness business was called a hospital. In the next, it became a “specialist medical centre.” A third answer described it more cautiously as a clinic, then recommended it for services the business did not advertise. The name remained recognisable throughout. So did one service page cited beside the changing descriptions.

The scenario is assembled from recurring patterns around Study object A: an independent clinic whose Thai name has several English transliterations and resembles that of a larger medical facility. The wording moved around. The underlying problem did not disappear so easily. Each run kept pulling the business toward a broader medical identity, though the exact label and degree of certainty changed.

The sentence changes first

Generated answers rarely return as clean duplicates. A model may reverse the order of recommendations, shorten a description, omit a citation, substitute a synonym, or add a qualification that was absent in the previous run. Even when the prompt appears unchanged, the answer can arrive wearing a different jacket.

That visible variation encourages two opposite mistakes. One reader may dismiss the whole exercise because the system is “random.” Another may treat each altered sentence as a new result and count differences that have little bearing on the business identity being represented. Both readings lose the structure of the observation.

The laboratory treats a repeatable run as a renewed inquiry whose prompt conditions and procedure are preserved well enough to compare what returns. It does not require identical wording. The record includes the prompt, answer, visible citations, language, model context, observation date, and relevant run conditions. Where a condition cannot be preserved, the change is noted rather than smuggled into the comparison.
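The fields listed above can be held in a small record type. What follows is a minimal sketch in Python, not the laboratory's actual tooling: the class name `RunRecord`, its field names, the `unpreserved` field, and every sample value are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class RunRecord:
    """One renewed inquiry plus the conditions it was observed under."""
    prompt: str
    answer: str
    visible_citations: Tuple[str, ...]
    language: str
    model_context: str               # free text: whatever the interface disclosed
    observed_on: date
    run_conditions: Dict[str, str] = field(default_factory=dict)
    # Conditions that could not be held constant, noted rather than hidden.
    unpreserved: Tuple[str, ...] = ()

    def comparison_caveats(self) -> List[str]:
        """Return one note per condition that was not preserved for this run."""
        return [f"not preserved: {name}" for name in self.unpreserved]


# Hypothetical example values, not taken from any real observation.
run = RunRecord(
    prompt="(prompt text as submitted)",
    answer="(full answer text as returned)",
    visible_citations=("clinic service page",),
    language="en",
    model_context="(interface and model label as shown)",
    observed_on=date(2026, 2, 18),
    run_conditions={"session": "fresh"},
    unpreserved=("search index state",),
)
print(run.comparison_caveats())
```

Keeping the unpreserved conditions on the record itself means any later comparison has to carry the caveat along with the answer.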

A recurring error is a claim-level pattern that returns across preserved runs because the same entity, attribution, or source mismatch survives wording changes.

This definition is deliberately narrower than “the answers looked similar.” Two runs may share almost no phrasing and still carry the same wrong province. Conversely, two nearly identical paragraphs may differ at the one place that matters: one identifies the correct branch, while the other quietly borrows the address of another location.

The first task, then, is to strip away sentence-level movement without stripping away evidence. The laboratory compares which entity appears to have been identified, which category is assigned, which location is stated, what attributes are attached, and whether the visible sources support those claims. The answer’s prose remains part of the observation, but it is no longer allowed to dominate it.
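Comparing at the claim level rather than the sentence level can be sketched as follows. This is an illustration under assumptions: the `ClaimSet` type and `differing_claims` function are hypothetical names, the claims are extracted by a reader by hand (the code does not parse answers), and the sample values are invented.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClaimSet:
    """Claims a reader has extracted by hand from one answer."""
    entity: str
    category: str
    location: str


def differing_claims(a: ClaimSet, b: ClaimSet) -> Dict[str, Tuple[str, str]]:
    """Map each claim field that differs to its (a, b) pair of values."""
    left, right = asdict(a), asdict(b)
    return {
        name: (left[name], right[name])
        for name in left
        if left[name] != right[name]
    }


# Hypothetical annotations: near-identical prose, one claim apart.
run_a = ClaimSet(entity="Branch A", category="restaurant",
                 location="address of Branch A")
run_b = ClaimSet(entity="Branch A", category="restaurant",
                 location="address of Branch B")

print(differing_claims(run_a, run_b))
```

An empty result would mean the two answers agree on every recorded claim, however different their prose. It would say nothing about whether those claims are supported.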

What counts as the same error

Suppose the composite clinic appears in four renewed inquiries. One answer calls it a hospital, another a medical centre, another a private healthcare provider, and another a clinic with “hospital-level facilities.” Those are not identical claims. They may still belong to one returning pattern if each description extends beyond the category established by the clinic’s own material and visible listings.

The laboratory is cautious here. Similarity alone is weak evidence. A broader category can enter through several routes: a platform-generated label, a mistaken entity match, an English transliteration shared with a larger facility, or ordinary model wording that compresses several healthcare terms. A repeatable-run record does not prove which route was used internally. It shows whether the same kind of departure from the available evidence returns.

The distinction becomes clearer when the team compares claim-source relationships. The Four Source Relationships typology gives the recurring pattern a stable vocabulary. Direct support appears when the cited page supports the claim as stated. Stretched support appears when the page supports a narrower fact, such as treatments offered, while the answer expands that fact into a hospital category. Borrowed identity appears when details from another organisation are carried into the clinic’s description. Unsupported arrival covers a claim for which no visible source in the observation provides support.
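The four relationships described above can be written down as a fixed vocabulary. The sketch below is one possible encoding, not the laboratory's procedure: the function `label_relationship`, its four yes/no inputs, and the order in which they are checked are all assumptions. The inputs remain human judgments made against the visible sources.

```python
from enum import Enum


class SourceRelationship(Enum):
    DIRECT_SUPPORT = "direct support"
    STRETCHED_SUPPORT = "stretched support"
    BORROWED_IDENTITY = "borrowed identity"
    UNSUPPORTED_ARRIVAL = "unsupported arrival"


def label_relationship(
    *,
    has_visible_source: bool,
    details_from_other_entity: bool,
    supports_claim_as_stated: bool,
    supports_narrower_fact: bool,
) -> SourceRelationship:
    """Turn a reader's four judgments into one label.

    The precedence used here is an assumption of this sketch.
    """
    if not has_visible_source:
        return SourceRelationship.UNSUPPORTED_ARRIVAL
    if details_from_other_entity:
        return SourceRelationship.BORROWED_IDENTITY
    if supports_claim_as_stated:
        return SourceRelationship.DIRECT_SUPPORT
    if supports_narrower_fact:
        return SourceRelationship.STRETCHED_SUPPORT
    # A source is visible but supports nothing relevant to the claim.
    return SourceRelationship.UNSUPPORTED_ARRIVAL


# A treatment page cited beside a hospital-category claim.
label = label_relationship(
    has_visible_source=True,
    details_from_other_entity=False,
    supports_claim_as_stated=False,
    supports_narrower_fact=True,
)
print(label.value)  # stretched support
```

Fixing the labels in an enum keeps annotations from different runs comparable, because every run is described with the same four terms.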

These relationships can persist even when the citations change. One run may cite the clinic’s treatment page and stretch it into a category claim. Another may cite a directory with an overly broad label. A third may show no citation beside the same category. The surface evidence differs, yet the repeated movement is still toward an identity larger than the one established by the available material.

The reverse can happen too. The same citation may sit beside different claims. In one answer it supports the clinic’s address directly. In another it is positioned after a sentence that joins the address, ownership, medical category, and a quality judgment. The page has not changed; the burden placed on it has.

A returning error therefore needs to be described at the right level. “The model repeated itself” is usually too coarse. “The hospital category returned under three formulations, although the visible sources supported treatments and location only” is inspectable. It tells another reader what remained stable and where the support stopped.
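An inspectable description of this kind can be assembled from per-run annotations. The sketch below assumes a hypothetical `CategoryObservation` record and `describe_return` helper. The `exceeds_evidence` flag is a reader's judgment, and the sample rows are invented from the composite scenario rather than drawn from real runs.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class CategoryObservation:
    run_id: int
    stated_label: str          # the wording used in that answer
    citation: Optional[str]    # visible citation beside the claim, if any
    exceeds_evidence: bool     # reader's judgment against the visible sources


def describe_return(pattern: str,
                    observations: Sequence[CategoryObservation],
                    supported: Sequence[str]) -> str:
    """State what returned, under which wordings, and where support stopped."""
    labels: List[str] = []
    for obs in observations:
        if obs.exceeds_evidence and obs.stated_label not in labels:
            labels.append(obs.stated_label)
    if not labels:
        return f"The {pattern} did not return in the recorded runs."
    return (
        f"The {pattern} returned under {len(labels)} formulation(s) "
        f"({', '.join(labels)}), although the visible sources supported "
        f"{' and '.join(supported)} only."
    )


# Hypothetical annotations for four runs of the composite clinic scenario.
observations = [
    CategoryObservation(1, "hospital", "clinic treatment page", True),
    CategoryObservation(2, "medical centre", "directory listing", True),
    CategoryObservation(3, "clinic with hospital-level facilities", None, True),
    CategoryObservation(4, "clinic", "clinic treatment page", False),
]
summary = describe_return("hospital category", observations,
                          ["treatments", "location"])
print(summary)
```

The individual observations stay in the list, so the summary sentence can always be checked against the runs it compresses.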

Repetition can expose a mistaken identity

Study object B offers a different composite pattern. It concerns a regional restaurant group with branches in Bangkok and a neighbouring province. Its social pages, map listings, booking platforms, and directory entries use inconsistent branch names. A similarly named independent venue appears in the same discovery category.

In a typical sequence of renewed runs, the first answer recommends the Bangkok branch but gives the neighbouring province’s address. The second selects the provincial branch and attaches a description apparently drawn from the independent venue. The third avoids naming a branch, then states that the restaurant is located “just outside Bangkok.” It also gets the closing day wrong, a small rough edge that does not fit neatly into the larger branch-confusion story.

The differences matter. They show that the system is not simply copying one fixed record. Yet the branch boundary remains unstable each time. The group name seems to function like a loose drawer label: information associated with several locations is being placed inside, then retrieved without a reliable divider.

The laboratory would record this as a returning entity-identification problem only after checking the visible evidence for each run. The correct-looking group name does not establish that the correct branch was identified. The address, booking link, category, photographs, and supporting source may point elsewhere.

Repeated runs are especially useful where one answer looks plausible enough to escape notice. A Bangkok restaurant described with a provincial landmark may not seem obviously wrong to a reader unfamiliar with the area. When the same geographic leakage appears again under altered wording, the discrepancy becomes easier to isolate. Repetition does not validate the laboratory’s explanation, but it can make the observed boundary failure harder to dismiss as a stray sentence.

There is another useful possibility: the error may fragment. One run carries the wrong address, another the wrong opening status, and a third the reputation language of the similarly named venue. Those claims need not be treated as unrelated merely because they arrive separately. If the apparent source trail repeatedly crosses the same entity boundary, the runs may reveal different pieces of one identification problem.
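One way to check whether fragmented claims cross the same entity boundary is to group them by the pair of entities involved. This is a sketch under assumptions: `AnnotatedClaim` and `boundary_crossings` are hypothetical names, the `apparent_source` value is a reader's reconstruction from visible evidence, and the sample claims are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AnnotatedClaim(NamedTuple):
    run_id: int
    claim_type: str        # e.g. "address", "opening status"
    target_entity: str     # entity the answer says it is describing
    apparent_source: str   # entity the visible evidence seems to describe


Boundary = Tuple[str, str]


def boundary_crossings(
    claims: Iterable[AnnotatedClaim],
) -> Dict[Boundary, List[Tuple[int, str]]]:
    """Group claims whose apparent source is a different entity than the target."""
    crossings: Dict[Boundary, List[Tuple[int, str]]] = defaultdict(list)
    for claim in claims:
        if claim.apparent_source != claim.target_entity:
            key = (claim.target_entity, claim.apparent_source)
            crossings[key].append((claim.run_id, claim.claim_type))
    return dict(crossings)


# Hypothetical annotations across three runs.
claims = [
    AnnotatedClaim(1, "address", "Bangkok branch", "provincial branch"),
    AnnotatedClaim(2, "opening status", "Bangkok branch", "similarly named venue"),
    AnnotatedClaim(3, "reputation", "Bangkok branch", "similarly named venue"),
    AnnotatedClaim(3, "category", "Bangkok branch", "Bangkok branch"),
]
for boundary, items in boundary_crossings(claims).items():
    print(boundary, items)
```

A boundary that collects different claim types from different runs is a candidate for one identification problem. The grouping does not establish that the claims share a cause.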

The team resists forcing a tidy answer. A shared name may explain the confusion, but so may an old listing, a platform category, or a branch page with weak internal naming. Several explanations can remain compatible with the observations. The record keeps that uncertainty visible.

Stability is not the same as truth

A repeated correct claim deserves the same scrutiny as a repeated incorrect one. If several runs return the same address, the result may look reassuring. The laboratory still checks the address against the visible material and the entity being discussed. A consistently repeated statement can be consistently inherited from an outdated or mismatched page.

This is where repetition becomes slightly treacherous. Human readers are sensitive to recurrence. A claim seen several times begins to feel established even when each occurrence may descend from the same ambiguous public record. The model’s confidence can amplify the effect. A clean sentence repeated with minor variations starts to resemble corroboration.

The research record breaks that spell by keeping runs separate. Cross-run agreement is an observation. Support is assessed claim by claim. The fact that a category returns tells the laboratory that the category is persistent under the recorded conditions. It does not show that the category is correct, and it does not prove that independent sources produced it.

The same discipline applies when the error disappears. A later run that identifies the correct branch does not erase earlier observations. It shows that the pattern is variable. The laboratory then asks what else changed: prompt wording, language, visible citations, model context, observation occasion, or the public information available about the entity.

Sometimes nothing obvious explains the correction. That absence is itself part of the record. Generative systems can produce different selections under apparently similar conditions, and the laboratory cannot inspect every internal step. A disappearance should not be narrated as a repair unless there is evidence of a repair.
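The question of what else changed between two runs can be sketched as a comparison of recorded conditions. The function names, condition keys, and sample values below are assumptions made for illustration. A recorded change is a candidate explanation only, not an established cause.

```python
from typing import Dict, Optional, Tuple

Conditions = Dict[str, str]


def changed_conditions(
    earlier: Conditions, later: Conditions
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return each condition whose recorded value differs between two runs.

    A condition recorded for only one run appears with None on the other side.
    """
    names = sorted(set(earlier) | set(later))
    return {
        name: (earlier.get(name), later.get(name))
        for name in names
        if earlier.get(name) != later.get(name)
    }


def describe_disappearance(earlier: Conditions, later: Conditions) -> str:
    """Describe a vanished error without narrating it as a repair."""
    changes = changed_conditions(earlier, later)
    if not changes:
        return ("error absent in later run; no recorded condition changed; "
                "no repair evidenced")
    return "error absent in later run; recorded changes: " + ", ".join(changes)


# Hypothetical recorded conditions for two observation occasions.
earlier = {"prompt wording": "v1", "language": "en", "model context": "as shown"}
later = {"prompt wording": "v1", "language": "th", "model context": "as shown"}

print(changed_conditions(earlier, later))
print(describe_disappearance(earlier, later))
```

When nothing in the record changed, the note says so explicitly, which keeps the absence of an explanation in the record.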

For a business owner, this distinction has practical weight. A single mistaken answer may be incidental. A recurring error shows that the mistaken version can be reconstructed under the recorded conditions. The appropriate response is rarely to chase the exact sentence. It is to inspect the names, branch labels, categories, addresses, and source relationships that repeatedly accompany the mistake.

What repeated runs cannot reveal

A preserved series can show that a pattern returns. It cannot expose the model’s complete retrieval infrastructure, hidden ranking logic, undisclosed intermediate steps, or every source used internally. The apparent retrieval path remains a reconstruction from visible evidence.

Repeated observations also do not create a numerical failure rate unless the laboratory has designed and preserved a sample capable of supporting one. Running a prompt several times and seeing an error recur does not justify claims about how often all users will encounter it. The result belongs to the conditions recorded.

Model context can shift between observation dates. Search indexes change. Listings are edited, citations appear or disappear, and the system itself may be updated without offering a researcher a clean boundary between versions. Where those changes are visible, the laboratory records them. Where they are not, the comparison carries additional uncertainty.

There is a subtler limitation. The procedure can make a returning pattern look more unified than it really is. “Wrong branch” may cover several distinct mechanisms: a shared group page, a booking platform that suppresses branch labels, an ambiguous English name, or a geographic phrase interpreted too broadly. The laboratory therefore preserves the individual claims and sources before applying a higher-level description.

The strongest conclusion available from repeated prompt runs is usually modest: under stated conditions, a particular entity, attribution, location, or support problem returned. That conclusion can be valuable. It identifies an observed seam where the generated representation repeatedly tears, even when each paragraph tears in a slightly different shape.

Kiet Arunwong
responsible for the record
Khlong Trace Laboratory · Bangkok · February 18, 2026