Khlong Trace


Case 01 · Direction I · Entity identity and naming · borrowed identity

Which Thai Business Name Does AI Choose After Transliteration Drift?

When a Thai business name has several English spellings in circulation, AI search systems may select whichever transliteration appears most prominently in directory and platform records, regardless of the spelling the business itself prefers.

Recorded by Kiet Arunwong · January 21, 2026

A Thai business name does not arrive in English already settled. Transliteration choices accumulate across platforms, directories, and booking records, each one slightly different, each one a candidate for the version that ends up in the generated answer.

In a composite scenario, Khlong Trace Laboratory examined a Bangkok wellness clinic whose Thai name could be rendered in English at least four ways. Two of those spellings appeared on directory pages used by AI search systems. A third appeared on the clinic’s own website. A fourth appeared on a map platform that had borrowed the spelling from an older listing. When the team ran identical queries, each run could return a different spelling, sometimes with a different citation behind it.

The scenario is assembled from recurring patterns in Khlong Trace Laboratory’s observations of Thai business name behaviour in AI search. No single named company is under examination. The patterns recur across enough cases to make the composite worth building.

Why transliteration is not a solved problem

Thai is written in a largely phonetic script, but its sounds do not map cleanly onto the Latin alphabet. Standard romanisation systems exist, yet Thai businesses, government databases, map platforms, booking services, and directory operators do not apply them consistently. A business may register under one romanisation for legal purposes, use another on its own signage, and appear under a third on a platform that auto-generated the label from a transliteration algorithm.

None of these versions is necessarily wrong in an absolute sense. Each records an attempt to represent the Thai sounds in the Latin alphabet. The problem for AI search is that the different spellings fragment the evidence. Pages about the same business look, at the text level, as though they concern different entities.

Khlong Trace Laboratory treats this fragmentation as the primary mechanism behind transliteration drift. The drift is not one wrong spelling. It is the spread of plausible spellings across the public record, each anchored in a slightly different source, each capable of entering a generated answer under the right conditions.
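The fragmentation described above can be illustrated with a minimal Python sketch. The four spellings below are invented for illustration and do not belong to any real clinic, and the character-level similarity measure from the standard library's `difflib` is an assumption of the sketch, not a description of how any AI search system compares names. It shows only that spellings a human reader would treat as the same name fail an exact text match.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical romanisations of one clinic name, invented for illustration.
SPELLINGS = {
    "clinic_site": "Sukhaphap Clinic",
    "health_directory": "Sukhaphaap Clinic",  # differs by one vowel
    "second_directory": "Sukkhaphap Clinic",
    "map_platform": "Sookkapap Clinic",
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after case folding."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def compare_spellings(spellings):
    """Return (source_a, source_b, exact_match, similarity) for every pair."""
    rows = []
    for (src_a, name_a), (src_b, name_b) in combinations(spellings.items(), 2):
        rows.append((src_a, src_b, name_a == name_b, similarity(name_a, name_b)))
    return rows


if __name__ == "__main__":
    for src_a, src_b, exact, score in compare_spellings(SPELLINGS):
        print(f"{src_a} vs {src_b}: exact={exact} similarity={score:.2f}")
```

Every pair fails the exact comparison while remaining close at the character level, which is the gap in which the evidence for one business splits into what looks like several.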

The role of directory prominence

In the composite clinic case, the team observed that the spelling most prominent in directory pages was not the one the clinic preferred. A national health directory had indexed the clinic under a transliteration that differed by one vowel from the clinic’s preferred English spelling. A major booking platform had inherited the directory spelling. A second directory used a third spelling derived from a different romanisation system.

When AI search systems assembled answers about the clinic, the citation frequently pointed to the directory page. The spelling in the generated answer matched the directory, not the clinic’s own site. The clinic’s preferred spelling appeared in the answer when the prompt used that exact spelling, but often not otherwise.

Khlong Trace Laboratory would describe this as borrowed identity at the spelling level. The preferred name is not being actively suppressed. It simply has fewer prominent citations behind it. The system appears to follow the weight of the visible public record, and in this case the weight lies with the directory version.

Repeated runs and spelling instability

One of the more useful observations from this study is that spelling instability persists across repeated runs. A single query may return the preferred spelling on one occasion and the directory spelling on another. When the team ran the same prompt across several sessions and preserved the results, the returned spelling varied even when conditions were held stable.

The laboratory does not read that instability as random noise. It appears to reflect genuine uncertainty in the public record. The system is not choosing between a right answer and a wrong answer. It is choosing between several plausible versions, each backed by a different slice of the visible evidence.

For a business owner, the practical consequence is that one check settles nothing. A first check of an AI search result may show the preferred spelling and produce relief. A check on a different day, or with slightly altered wording, may show the directory spelling. A third check may show a hybrid formulation that appears in neither the directory nor the clinic’s own pages but resembles a social platform’s auto-generated label.

Khlong Trace Laboratory does not treat any of these as the definitive answer about the business. Each is an observation tied to a moment, a prompt, a language, and a visible source set.
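One way to keep observations tied to their moment, prompt, language, and visible source set is to record each run as a structured entry and tally spellings only afterwards. The sketch below is an assumption about how such a log could look, not the laboratory's own tooling. The field names, the `spelling_shares` helper, and every value in `RUNS` are invented for illustration and are not preserved results.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One preserved run: what was asked, when, and what came back."""
    recorded_at: str      # ISO 8601 timestamp of the run
    prompt: str           # exact prompt text
    language: str         # prompt language, e.g. "en" or "th"
    spelling: str         # spelling that appeared in the generated answer
    cited_sources: tuple  # labels of the citations visible in the answer


def spelling_shares(observations):
    """Share of runs in which each spelling appeared, most frequent first."""
    counts = Counter(obs.spelling for obs in observations)
    total = sum(counts.values())
    return [(spelling, n / total) for spelling, n in counts.most_common()]


# Invented example runs; the spellings and sources are hypothetical.
RUNS = [
    Observation("2026-01-10T09:00:00+07:00", "wellness clinic in Bangkok", "en",
                "Sukhaphaap Clinic", ("health_directory",)),
    Observation("2026-01-10T15:30:00+07:00", "wellness clinic in Bangkok", "en",
                "Sukhaphap Clinic", ("clinic_site",)),
    Observation("2026-01-11T09:00:00+07:00", "wellness clinic in Bangkok", "en",
                "Sukhaphaap Clinic", ("health_directory", "booking_platform")),
    Observation("2026-01-12T09:00:00+07:00", "wellness clinic in Bangkok", "en",
                "Sookkapap Clinic", ()),
]

if __name__ == "__main__":
    for spelling, share in spelling_shares(RUNS):
        print(f"{spelling}: {share:.0%} of runs")
```

The shares describe the logged runs only. They say nothing about how often a spelling would appear under other prompts, languages, or dates.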

How citation attachment follows spelling

The more consequential part of the observation is what travels with the spelling. When the system uses the directory spelling, it typically also uses the directory’s other data: category, location, opening hours, and the brief description that appears on the directory listing page. When the system uses the clinic’s own spelling, it tends to draw from the clinic’s own pages, which carry a different and generally more precise set of attributes.

A citation attached to the directory spelling is not wrong in a narrow sense. The directory page does exist, and it does refer to the clinic. The problem is that the directory page may carry a broader category, an older address, or a different set of services than the clinic currently offers. The generated answer inherits all of this alongside the spelling.

Khlong Trace Laboratory uses the Four Source Relationships typology to describe this. When the directory page supports the category claim as stated, that is direct support. When the directory carries a general medical category but the clinic is a specialist facility, the category support is stretched. When the directory description mentions treatments associated with a similarly named larger facility nearby, the identity has been borrowed rather than found. When no visible source supports a particular service claim that appears in the generated answer, that claim arrived without support.

The spelling is the surface symptom. The source relationship is the structural problem.
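The four relationships can be written down as a small decision sketch. This is a simplification under stated assumptions: the three boolean inputs and the order in which they are checked are hypothetical, and in practice each input is a judgement a reader makes about a claim and its visible source, not something the code can determine.

```python
from enum import Enum


class SourceRelationship(Enum):
    DIRECT = "direct support"
    STRETCHED = "stretched support"
    BORROWED = "borrowed identity"
    UNSUPPORTED = "arrived without support"


def classify(claim_in_visible_source: bool,
             source_describes_this_entity: bool,
             source_states_claim_as_given: bool) -> SourceRelationship:
    """Map three reader judgements onto one of the four relationships.

    The check order is an assumption of this sketch.
    """
    if not claim_in_visible_source:
        # No visible source carries the claim at all.
        return SourceRelationship.UNSUPPORTED
    if not source_describes_this_entity:
        # The material belongs to another, similarly named entity.
        return SourceRelationship.BORROWED
    if not source_states_claim_as_given:
        # Right entity, but the source says something broader or narrower.
        return SourceRelationship.STRETCHED
    return SourceRelationship.DIRECT


if __name__ == "__main__":
    # A general medical category applied to a specialist clinic.
    print(classify(True, True, False).value)
```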

What a business can do with this information

The laboratory is cautious about prescriptions. Adding the preferred romanisation to a directory record does not guarantee that subsequent queries will use it. Directory updates take time to propagate. AI systems may cache older versions. The public record already contains the competing spellings, and removing them entirely is rarely possible.

What the laboratory can offer is a more precise diagnosis. If the preferred spelling appears on the clinic’s own site but not in the prominent directories, the problem is directory lag, not incorrect self-presentation. If the preferred spelling appears in directories but the AI system is still choosing the older version, the problem may be source prominence or platform category weighting rather than simple spelling correction.
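The two diagnoses above can be expressed as a short decision function. The sketch is hedged in two ways: the inputs are hypothetical yes/no observations taken from a single run, and the two extra branches (no drift observed, and a preferred spelling missing from the business's own site) are assumptions added to make the function total, not findings from the record.

```python
def diagnose(on_own_site: bool,
             in_prominent_directories: bool,
             answer_uses_preferred: bool) -> str:
    """Return a coarse diagnosis for one observed run.

    Inputs describe where the preferred spelling is visible and whether
    the generated answer used it.
    """
    if answer_uses_preferred:
        # Assumed branch: nothing to diagnose in this particular run.
        return "no drift observed in this run"
    if not on_own_site:
        # Assumed branch: the business does not present the spelling itself.
        return "self-presentation"
    if not in_prominent_directories:
        return "directory lag"
    # The spelling is present in directories, yet the answer used another.
    return "source prominence or platform category weighting"


if __name__ == "__main__":
    print(diagnose(on_own_site=True,
                   in_prominent_directories=False,
                   answer_uses_preferred=False))
```

Because a single run is only one observation, a diagnosis of this kind is more useful when it is repeated across preserved runs than when it is read from one result.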

Correcting a spelling without understanding which source the system is using and why is likely to produce a marginal improvement at best, and at worst it may add yet another competing version to the record.

The stronger intervention, in the laboratory’s observation, is to ensure that the preferred spelling appears consistently across the highest-prominence sources: the business’s own structured data, its most-indexed service pages, and the directories whose categories and attribute data seem to carry the most weight in the generated answers. The spelling correction becomes effective when it replaces the directory version across enough prominent pages that the weight of the visible record shifts.
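For the business's own structured data, one common vehicle is schema.org markup serialised as JSON-LD. The sketch below builds such a record in Python using the schema.org `MedicalClinic` type and the `name`, `alternateName`, `url`, and `sameAs` properties. The clinic name, the Thai-script name, and the URLs are invented placeholders, and nothing in the record establishes that any particular AI search system reads these fields.

```python
import json


def clinic_jsonld(preferred_name: str, thai_name: str, url: str, same_as) -> dict:
    """Build a schema.org record that states one preferred romanisation."""
    return {
        "@context": "https://schema.org",
        "@type": "MedicalClinic",
        "name": preferred_name,       # the single preferred English spelling
        "alternateName": thai_name,   # the Thai-script name
        "url": url,
        "sameAs": list(same_as),      # listings that refer to the same clinic
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    record = clinic_jsonld(
        preferred_name="Sukhaphap Clinic",
        thai_name="คลินิกสุขภาพ",
        url="https://clinic.example",
        same_as=["https://directory.example/listing"],
    )
    # ensure_ascii=False keeps the Thai script readable in the output.
    print(json.dumps(record, ensure_ascii=False, indent=2))
```

The design choice in the sketch is to state exactly one English spelling in `name` and leave competing romanisations out, so that the markup does not itself add to the spread of versions.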

What the observations cannot show

Khlong Trace Laboratory does not claim to know the internal logic of any AI search system. The apparent retrieval path — the sequence from prompt to entity selection to source citation — is a reconstruction from visible evidence. It tells the team which pages appeared and what they said. It does not reveal hidden ranking signals, private training data, or the intermediate steps between source retrieval and final generation.

The composite scenario also carries no claim about frequency. The patterns observed here recur across the laboratory’s case base, but the laboratory has not measured how often Thai businesses in general face transliteration drift of this kind. The finding belongs to the preserved observations, not to a general statistic.

A business operating in Nonthaburi would face a different evidence environment from one in a district with a long and settled English name. A clinic with a purely Thai-script brand presence would face different conditions from one that has actively maintained English pages for a decade. The observations are specific. Their usefulness lies in knowing when they apply.

Kiet Arunwong
responsible for the record
Khlong Trace Laboratory · Bangkok · January 21, 2026