English to Welsh: AI Translation Comparison
Welsh (Cymraeg) is spoken by approximately 880,000 people, primarily in Wales, with small communities in England, Patagonia (Argentina), and other diaspora locations. Welsh is a Celtic language that uses the Latin script and features initial consonant mutations (where the first consonant of a word changes based on grammatical context), VSO (verb-subject-object) word order, inflected prepositions, and a vigesimal (base-20) counting system. The Welsh Language (Wales) Measure 2011 gives Welsh official status in Wales and requires that it be treated no less favourably than English, driving substantial translation demand for government services, education, healthcare, road signage, and digital services.
This comparison evaluates five leading AI translation systems on English-to-Welsh accuracy, naturalness, and suitability for different use cases.
Translation comparisons are based on automated metrics and editorial evaluation. Quality varies by language pair and content type.
Accuracy Comparison Table
| System | BLEU Score | COMET Score | Editorial Rating (1-10) | Best For |
|---|---|---|---|---|
| Google Translate | 27.4 | 0.798 | 6.0 | General-purpose, free access |
| DeepL | 29.8 | 0.814 | 6.4 | Basic formal content |
| GPT-4 | 30.6 | 0.822 | 6.7 | Contextual accuracy, cultural content |
| Claude | 28.1 | 0.803 | 6.1 | Long-form content |
| NLLB-200 | 25.3 | 0.781 | 5.6 | Free option, self-hosted |
For background on these metrics, see Translation Quality Metrics: BLEU, COMET, and Human Evaluation Explained.
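The three score columns in the table can be checked against each other with a few lines of Python. This is a minimal sketch: the scores are copied from the table, while the `SCORES` dictionary and the `rank_by` helper are illustrative names, not part of any evaluation toolkit.

```python
# Scores copied from the accuracy comparison table.
SCORES = {
    "Google Translate": {"bleu": 27.4, "comet": 0.798, "editorial": 6.0},
    "DeepL": {"bleu": 29.8, "comet": 0.814, "editorial": 6.4},
    "GPT-4": {"bleu": 30.6, "comet": 0.822, "editorial": 6.7},
    "Claude": {"bleu": 28.1, "comet": 0.803, "editorial": 6.1},
    "NLLB-200": {"bleu": 25.3, "comet": 0.781, "editorial": 5.6},
}


def rank_by(metric: str) -> list[str]:
    """Return system names sorted from best to worst on one metric."""
    return sorted(SCORES, key=lambda name: SCORES[name][metric], reverse=True)


rankings = {metric: rank_by(metric) for metric in ("bleu", "comet", "editorial")}
for metric, order in rankings.items():
    print(f"{metric:>9}: {' > '.join(order)}")

# True when every metric orders the five systems identically.
metrics_agree = len({tuple(order) for order in rankings.values()}) == 1
print("All metrics agree on the ranking:", metrics_agree)
```

On these figures, BLEU, COMET, and the editorial rating all produce the same order (GPT-4, DeepL, Claude, Google Translate, NLLB-200), so the choice of metric does not change the ranking here, only the size of the gaps.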
Example Translations
Formal Business Email
Source: “The Welsh Government is pleased to announce additional funding for Welsh-medium education programs. Details of the application process are enclosed.”
| System | Translation |
|---|---|
| Google | Mae Llywodraeth Cymru yn falch o gyhoeddi cyllid ychwanegol ar gyfer rhaglenni addysg cyfrwng Cymraeg. Mae manylion y broses ymgeisio wedi’u hamgáu. |
| DeepL | Mae Llywodraeth Cymru yn falch o gyhoeddi cyllid ychwanegol ar gyfer rhaglenni addysg cyfrwng Cymraeg. Mae manylion y broses ymgeisio ynghlwm. |
| GPT-4 | Mae Llywodraeth Cymru yn falch o gyhoeddi cyllid ychwanegol ar gyfer rhaglenni addysg cyfrwng Cymraeg. Amgaeir manylion y broses ymgeisio. |
| Claude | Mae Llywodraeth Cymru yn falch o gyhoeddi cyllid ychwanegol ar gyfer rhaglenni addysg cyfrwng Cymraeg. Mae manylion y broses ymgeisio wedi’u hamgáu. |
| NLLB-200 | Mae Llywodraeth Cymru yn falch o gyhoeddi cyllid ychwanegol ar gyfer rhaglenni addysg Cymraeg. Mae manylion y broses ymgeisio ynghlwm. |
Assessment: GPT-4’s “Amgaeir” (are enclosed, impersonal passive) is the most formally correct Welsh construction for official documents. Google and Claude use “wedi’u hamgáu,” where the possessive “eu” (their) adds an “h” before the vowel-initial “amgáu.” Strictly, this is h-prothesis rather than aspirate mutation (which affects only c, p, and t), and both systems apply it correctly. NLLB-200 drops “cyfrwng” (medium) from “Welsh-medium education,” losing specificity. All systems correctly use “Llywodraeth Cymru” (Welsh Government).
Casual Conversation
Source: “Are you going to the Eisteddfod this year? I heard the main competition is going to be brilliant. We should get there early.”
| System | Translation |
|---|---|
| Google | Wyt ti’n mynd i’r Eisteddfod eleni? Clywais i fod y brif gystadleuaeth yn mynd i fod yn wych. Dylen ni gyrraedd yno’n gynnar. |
| DeepL | Wyt ti’n mynd i’r Eisteddfod eleni? Clywais i fod y brif gystadleuaeth yn mynd i fod yn ardderchog. Dylen ni fynd yno’n gynnar. |
| GPT-4 | Ti’n mynd i’r Steddfod eleni? Glywes i fod y brif gystadleuaeth yn mynd i fod yn grêt. Dylen ni gyrraedd yno’n gynnar. |
| Claude | Wyt ti’n mynd i’r Eisteddfod eleni? Clywais i fod y brif gystadleuaeth yn mynd i fod yn wych. Dylen ni gyrraedd yno’n gynnar. |
| NLLB-200 | Ydych chi’n mynd i’r Eisteddfod eleni? Clywais fod y brif gystadleuaeth yn mynd i fod yn wych. Dylen ni gyrraedd yno’n gynnar. |
Assessment: GPT-4 captures casual spoken Welsh with “Ti’n” (dropping “Wyt”), the colloquial abbreviation “Steddfod” (for Eisteddfod), the southern dialectal past tense “Glywes i” (I heard), and the informal “grêt” (great, borrowed from English). The other systems produce correct but more literary Welsh. NLLB-200 uses the formal “Ydych chi’n” instead of the informal “Wyt ti’n.” All systems correctly apply soft mutation in “y brif gystadleuaeth” (the main competition): “prif” becomes “brif” after the article because the noun is feminine, and “prif” in turn softens “cystadleuaeth” to “gystadleuaeth.”
See also: Best Translation AI for Casual Content.
Technical Content
Source: “The bilingual content management system automatically generates Welsh-language versions of government web pages using translation memory and terminology databases.”
| System | Translation |
|---|---|
| Google | Mae’r system rheoli cynnwys dwyieithog yn cynhyrchu fersiynau Cymraeg o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu a chronfeydd data terminoleg. |
| DeepL | Mae’r system rheoli cynnwys dwyieithog yn cynhyrchu fersiynau Cymraeg o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu a chronfeydd data terminoleg. |
| GPT-4 | Mae’r system rheoli cynnwys dwyieithog yn creu fersiynau Cymraeg o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu a chronfeydd data termau. |
| Claude | Mae’r system rheoli cynnwys dwyieithog yn cynhyrchu fersiynau Cymraeg o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu a chronfeydd data terminoleg. |
| NLLB-200 | Mae’r system rheoli cynnwys dwyieithog yn cynhyrchu fersiynau Cymraeg o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu a chronfeydd data terminoleg. |
Assessment: Four of the five outputs are identical, and GPT-4 differs by only two words, reflecting well-established Welsh IT terminology. GPT-4 uses “creu” (create) instead of “cynhyrchu” (produce/generate) and “termau” (terms) instead of “terminoleg” (terminology); both choices are acceptable. All systems correctly apply the soft mutation in “dudalennau” (from “tudalennau,” pages) after the preposition “o” (of). “Cof cyfieithu” (translation memory) is standard Welsh language technology terminology.
See also: Best Translation AI for Technical Documentation.
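Word-level differences like the ones noted for GPT-4 can be surfaced automatically rather than by eye. The sketch below uses Python's standard `difflib` module on the DeepL and GPT-4 outputs from the table; the `word_substitutions` helper is a hypothetical name introduced for illustration, and whitespace tokenization is a simplifying assumption.

```python
from difflib import SequenceMatcher

# Technical-content outputs copied from the table above.
DEEPL = (
    "Mae’r system rheoli cynnwys dwyieithog yn cynhyrchu fersiynau Cymraeg "
    "o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu "
    "a chronfeydd data terminoleg."
)
GPT4 = (
    "Mae’r system rheoli cynnwys dwyieithog yn creu fersiynau Cymraeg "
    "o dudalennau gwe’r llywodraeth yn awtomatig gan ddefnyddio cof cyfieithu "
    "a chronfeydd data termau."
)


def word_substitutions(a: str, b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs for every span where a and b differ."""
    # Naive tokenization: drop the final full stop and split on whitespace.
    a_tokens = a.rstrip(".").split()
    b_tokens = b.rstrip(".").split()
    matcher = SequenceMatcher(a=a_tokens, b=b_tokens, autojunk=False)
    return [
        (" ".join(a_tokens[i1:i2]), " ".join(b_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for old, new in word_substitutions(DEEPL, GPT4):
    print(f"{old!r} -> {new!r}")
# 'cynhyrchu' -> 'creu'
# 'terminoleg' -> 'termau'
```

A diff like this only shows where outputs diverge. Deciding whether a divergence is an error or an acceptable synonym still takes a Welsh speaker or a terminology database.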
Strengths and Weaknesses
Google Translate
Strengths: Free and accessible. Reasonable quality. Benefits from Welsh Government translation data and BBC Cymru Fyw content. Weaknesses: Inconsistent mutation application. Sometimes produces anglicized word order. Limited dialectal awareness.
DeepL
Strengths: Good formal document quality. Correct mutations in simple sentences. Natural vocabulary. Weaknesses: Premium pricing. Mutation errors in complex sentences. Limited casual Welsh capability.
GPT-4
Strengths: Best overall quality. Good dialectal awareness (northern vs. southern Welsh). Handles mutations most consistently. Natural casual register. Weaknesses: Higher cost. Occasionally produces non-standard dialectal forms in formal content.
Claude
Strengths: Consistent quality for long documents. Reliable formal register. Weaknesses: Similar quality to Google. Limited mutation handling in complex contexts. No dialectal variation.
NLLB-200
Strengths: Free and self-hostable. Basic functionality. Weaknesses: Lowest quality. Formal register default. Frequent mutation errors. Limited Welsh training data.
Recommendations
| Use Case | Recommended System |
|---|---|
| Government services | GPT-4 with human review |
| Education materials | GPT-4 or DeepL |
| Cultural / Eisteddfod content | GPT-4 |
| Healthcare communications | GPT-4 with human review |
| High-volume, cost-sensitive | NLLB-200 (self-hosted) |
| Quick personal translation | Google Translate (free) |
| Long-form content | Claude |
Key Takeaways
- GPT-4 leads for English-to-Welsh with the most consistent mutation handling and best dialectal awareness. All systems still require human review for published content due to mutation errors.
- Initial consonant mutations are Welsh’s signature challenge for AI translation: soft, nasal, and aspirate mutations change word-initial consonants based on grammatical triggers, and errors are immediately noticeable to native speakers.
- The official status of Welsh under the Welsh Language (Wales) Measure 2011 creates substantial institutional translation demand, and the resulting parallel English-Welsh text benefits all AI systems.
- The northern/southern dialect divide affects vocabulary, verb forms, and pronunciation, but most AI systems default to a somewhat standardized literary Welsh that may not match regional expectations.
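The three mutation types named in the takeaways can be illustrated with simple lookup tables. This is a minimal sketch, assuming lowercase input and ignoring the hard part (deciding when a grammatical trigger applies); the `mutate` function and table names are hypothetical, and the consonant changes are the standard ones taught for Welsh.

```python
# Standard Welsh initial consonant changes, keyed by the unmutated initial.
SOFT = {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
        "g": "", "m": "f", "ll": "l", "rh": "r"}
NASAL = {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"}
ASPIRATE = {"p": "ph", "t": "th", "c": "ch"}

# Welsh digraphs count as single letters, so "chwaer" starts with "ch", not "c".
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")


def mutate(word: str, table: dict[str, str]) -> str:
    """Apply one mutation table to a lowercase word; return it unchanged if not mutable."""
    initial = word[:2] if word[:2] in DIGRAPHS else word[:1]
    if initial not in table:
        return word
    return table[initial] + word[len(initial):]


# Examples taken from the translations discussed in this article.
print(mutate("prif", SOFT))           # brif
print(mutate("cystadleuaeth", SOFT))  # gystadleuaeth
print(mutate("tudalennau", SOFT))     # dudalennau
print(mutate("gwych", SOFT))          # wych
print(mutate("cronfeydd", ASPIRATE))  # chronfeydd
```

The lookup itself is trivial. What translation systems get wrong is the trigger logic, such as noun gender, preceding prepositions, possessives, and numerals, which is why mutation errors tend to appear in complex sentences.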
Next Steps
- Try it yourself: Compare these systems on your own text in the Translation AI Playground: Compare Models Side-by-Side.
- Low-resource languages: Learn more in Low-Resource Languages: Where NLLB and Aya Shine.
- Check the leaderboard: Browse our full Translation Accuracy Leaderboard by Language Pair.
- Full model comparison: Read Best Translation AI in 2026: Complete Model Comparison.