Arabic to Persian: AI Translation Comparison

Arabic is spoken by over 400 million people across the Middle East and North Africa, while Persian (Farsi) has approximately 110 million speakers, primarily in Iran, Afghanistan (as Dari), and Tajikistan (as Tajik). Translation demand between the two languages is driven by shared Islamic religious scholarship, bilateral trade between Iran and the Arab Gulf states, literary exchange rooted in centuries of cultural overlap, and diplomatic communication across the region. Although Persian in Iran and Afghanistan is written in the Arabic script (with four additional letters of its own), the languages belong to different families. Arabic is Semitic, with root-based (mostly triliteral) morphology and a traditionally verb-initial (VSO) word order. Persian is Indo-European, with SOV word order and a largely analytic grammar built on prepositions, the object marker را, and light verb constructions.
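The four letters Persian adds to the Arabic alphabet are پ (pe), چ (che), ژ (zhe), and گ (gaf). As a minimal sketch, the Python snippet below lists their Unicode code points and reports which of them occur in a string. The name `persian_specific_letters` is hypothetical, and the check is only a rough hint: plenty of Persian sentences contain none of these letters, so it is not a language identifier.

```python
import unicodedata

# The four letters Persian adds to the Arabic alphabet, keyed by character.
PERSIAN_EXTRA_LETTERS = {
    "\u067E": "pe",   # پ
    "\u0686": "che",  # چ
    "\u0698": "zhe",  # ژ
    "\u06AF": "gaf",  # گ
}


def persian_specific_letters(text: str) -> list[str]:
    """Return the Persian-only letters in text, in order of first appearance.

    Hypothetical helper for illustration: a heuristic hint, not language ID.
    """
    found: list[str] = []
    for ch in text:
        if ch in PERSIAN_EXTRA_LETTERS and ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    # Print each letter with its code point and official Unicode name.
    for ch, name in PERSIAN_EXTRA_LETTERS.items():
        print(f"U+{ord(ch):04X} {ch} {name} ({unicodedata.name(ch)})")
    print(persian_specific_letters("گفت‌وگو"))        # Persian word: ['گ']
    print(persian_specific_letters("السيد المحترم"))  # Arabic phrase: []
```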

Translation comparisons are based on automated metrics and editorial evaluation. Quality varies by language pair and content type.

Accuracy Comparison Table

System	BLEU Score	COMET Score	Editorial Rating (1-10)	Best For
Google Translate	38.4	0.86	7.8	General content, news articles
DeepL	36.1	0.84	7.3	Formal business documents
GPT-4	40.2	0.88	8.4	Literary and religious texts
Claude	39.5	0.87	8.1	Academic and diplomatic content
NLLB-200	35.7	0.83	7.1	Offline batch processing
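The BLEU column measures how many word n-grams (runs of one to four words) a system's output shares with a human reference translation, with a brevity penalty for output shorter than the reference. COMET, by contrast, is a learned neural metric and cannot be reproduced in a few lines. The sketch below is a minimal, assumption-laden version of standard corpus BLEU: whitespace tokenization, one reference per sentence, uniform weights, and no smoothing. The article does not state which test set, references, or tokenizer produced the scores above, so numbers from this sketch are not comparable with the table; published evaluations usually rely on a standardized tool such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Minimal sketch: whitespace tokenization, uniform weights, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Any zero n-gram precision makes the (unsmoothed) geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Whitespace tokenization is especially crude for Persian: a word written with a zero-width non-joiner (می‌خواهد) and the same word written without one count as different tokens, which can lower BLEU for translations a reader would consider identical.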

Example Translations

Scenario 1: Formal Business Email

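Source text (Arabic): “السيد المحترم، يسعدنا أن نعلمكم بأن شركتنا ترغب في إقامة شراكة تجارية مع مؤسستكم الموقرة. نأمل أن نتمكن من ترتيب اجتماع في الأسابيع القادمة لمناقشة شروط التعاون المشترك.”

English gloss: “Dear Sir, we are pleased to inform you that our company wishes to establish a business partnership with your esteemed institution. We hope to arrange a meeting in the coming weeks to discuss the terms of our mutual cooperation.”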

System	Persian Translation
Google Translate	آقای محترم، خوشحالیم به اطلاعتان برسانیم که شرکت ما مایل به ایجاد مشارکت تجاری با موسسه محترم شماست. امیدواریم بتوانیم در هفته‌های آینده جلسه‌ای برای بحث درباره شرایط همکاری مشترک ترتیب دهیم.
DeepL	جناب آقا، با کمال احترام به اطلاع می‌رسانیم که شرکت ما خواهان برقراری همکاری تجاری با مؤسسه شما می‌باشد. امید است در هفته‌های پیش رو جلسه‌ای جهت بررسی شرایط همکاری ترتیب داده شود.
GPT-4	آقای محترم، مفتخریم به اطلاع برسانیم که شرکت ما علاقه‌مند به ایجاد مشارکت تجاری با مؤسسه محترم شماست. امیدواریم بتوانیم در هفته‌های آتی جلسه‌ای جهت بحث و بررسی شرایط همکاری متقابل برگزار کنیم.
Claude	آقای گرامی، با کمال مسرت اعلام می‌داریم که شرکت ما خواستار برقراری شراکت تجاری با مؤسسه ارجمند شماست. امیدواریم در هفته‌های پیش رو بتوانیم نشستی به منظور گفت‌وگو درباره شروط همکاری مشترک تدارک ببینیم.
NLLB-200	آقای محترم، خوشحالیم که به شما اطلاع دهیم شرکت ما می‌خواهد شراکت تجاری با سازمان شما ایجاد کند. امیدواریم در هفته‌های آینده جلسه‌ای برای بحث در مورد شرایط همکاری داشته باشیم.

Assessment: GPT-4 delivered the most natural Persian business register, using formal expressions like “مفتخریم” and “همکاری متقابل” that mirror standard Iranian commercial correspondence. Claude produced equally polished output with strong diplomatic vocabulary. NLLB-200 was accurate but somewhat flat in tone, missing the elevated formality expected in Persian business writing.

Scenario 2: Casual Conversation

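Source text (colloquial Gulf Arabic): “والله يا صاحبي، الجو اليوم رهيب! تعال نروح على المقهى ونشرب شاي ونتسولف شوي. خلاص ملينا من الشغل.”

English gloss: “I swear, my friend, the weather today is amazing! Come on, let's go to the café, have some tea, and chat for a bit. Enough, we're sick of work.”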

System	Persian Translation
Google Translate	به خدا رفیق، هوا امروز عالیه! بیا بریم کافه چایی بخوریم و یکم گپ بزنیم. دیگه از کار خسته شدیم.
DeepL	به خدا دوست من، هوای امروز فوق‌العاده است! بیا برویم قهوه‌خانه و چای بنوشیم و کمی صحبت کنیم. دیگر از کار خسته شده‌ایم.
GPT-4	والله رفیق، هوا امروز محشره! بیا بریم کافه یه چایی بزنیم و یه کم حرف بزنیم. دیگه حوصلمون از کار سر رفته.
Claude	به خدا رفیق، هوا امروز خیلی توپه! بیا بریم کافه چایی بخوریم و یکم درد و دل کنیم. دیگه از کار کلافه شدیم.
NLLB-200	به خدا دوست من، آب و هوا امروز عالی است. بیایید به کافه برویم و چای بنوشیم و صحبت کنیم. دیگر از کار خسته شده‌ایم.

Assessment: GPT-4 captured the colloquial energy with natural Persian slang like “محشره” and “حوصلمون سر رفته.” Claude similarly used casual registers effectively. DeepL defaulted to more formal speech, and NLLB-200 produced grammatically correct but stilted output that lost the conversational warmth.

Scenario 3: Technical Content

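Source text (Arabic): “تعتمد الترجمة الآلية العصبية على نماذج المحولات التي تستخدم آلية الانتباه الذاتي لمعالجة التسلسلات اللغوية. يتطلب تدريب هذه النماذج مجموعات بيانات متوازية كبيرة تحتوي على ملايين أزواج الجمل المترجمة.”

English gloss: “Neural machine translation relies on transformer models that use a self-attention mechanism to process linguistic sequences. Training these models requires large parallel datasets containing millions of translated sentence pairs.”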

System	Persian Translation
Google Translate	ترجمه ماشینی عصبی بر مدل‌های ترانسفورمر متکی است که از مکانیزم توجه خودی برای پردازش توالی‌های زبانی استفاده می‌کنند. آموزش این مدل‌ها به مجموعه‌داده‌های موازی بزرگ حاوی میلیون‌ها جفت جمله ترجمه‌شده نیاز دارد.
DeepL	ترجمه ماشینی عصبی مبتنی بر مدل‌های ترانسفورمر است که از سازوکار خودتوجهی برای پردازش دنباله‌های زبانی بهره می‌گیرند. آموزش این مدل‌ها مستلزم مجموعه‌داده‌های موازی بزرگ شامل میلیون‌ها زوج جمله ترجمه‌شده است.
GPT-4	ترجمه ماشینی عصبی بر پایه مدل‌های ترنسفورمر بنا شده است که از سازوکار خود-توجهی (Self-Attention) برای پردازش توالی‌های زبانی استفاده می‌کنند. آموزش این مدل‌ها نیازمند مجموعه‌داده‌های موازی حجیم حاوی میلیون‌ها جفت جمله ترجمه‌شده است.
Claude	ترجمه ماشینی عصبی بر مدل‌های ترنسفورمر استوار است که با بهره‌گیری از سازوکار خود-توجهی، دنباله‌های زبانی را پردازش می‌کنند. آموزش این مدل‌ها مستلزم مجموعه‌داده‌های موازی گسترده‌ای است که میلیون‌ها جفت جمله ترجمه‌شده را در بر می‌گیرد.
NLLB-200	ترجمه ماشینی عصبی بر مدل‌های ترانسفورمر تکیه دارد که از مکانیسم توجه خود برای پردازش توالی‌های زبانی استفاده می‌کنند. آموزش این مدل‌ها به مجموعه‌داده‌های موازی بزرگ با میلیون‌ها جفت جمله ترجمه نیاز دارد.

Assessment: GPT-4 provided the most complete technical translation, adding the English term “Self-Attention” in parentheses for clarity. Claude and DeepL both produced polished technical prose. All systems handled the technical content well, consistent with Arabic-Persian being a medium-to-high resource pair with substantial parallel corpora available.

Strengths and Weaknesses

Google Translate

Strengths: Fast and reliable for everyday content. Strong coverage of Modern Standard Arabic and conversational Persian. Handles Arabic dialectal variation reasonably well.

Weaknesses: Occasionally mishandles literary and classical Arabic expressions. Can produce awkward Persian constructions for complex sentences.

DeepL

Strengths: Clean, professional translations. Good with formal registers and business terminology.

Weaknesses: Less effective with colloquial Arabic dialects. Tends toward overly formal Persian output, as in Scenario 2. Less reliable at disambiguating Arabic-Persian cognates, that is, Arabic-origin words whose meaning or usage differs in Persian.

GPT-4

Strengths: Best overall quality for this pair. Strong awareness of cultural context, which matters given the deep historical connections between Arabic and Persian. Handles register shifts naturally and adds English source terms for technical vocabulary, as with “Self-Attention” in Scenario 3.

Weaknesses: Slower processing. Occasionally over-explains cultural references that native speakers would understand without annotation.

Claude

Strengths: Consistent, high-quality output. Strong diplomatic and academic register. Careful handling of religiously sensitive content.

Weaknesses: Sometimes overly conservative with colloquial expressions. Can miss dialectal nuance in Gulf or Levantine Arabic.

NLLB-200

Strengths: Open-source and able to run on your own hardware, so text need not leave your infrastructure. Solid baseline performance, helped by the pair's relatively high resource availability. Well suited to batch processing.

Weaknesses: Consistently more formal than casual content calls for. Lacks the cultural sensitivity of the commercial systems.
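Because NLLB-200's weights are openly available, it can run entirely on local hardware, which is what makes it a fit for offline batch jobs. The sketch below assumes the Hugging Face `transformers` library (with `torch`) and the `facebook/nllb-200-distilled-600M` checkpoint; the article does not say which NLLB-200 variant was evaluated. `arb_Arab` and `pes_Arab` are the FLORES-200 codes NLLB uses for Modern Standard Arabic and Western (Iranian) Persian. The helper names `batched` and `translate_ar_to_fa` are hypothetical.

```python
from typing import Iterator

# Assumed checkpoint: the article does not name the NLLB-200 variant it tested.
MODEL_NAME = "facebook/nllb-200-distilled-600M"
SRC_LANG = "arb_Arab"  # Modern Standard Arabic (FLORES-200 code)
TGT_LANG = "pes_Arab"  # Western Persian (FLORES-200 code)


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_ar_to_fa(sentences: list[str], batch_size: int = 16) -> list[str]:
    """Translate Arabic sentences to Persian locally with NLLB-200 (sketch).

    Assumes `transformers` and `torch` are installed. Weights are downloaded
    on first use; afterwards the model runs on this machine (CPU as written).
    """
    # Imported lazily so the helper above works without the ML stack installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    # Forcing the first generated token to the Persian language code selects the target language.
    target_id = tokenizer.convert_tokens_to_ids(TGT_LANG)

    outputs: list[str] = []
    for batch in batched(sentences, batch_size):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs, forced_bos_token_id=target_id, max_new_tokens=256)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Usage: `translate_ar_to_fa(["السيد المحترم، يسعدنا أن نعلمكم بأن شركتنا ترغب في إقامة شراكة تجارية مع مؤسستكم الموقرة."])` returns a list of Persian strings. Larger batch sizes trade memory for throughput; moving the model and inputs to a GPU is left out to keep the sketch short.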

Recommendations

Use Case	Recommended System
Islamic scholarship and religious texts	GPT-4
Business correspondence	Claude or DeepL
News and media translation	Google Translate
Diplomatic and political content	Claude
Bulk document translation	NLLB-200
Social media and casual chat	GPT-4

Key Takeaways

GPT-4 leads for Arabic-to-Persian translation with the strongest grasp of shared cultural and religious vocabulary
The shared Arabic script simplifies transliteration but does not bridge the fundamental grammatical differences between a Semitic and an Indo-European language
All commercial systems handle this medium-to-high resource pair competently, with meaningful quality differences emerging primarily in literary and colloquial content
NLLB-200 provides a solid open-source baseline but falls short on register sensitivity

Next Steps

Try it yourself: Translation AI Playground lets you compare systems side by side.
Reverse direction: Persian to Arabic Translation covers translation going the other way.
See the full leaderboard: Translation Accuracy Leaderboard ranks all systems across 200+ language pairs.
Learn how it works: How AI Translation Works covers the technology behind neural machine translation.