VoiceArena · Sonic-3 · v2 · May 2026
Performance brief · v2 data

Cartesia Sonic-3 on VoiceArena

A 119-Elo spread on a single model. Six languages, twelve thousand battles, and a defect pattern that points at three concrete fixes.

Generated May 2026 · re-derived from raw votes · cross-checked against the live leaderboard (max |Δ| = 0.00 Elo)
Best language: Hindi (#4/8)
Worst language: US English (#7/7)
Elo spread: 119.6 (same model)
Pairwise battles: 12,859 (11,477 defect tags)

§02 Cross-language

How Sonic-3 ranks across six languages

Same model, six markets. The gap between top and bottom is bigger than the gap between most adjacent competitors.

Language | Rank | Elo | 95% CI | Win rate | Battles
🇮🇳 Hindi | 4 / 8 | 1012.1 | ±9 | 52.1% | 2,475
🇧🇷 Brazilian Portuguese | 3 / 7 | 1011.7 | ±11 | 51.9% | 2,201
🇻🇳 Vietnamese | 3 / 7 | 991.1 | ±12 | 48.7% | 1,487
🇸🇦 Arabic MSA | 6 / 7 | 931.6 | ±8 | 39.5% | 2,687
🇯🇵 Japanese | 4 / 6 | 925.2 | ±12 | 38.7% | 2,225
🇺🇸 US English | 7 / 7 | 892.5 | ±10 | 32.9% | 1,784

Hindi and Brazilian Portuguese sit comfortably mid-pack; US English finishes dead last among seven models. The 119-Elo spread between best and worst is a single-model variance, not a model-vs-model one.
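
The headline spread can be re-derived directly from the table. The short Python sketch below copies the Elo and battle columns (the dictionary names are illustrative, not part of VoiceArena) and reproduces the best-to-worst gap and the total battle count:

```python
# Elo and battle counts copied from the cross-language table above.
elo = {
    "Hindi": 1012.1,
    "Brazilian Portuguese": 1011.7,
    "Vietnamese": 991.1,
    "Arabic MSA": 931.6,
    "Japanese": 925.2,
    "US English": 892.5,
}
battles = {
    "Hindi": 2475,
    "Brazilian Portuguese": 2201,
    "Vietnamese": 1487,
    "Arabic MSA": 2687,
    "Japanese": 2225,
    "US English": 1784,
}

best = max(elo, key=elo.get)
worst = min(elo, key=elo.get)
spread = round(elo[best] - elo[worst], 1)  # Elo gap between best and worst language
total_battles = sum(battles.values())

print(f"best={best}, worst={worst}, spread={spread}, battles={total_battles}")
```

The difference comes to 119.6 Elo, which the prose quotes as 119, and the battle counts sum to the 12,859 shown in the summary.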

§03 Per category

Where in each language does Sonic-3 actually rank?

Four content buckets — customer support, media, general conversation, content & education — broken down per language.

Each cell: rank · Elo · win rate.

Language | Customer support | Media | General conversation | Content & education
🇺🇸 US English | 7/7 · Elo 909 · 35.4% | 7/7 · Elo 873 · 30.1% | 7/7 · Elo 903 · 34.5% | 7/7 · Elo 873 · 29.7%
🇮🇳 Hindi | 4/8 · Elo 1001 · 50.1% | 3/8 · Elo 1012 · 52.1% | 4/8 · Elo 1023 · 53.9% | 4/8 · Elo 1021 · 53.6%
🇯🇵 Japanese | 4/6 · Elo 923 · 38.5% | 4/6 · Elo 906 · 36.5% | 4/6 · Elo 940 · 40.4% | 5/6 · Elo 931 · 40.2%
🇧🇷 Brazilian Portuguese | 3/7 · Elo 1012 · 52.1% | 3/7 · Elo 1018 · 52.7% | 4/7 · Elo 981 · 47.0% | 3/7 · Elo 1037 · 55.7%
🇻🇳 Vietnamese | 3/7 · Elo 989 · 48.4% | 4/7 · Elo 982 · 46.6% | 4/7 · Elo 1004 · 51.1% | 4/7 · Elo 994 · 49.0%
🇸🇦 Arabic MSA | 4/7 · Elo 960 · 44.0% | 7/7 · Elo 911 · 36.8% | 6/7 · Elo 938 · 40.3% | 7/7 · Elo 894 · 33.7%

The category gap inside a language is small relative to the gap between languages: Sonic-3 is either good in a language or it isn't. Arabic is the clearest exception, where customer support (#4, Elo 960) clearly beats media and content & education (both #7).
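
One way to check how tight the categories are within each language is to take the range between a language's best and worst category Elo. The sketch below copies the Elo values from the per-category table (in the order customer support, media, general conversation, content & education); the variable names are illustrative:

```python
# Per-category Elo copied from the table above, in column order:
# customer support, media, general conversation, content & education.
category_elo = {
    "US English": [909, 873, 903, 873],
    "Hindi": [1001, 1012, 1023, 1021],
    "Japanese": [923, 906, 940, 931],
    "Brazilian Portuguese": [1012, 1018, 981, 1037],
    "Vietnamese": [989, 982, 1004, 994],
    "Arabic MSA": [960, 911, 938, 894],
}

# Range between the best and worst category inside each language.
elo_range = {lang: max(vals) - min(vals) for lang, vals in category_elo.items()}
widest = max(elo_range, key=elo_range.get)

for lang, gap in sorted(elo_range.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang}: {gap} Elo between best and worst category")
```

On these figures Arabic MSA has the widest range (66 Elo), Brazilian Portuguese follows at 56, and the remaining languages fall between 22 and 36.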

§04 Defect pattern

When Sonic-3 loses, what do raters complain about?

Stacked share of defect tags placed on Sonic-3 by raters who preferred the opposing model.

Overall, across all languages

Defect | Count | Share
Irregular pacing | 2,790 | 24.3%
Mild mispronunciation | 2,754 | 24.0%
Unnatural / robotic voice | 2,041 | 17.8%
Severe mispronunciation | 1,624 | 14.2%
Noise / distortion | 836 | 7.3%
Hallucination / extra content | 775 | 6.8%
Missing word | 657 | 5.7%

Irregular pacing, mispronunciation (mild and severe combined), and an unnatural / robotic voice together account for ~80% of complaints. The mix differs sharply by language: US English is overwhelmingly pacing + robotic timbre, while Arabic is dominated by mispronunciation.
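
The shares and the ~80% figure follow from the counts in the table. A minimal Python check, with the counts copied from above and illustrative variable names:

```python
# Defect tag counts copied from the table above.
defect_counts = {
    "Irregular pacing": 2790,
    "Mild mispronunciation": 2754,
    "Unnatural / robotic voice": 2041,
    "Severe mispronunciation": 1624,
    "Noise / distortion": 836,
    "Hallucination / extra content": 775,
    "Missing word": 657,
}

total_tags = sum(defect_counts.values())
share = {tag: 100 * n / total_tags for tag, n in defect_counts.items()}

# The four tags the summary groups together: pacing, both
# mispronunciation levels, and unnatural / robotic voice.
headline_tags = [
    "Irregular pacing",
    "Mild mispronunciation",
    "Severe mispronunciation",
    "Unnatural / robotic voice",
]
headline_share = sum(share[tag] for tag in headline_tags)

print(f"total tags: {total_tags}")
print(f"headline share: {headline_share:.1f}%")
```

The counts sum to the 11,477 tags quoted in the summary, and the four grouped tags come to about 80.2% of them.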

§05 Head-to-head

Sonic-3 vs every other model, per language

Win rate with Wilson 95% CIs. Green = Sonic-3 wins, red = loses, grey = within tie band.

[Charts: Sonic-3 win rate vs each opponent, one panel per language (US English, Hindi, Japanese, Brazilian Portuguese, Vietnamese, Arabic MSA)]

The same pair of models can flip outcome by language. Sonic-3 beats Gemini 3.1 Flash in Hindi by 30+ points, then loses to it in US English by a similar gap. The competitive bar Sonic-3 has to clear is language-specific.
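
The section names Wilson 95% CIs but does not spell out the formula or the tie band. A minimal sketch of the standard Wilson score interval follows. The colour rule (grey whenever the interval straddles 50%) and the win counts in the usage lines are assumptions for illustration, not VoiceArena's definition or data:

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

def verdict(wins, n):
    """Assumed colour rule: grey when the interval contains 50%."""
    low, high = wilson_interval(wins, n)
    if low > 0.5:
        return "green"
    if high < 0.5:
        return "red"
    return "grey"

# Hypothetical head-to-head counts, not taken from the brief.
for wins, n in [(120, 200), (100, 200), (70, 200)]:
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n}: [{low:.3f}, {high:.3f}] -> {verdict(wins, n)}")
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small per-opponent sample sizes, which is the usual reason to prefer it for pairwise win rates.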

§06 Wins

Where Sonic-3 wins

Five strong examples per content category, with side-by-side audio against the opponent it beat most decisively.

🇯🇵 Japanese · jp_cs_35
Won 5/5 vs Grok TTS
Court postponement PA

皆様にお知らせ申し上げます。東京地方裁判所第2A法廷において予定されていた事件番号CV-9X77Bの審理は、来週木曜日午前10時へ延期となりました。

Audio clips: Sonic-3, Grok TTS

🇧🇷 Brazilian Portuguese · br_cs_15
Won 5/5 vs gpt-4o-mini-tts
E-commerce — delivery ETA stall

Hum, ah, deixa eu verificar isso aqui pra você… beleza, parece que o entregador está virando agora perto do Pão de Açúcar na Rua Oscar Freire, e, hum, deve chegar na sua porta em uns 12 minutinhos.

Audio clips: Sonic-3, gpt-4o-mini-tts

🇯🇵 Japanese · jp_cs_06
Won 5/5 vs Grok TTS
E-commerce — delivery ETA stall

うーん、えーと、ちょっと確認いたしますね…はい、ドライバーが今ちょうど吉祥寺のサンロード商店街の角を曲がったところでして、あの、お荷物は、まあ、約7分ほどでお手元に届く予定です。

Audio clips: Sonic-3, Grok TTS

🇮🇳 Hindi · hi_cs_35
Won 5/5 vs Grok TTS
Court postponement PA

कृपया ध्यान दें, बंबई उच्च न्यायालय के न्यायालय कक्ष CR-2A में निर्धारित मुक़दमा संख्या CV-9X77B की सुनवाई न्यायाधीश के निर्देशानुसार स्थगित कर दी गई है और अब 14 अगस्त को होगी। संशोधित सुनवाई का समय सुबह 11:30 बजे होगा, और सभी कार्यवाही updated court calendar के अनुसार चलेंगी। संबंधित सभी पक्षों, अधिवक्ताओं और वादकारों से अनुरोध है कि वे इस परिवर्तन का संज्ञान लें और आवश्यक व्यवस्थाएँ कर लें। अधिक स्पष्टीकरण हेतु कृपया अदालत के क्लर्क कार्यालय से संपर्क करें। हुई असुविधा के लिए हमें खेद है और आपके सहयोग के लिए धन्यवाद।

Audio clips: Sonic-3, Grok TTS

🇸🇦 Arabic MSA · ar_cs_19
Won 5/5 vs gpt-4o-mini-tts
Utilities — record not found

أم، أعتذر منكم، لكن، أه، ليس لدينا أي سجل لدفع فاتورة الكهرباء لشهر يناير على رقم الحساب CG-554-3320.

Audio clips: Sonic-3, gpt-4o-mini-tts

The wins skew toward short, conversational, single-script sentences — the natural territory of Sonic-3's prosody.

§07 Losses

Where Sonic-3 lost

Same shape as the wins, with the defect tags raters flagged on Sonic-3 and their verbatim comments.

🇯🇵 Japanese · jp_cs_16
Lost 4/5 vs Grok TTS
Booking confirmation

はい、えーと、お客様のサービス予約番号HVAC-TKY-4351、確定いたしました。

Audio clips: Sonic-3, Grok TTS
Defect tags: Noise / distortion ×5 · Irregular pacing ×5 · Mild mispronunciation ×3 · Unnatural / robotic voice ×2 · Severe mispronunciation ×1 · Hallucination / extra content ×1
Rater comments
  • it read out loud another sentence again and again.
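  • Echo Pause should be made after a hiphen.ハイフンごとに話すべきです。 [EN: It should speak hyphen by hyphen.]
  • ノイズがある 機械的な音声と抑揚 英数字を読む時の息を置くタイミングが違う [EN: There is noise. Mechanical voice and intonation. The timing of pauses when reading alphanumerics is off.]
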
🇸🇦 Arabic MSA · ar_cs_29
Lost 4/5 vs Gemini 3.1 Flash
Utility outage PA

سيشهد المشتركون في منطقة الشبكة 7-شرق انقطاعاً في التيار الكهربائي من الساعة 11 صباحاً إلى الساعة 2 ظهراً يوم 25 مارس 2026، وذلك بسبب أعمال صيانة طارئة على المحول الفرعي. ويُنصح السكان بتأجيل استخدام الأجهزة كثيفة الاستهلاك للطاقة وحفظ أي عمل رقمي قبل بدء الانقطاع.

Audio clips: Sonic-3, Gemini 3.1 Flash
Defect tags: Severe mispronunciation ×3 · Mild mispronunciation ×3 · Hallucination / extra content ×2 · Unnatural / robotic voice ×2 · Missing word ×2 · Noise / distortion ×2
Rater comments
  • المشتركون wrong accent صباحا mispronounced مارس wrong accent
  • اخطاء [EN: Errors.]

🇸🇦 Arabic MSA · ar_cs_07
Lost 4/5 vs gpt-4o-mini-tts
Call transfer/handoff

مرحباً، أه، أهلاً بكم في مكتب الدعم لدينا. أنا، أم، أحوّل مكالمتكم إلى أخصائينا أحمد المنصوري المختص بطلبات إعادة التمويل، حتى يتمكن من خدمتكم بشكل أفضل.

Audio clips: Sonic-3, gpt-4o-mini-tts
Defect tags: Mild mispronunciation ×4 · Severe mispronunciation ×3 · Unnatural / robotic voice ×3 · Hallucination / extra content ×1 · Missing word ×1 · Noise / distortion ×1
Rater comments
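  • نطق غير دقيق لكلمة : طلبات : نطق غير دقيق لكل من أه - أم [EN: Inaccurate pronunciation of the word "طلبات"; inaccurate pronunciation of both "أه" and "أم".]
  • لفظ التنوين في كلمة " أهلا " بشكل خاطيء و التعابير غير بشرية [EN: The tanween in the word "أهلا" is pronounced incorrectly, and the expression is not human-like.]
  • اقل جودة [EN: Lower quality.]
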
🇯🇵 Japanese · jp_cs_32
Lost 5/5 vs Gemini 3.1 Flash
Commercial license PA

事業者免許番号BIZ-TKY-2026-W4-882をお持ちの事業主の皆様は、金曜日までに区役所第5-B窓口にて許可証の更新手続きをお済ませください。

Audio clips: Sonic-3, Gemini 3.1 Flash
Defect tags: Noise / distortion ×3 · Mild mispronunciation ×2 · Irregular pacing ×2 · Severe mispronunciation ×1 · Unnatural / robotic voice ×1 · Missing word ×1
Rater comments
  • The letters and numbers are read incorrectly.
  • There is noise background. The pace is not natural when it says the number.
  • There was a wrong pronunciation for the number,BIZ-TKY-2026-W4-882. Others are generally ok.
  • BIZ-TKY-2026-W4-882 is spoken with irregular pacing and messed up. At 0:15, "te" is missed.
  • ハイフンで区切られた箇所は一気に読む [EN: The parts separated by hyphens are read in one go.]

🇻🇳 Vietnamese · vi_cs_19
Lost 4/5 vs Eleven v3
Utilities — record not found

Ừm, em thực sự xin lỗi chị, nhưng, ờ, em không tìm thấy bất kỳ ghi nhận thanh toán hóa đơn điện nào cho số tài khoản CG-554-3320 trong tháng 1 ạ.

Audio clips: Sonic-3, Eleven v3
Defect tags: Noise / distortion ×5 · Irregular pacing ×4 · Mild mispronunciation ×2 · Unnatural / robotic voice ×1
Rater comments
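  • AudioB phía cuối có tạp âm [EN: Audio B has noise at the end.]
  • CG-554-3320: should speak one by one word and number at a time Pacing not good should improve Got a weird voice or sound at the end
  • Tiếng ồn cuối Audio ( Từ giây 13 đến giây 19) Đọc dãy số bị cách quãng [EN: Noise at the end of the audio (from second 13 to second 19). The number sequence is read with gaps.]
  • Tiếng ồn lớn cuối audio, giây 12 đến hết audio. Ngắt nghỉ ở các con số giữa "CG-554-3320" cũng không bình thường, giọng đều đều, ngữ điệu phẳng, lối đọc cơ học. [EN: Loud noise at the end of the audio, from second 12 to the end. The pauses between the digits in "CG-554-3320" are also abnormal; monotone voice, flat intonation, mechanical reading.]
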
§08 What to do

Recommendations

Five concrete moves the data points at, in priority order. (preview — not final)

  1. Fix pacing first

    Irregular pacing is the single largest defect tag (24% overall, 45% in US English). Tightening micro-pause behavior, especially after hyphens, alphanumerics, and clause boundaries, would lift the floor across every language.

  2. Treat US English as a regression target

    US English is dead last among seven models and the biggest contributor to the 119-Elo spread. Allocate dedicated voice-data and prosody work to close the gap with Gemini and ElevenLabs.

  3. Invest in Arabic pronunciation

    Arabic complaints are 57% mispronunciation (mild + severe). This is a lexicon problem, not a prosody one: prioritize Arabic G2P, MSA edge cases, and code-mixed Latin spans inside Arabic prompts.

  4. Audit alphanumeric handling per locale

    Japanese drops ~5 points of win rate when a prompt contains inline alphanumerics; Arabic gains ~9 points. The handling is clearly locale-conditional and worth a directed eval.

  5. Stress-test long inputs

    The trend of win rate against prompt length is mild but consistent. Add long-sentence battles (>200 chars) to the routine eval so regressions there don't hide behind aggregate scores.
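
Recommendations 4 and 5 above both amount to slicing win rate by a property of the prompt. The sketch below shows one way to do that in Python. The detection rule for "inline alphanumerics" (an ASCII run containing both a letter and a digit, such as CG-554-3320), the function names, and the toy battles are all assumptions for illustration; only the 200-character threshold comes from the text:

```python
import re
from collections import defaultdict

LONG_PROMPT_CHARS = 200  # threshold named in recommendation 5

# ASCII letter/digit runs, optionally joined by hyphens. Explicit ASCII classes
# are used so the rule also works inside Japanese, Arabic or Vietnamese text.
ASCII_RUN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")

def has_inline_alphanumeric(prompt):
    """Assumed rule: some ASCII run mixes at least one letter and one digit."""
    return any(
        re.search(r"[A-Za-z]", run) and re.search(r"[0-9]", run)
        for run in ASCII_RUN.findall(prompt)
    )

def win_rate_by_bucket(battles):
    """battles: iterable of (prompt, sonic3_won). Returns win rate per bucket."""
    tally = defaultdict(lambda: [0, 0])  # bucket -> [wins, battles]
    for prompt, won in battles:
        buckets = (
            ("alnum", has_inline_alphanumeric(prompt)),
            ("long", len(prompt) > LONG_PROMPT_CHARS),
        )
        for bucket in buckets:
            tally[bucket][0] += int(won)
            tally[bucket][1] += 1
    return {bucket: wins / n for bucket, (wins, n) in tally.items()}

# Toy battles, not VoiceArena data.
toy_battles = [
    ("Order CG-554-3320 was not found.", False),
    ("Order CG-554-3320 was not found.", False),
    ("Order CG-554-3320 was not found.", True),
    ("Thanks for calling, how can I help?", True),
    ("Thanks for calling, how can I help?", False),
    ("Please hold. " * 20, True),  # 260 characters, no alphanumeric code
]
rates = win_rate_by_bucket(toy_battles)
for bucket, rate in sorted(rates.items()):
    print(bucket, f"{rate:.2f}")
```

Running the same split per language would show whether the alphanumeric effect really changes sign between locales, and tracking the `("long", True)` bucket over time keeps long-input regressions visible.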

§09 Methodology

How this brief was built

Re-derived from raw pairwise votes at report-build time, then cross-checked against the live VoiceArena leaderboard.

All figures come from VoiceArena v2 data captured in May 2026. Elo scores were recomputed from raw pairwise battles using the same algorithm the live dashboard uses, with a Bayesian bootstrap for the 95% confidence intervals. Recomputed Sonic-3 Elos matched the leaderboard exactly (max |Δ| = 0.00 across all six languages).
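
The brief does not say which rating algorithm the live dashboard uses, so the following is only a sketch of the general approach under stated assumptions. It swaps in a weighted Bradley-Terry fit reported on an Elo-like scale (mean 1000, 400 points per factor of 10 in odds) and wraps it in a Bayesian bootstrap, where each replicate reweights the battles with Dirichlet(1, …, 1) weights instead of resampling them. The function names, model names, and toy battles are hypothetical:

```python
import math
import random
from collections import defaultdict

def fit_elo(battles, weights=None, iters=100, base=1000.0):
    """Weighted Bradley-Terry fit (MM updates), reported on an Elo-like scale."""
    if weights is None:
        weights = [1.0] * len(battles)
    wins = defaultdict(float)  # wins[(a, b)] = weighted count of a beating b
    for (winner, loser), wt in zip(battles, weights):
        wins[(winner, loser)] += wt
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for m in models:
            won = sum(wt for (a, _), wt in wins.items() if a == m)
            denom = sum(wt / (strength[a] + strength[b])
                        for (a, b), wt in wins.items() if m in (a, b))
            updated[m] = (won + 1e-6) / denom  # tiny offset keeps winless models finite
        # Fix the scale: geometric mean of strengths is 1, so the mean rating is `base`.
        log_mean = sum(math.log(s) for s in updated.values()) / len(models)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return {m: base + 400 * math.log10(s) for m, s in strength.items()}

def bayesian_bootstrap_ci(battles, model, draws=200, seed=0):
    """95% interval for one model's rating from Dirichlet-reweighted refits."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # Normalised Exp(1) draws are Dirichlet(1, ..., 1) weights over battles.
        raw = [rng.expovariate(1.0) for _ in battles]
        scale = len(battles) / sum(raw)
        samples.append(fit_elo(battles, [r * scale for r in raw])[model])
    samples.sort()
    return samples[int(0.025 * (draws - 1))], samples[int(0.975 * (draws - 1))]

# Toy (winner, loser) battles between hypothetical models.
toy_battles = (
    [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40
    + [("model_b", "model_c")] * 55 + [("model_c", "model_b")] * 45
    + [("model_a", "model_c")] * 65 + [("model_c", "model_a")] * 35
)
point = fit_elo(toy_battles)
ci_low, ci_high = bayesian_bootstrap_ci(toy_battles, "model_a")
print({m: round(r, 1) for m, r in point.items()})
print(f"model_a 95% interval: {ci_low:.1f} to {ci_high:.1f}")
```

Reweighting rather than resampling means every battle contributes to every replicate, which avoids replicates where a thinly sampled pairing drops out entirely.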

Defect tags are only counted when the rater preferred the opposing model and chose to fill out the optional tag form. Counts reflect how often each tag was placed specifically on Sonic-3.
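
The counting rule above can be sketched as a simple filter. The record layout (`winner`, `tags_on_sonic3`) and the sample votes are hypothetical, since the brief does not describe the raw vote schema:

```python
from collections import Counter

def count_defect_tags(votes, model="sonic-3"):
    """Count tags placed on `model`, only for votes it lost and where tags were filled in."""
    counts = Counter()
    for vote in votes:
        lost = vote["winner"] != model
        tags = vote["tags_on_sonic3"]
        if lost and tags:  # skip wins and votes with an empty tag form
            counts.update(tags)
    return counts

# Hypothetical vote records, not VoiceArena data.
votes = [
    {"winner": "opponent", "tags_on_sonic3": ["Irregular pacing", "Noise / distortion"]},
    {"winner": "sonic-3", "tags_on_sonic3": ["Mild mispronunciation"]},  # Sonic-3 won: ignored
    {"winner": "opponent", "tags_on_sonic3": []},  # optional form left empty: ignored
    {"winner": "opponent", "tags_on_sonic3": ["Irregular pacing"]},
]
tag_counts = count_defect_tags(votes)
print(tag_counts.most_common())
```

Because the tag form is optional, these counts describe the raters who chose to explain a loss, not every lost battle.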

Audio examples are the original generated waveforms served by VoiceArena's storage bucket; no file was re-encoded or normalized for this report.

© 2026 · Cartesia Sonic-3 performance brief · VoiceArena v2 data