Performance brief · v2 data

Cartesia Sonic-3 on VoiceArena

A 120-Elo spread on a single model. Six languages, nearly thirteen thousand battles, and a defect pattern that points at five concrete fixes.

Generated May 2026 · re-derived from raw votes · cross-checked against the live leaderboard (max |Δ| = 0.00 Elo)

Best language · #4/8 · Hindi
Worst language · #7/7 · US English
Elo spread · 120 · same model
Pairwise battles · 12,859 · 11,477 defect tags

§02 Cross-language

How Sonic-3 ranks across six languages

Same model, six markets. The gap between top and bottom is bigger than the gap between most adjacent competitors.

Language	Rank	Elo	95% CI	Win rate	Battles
🇮🇳 Hindi	4 / 8	1012.1	±9	52.1%	2,475
🇧🇷 Brazilian Portuguese	3 / 7	1011.7	±11	51.9%	2,201
🇻🇳 Vietnamese	3 / 7	991.1	±12	48.7%	1,487
🇸🇦 Arabic MSA	6 / 7	931.6	±8	39.5%	2,687
🇯🇵 Japanese	4 / 6	925.2	±12	38.7%	2,225
🇺🇸 US English	7 / 7	892.5	±10	32.9%	1,784

Hindi and Brazilian Portuguese sit comfortably mid-pack; US English finishes last of seven models. The 120-Elo spread between best and worst (1012.1 vs 892.5) is variance within a single model, not a gap between models.
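To make the spread concrete, the sketch below recomputes it from the Elo column of the table and converts it into an expected head-to-head score. The conversion assumes the standard logistic Elo curve on a 400-point scale; the brief does not say which curve VoiceArena uses, so treat `expected_score` as an illustration rather than the leaderboard's formula.

```python
# Overall Elo per language, copied from the cross-language table.
ELO = {
    "Hindi": 1012.1,
    "Brazilian Portuguese": 1011.7,
    "Vietnamese": 991.1,
    "Arabic MSA": 931.6,
    "Japanese": 925.2,
    "US English": 892.5,
}


def elo_spread(elo):
    """Return (best language, worst language, Elo gap rounded to 0.1)."""
    best = max(elo, key=elo.get)
    worst = min(elo, key=elo.get)
    return best, worst, round(elo[best] - elo[worst], 1)


def expected_score(diff, scale=400.0):
    """Expected score of the higher-rated side.

    Assumption: standard logistic Elo curve; not confirmed by the brief.
    """
    return 1.0 / (1.0 + 10 ** (-diff / scale))


best, worst, spread = elo_spread(ELO)
print(f"{best} vs {worst}: spread {spread} Elo")
print(f"implied expected score: {expected_score(spread):.3f}")
```

Under that assumed curve, a gap of this size corresponds to the stronger side being preferred in roughly two out of three battles.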

§03 Per category

Where in each language does Sonic-3 actually rank?

Four content buckets — customer support, media, general conversation, content & education — broken down per language.

Language	Customer support	Media	Gen-conv	Content & Edu
🇺🇸 US English	7/7 Elo 909 · 35.4%	7/7 Elo 873 · 30.1%	7/7 Elo 903 · 34.5%	7/7 Elo 873 · 29.7%
🇮🇳 Hindi	4/8 Elo 1001 · 50.1%	3/8 Elo 1012 · 52.1%	4/8 Elo 1023 · 53.9%	4/8 Elo 1021 · 53.6%
🇯🇵 Japanese	4/6 Elo 923 · 38.5%	4/6 Elo 906 · 36.5%	4/6 Elo 940 · 40.4%	5/6 Elo 931 · 40.2%
🇧🇷 Brazilian Portuguese	3/7 Elo 1012 · 52.1%	3/7 Elo 1018 · 52.7%	4/7 Elo 981 · 47.0%	3/7 Elo 1037 · 55.7%
🇻🇳 Vietnamese	3/7 Elo 989 · 48.4%	4/7 Elo 982 · 46.6%	4/7 Elo 1004 · 51.1%	4/7 Elo 994 · 49.0%
🇸🇦 Arabic MSA	4/7 Elo 960 · 44.0%	7/7 Elo 911 · 36.8%	6/7 Elo 938 · 40.3%	7/7 Elo 894 · 33.7%

Within a language, Sonic-3's rank moves by at most one place across categories: it is either competitive in a language or it is not. Arabic is the one exception, where customer-support performance (#4) clearly beats content & education (#7).
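The per-language consistency can be checked directly from the table. The sketch below copies the rank and Elo cells and reports, for each language, the gap between its best and worst category; the variable and function names are illustrative only.

```python
# (rank, Elo) per category, copied from the per-category table.
# Order: customer support, media, general conversation, content & education.
CATEGORY_RESULTS = {
    "US English": [(7, 909), (7, 873), (7, 903), (7, 873)],
    "Hindi": [(4, 1001), (3, 1012), (4, 1023), (4, 1021)],
    "Japanese": [(4, 923), (4, 906), (4, 940), (5, 931)],
    "Brazilian Portuguese": [(3, 1012), (3, 1018), (4, 981), (3, 1037)],
    "Vietnamese": [(3, 989), (4, 982), (4, 1004), (4, 994)],
    "Arabic MSA": [(4, 960), (7, 911), (6, 938), (7, 894)],
}


def category_spread(results):
    """Map language -> (rank gap, Elo gap) across its four categories."""
    spread = {}
    for language, cells in results.items():
        ranks = [rank for rank, _ in cells]
        elos = [elo for _, elo in cells]
        spread[language] = (max(ranks) - min(ranks), max(elos) - min(elos))
    return spread


SPREAD = category_spread(CATEGORY_RESULTS)
for language, (rank_gap, elo_gap) in SPREAD.items():
    print(f"{language}: rank gap {rank_gap}, Elo gap {elo_gap}")
```

Arabic MSA is the only language whose rank moves by more than one place. Brazilian Portuguese has a sizeable Elo gap between general conversation and content & education, but its rank still shifts by only one position.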

§04 Defect pattern

When Sonic-3 loses, what do raters complain about?

Stacked share of defect tags placed on Sonic-3 by raters who preferred the opposing model.

Overall, across all languages

Defect	Count	Share
Irregular pacing	2,790	24.3%
Mild mispronunciation	2,754	24.0%
Unnatural / robotic voice	2,041	17.8%
Severe mispronunciation	1,624	14.2%
Noise / distortion	836	7.3%
Hallucination / extra content	775	6.8%
Missing word	657	5.7%

Irregular pacing, mispronunciation (mild and severe combined), and an unnatural / robotic voice together account for ~80% of defect tags. The mix differs sharply by language: US English is overwhelmingly pacing + robotic timbre, while Arabic is dominated by mispronunciation.
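The shares in the table, and the combined figure for the top defect groups, follow from the raw counts. A minimal sketch using only the numbers in the table above:

```python
# Defect tag counts placed on Sonic-3, copied from the overall table.
DEFECT_COUNTS = {
    "Irregular pacing": 2790,
    "Mild mispronunciation": 2754,
    "Unnatural / robotic voice": 2041,
    "Severe mispronunciation": 1624,
    "Noise / distortion": 836,
    "Hallucination / extra content": 775,
    "Missing word": 657,
}

TOTAL_TAGS = sum(DEFECT_COUNTS.values())

# Percentage share of each tag.
SHARES = {tag: 100.0 * n / TOTAL_TAGS for tag, n in DEFECT_COUNTS.items()}

# Pacing + both mispronunciation severities + robotic voice.
TOP_GROUP = (
    "Irregular pacing",
    "Mild mispronunciation",
    "Severe mispronunciation",
    "Unnatural / robotic voice",
)
TOP_SHARE = sum(SHARES[tag] for tag in TOP_GROUP)

print(f"total tags: {TOTAL_TAGS}")
for tag, share in SHARES.items():
    print(f"{tag}: {share:.1f}%")
print(f"top group share: {TOP_SHARE:.1f}%")
```

The total matches the 11,477 defect tags reported in the header figures.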

§05 Head-to-head

Sonic-3 vs every other model, per language

Win rate with Wilson 95% CIs. Green = Sonic-3 wins, red = loses, grey = within tie band.

[Charts not reproduced here: one head-to-head panel per language (🇺🇸 US English, 🇮🇳 Hindi, 🇯🇵 Japanese, 🇧🇷 Brazilian Portuguese, 🇻🇳 Vietnamese, 🇸🇦 Arabic MSA), each showing Sonic-3's win rate vs each opponent.]

The same pair of models can flip outcome by language. Sonic-3 beats Gemini 3.1 Flash in Hindi by 30+ points, then loses to it in US English by a similar gap. The competitive bar Sonic-3 has to clear is language-specific.
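For readers who want to reproduce the interval behind each head-to-head bar, here is a minimal sketch of the Wilson score interval. Two things are assumptions: the per-opponent counts are not listed in this brief, so the example uses the overall US English row (32.9% of 1,784 battles, rounded to 587 wins and ignoring ties), and the "tie band" is modelled as any interval that contains 50%, which may differ from the rule the charts use.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def classify(wins, n):
    """Assumed colour rule: a CI that straddles 0.5 counts as a tie."""
    lo, hi = wilson_interval(wins, n)
    if lo > 0.5:
        return "wins"
    if hi < 0.5:
        return "loses"
    return "tie band"


# Illustration only: overall US English row, not a single-opponent matchup.
lo, hi = wilson_interval(587, 1784)
print(f"95% CI: [{lo:.3f}, {hi:.3f}] -> {classify(587, 1784)}")
```

The Wilson interval stays inside [0, 1] and behaves sensibly for small samples, which matters for head-to-head pairs that have far fewer battles than a whole language.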

§06 Wins

Where Sonic-3 wins

Five strong examples per content category, with side-by-side audio against the opponent it beat most decisively.

🇯🇵 Japanese · jp_cs_35

Won 5/5 vs Grok TTS

Court postponement PA

“皆様にお知らせ申し上げます。東京地方裁判所第2A法廷において予定されていた事件番号CV-9X77Bの審理は、来週木曜日午前10時へ延期となりました。”

Sonic-3

Grok TTS

🇧🇷 Brazilian Portuguese · br_cs_15

Won 5/5 vs gpt-4o-mini-tts

E-commerce — delivery ETA stall

“Hum, ah, deixa eu verificar isso aqui pra você… beleza, parece que o entregador está virando agora perto do Pão de Açúcar na Rua Oscar Freire, e, hum, deve chegar na sua porta em uns 12 minutinhos.”

Sonic-3

gpt-4o-mini-tts

🇯🇵 Japanese · jp_cs_06

Won 5/5 vs Grok TTS

E-commerce — delivery ETA stall

“うーん、えーと、ちょっと確認いたしますね…はい、ドライバーが今ちょうど吉祥寺のサンロード商店街の角を曲がったところでして、あの、お荷物は、まあ、約7分ほどでお手元に届く予定です。”

Sonic-3

Grok TTS

🇮🇳 Hindi · hi_cs_35

Won 5/5 vs Grok TTS

Court postponement PA

“कृपया ध्यान दें, बंबई उच्च न्यायालय के न्यायालय कक्ष CR-2A में निर्धारित मुक़दमा संख्या CV-9X77B की सुनवाई न्यायाधीश के निर्देशानुसार स्थगित कर दी गई है और अब 14 अगस्त को होगी। संशोधित सुनवाई का समय सुबह 11:30 बजे होगा, और सभी कार्यवाही updated court calendar के अनुसार चलेंगी। संबंधित सभी पक्षों, अधिवक्ताओं और वादकारों से अनुरोध है कि वे इस परिवर्तन का संज्ञान लें और आवश्यक व्यवस्थाएँ कर लें। अधिक स्पष्टीकरण हेतु कृपया अदालत के क्लर्क कार्यालय से संपर्क करें। हुई असुविधा के लिए हमें खेद है और आपके सहयोग के लिए धन्यवाद।”

Sonic-3

Grok TTS

🇸🇦 Arabic MSA · ar_cs_19

Won 5/5 vs gpt-4o-mini-tts

Utilities — record not found

“أم، أعتذر منكم، لكن، أه، ليس لدينا أي سجل لدفع فاتورة الكهرباء لشهر يناير على رقم الحساب CG-554-3320.”

Sonic-3

gpt-4o-mini-tts

The wins skew toward short, conversational, single-script sentences — the natural territory of Sonic-3's prosody.

§07 Losses

Where Sonic-3 lost

Same shape as the wins, with the defect tags raters flagged on Sonic-3 and their verbatim comments.

🇯🇵 Japanese · jp_cs_16

Lost 4/5 vs Grok TTS

Booking confirmation

“はい、えーと、お客様のサービス予約番号HVAC-TKY-4351、確定いたしました。”

Sonic-3

Grok TTS

Defect tags

Noise / distortion ×5 · Irregular pacing ×5 · Mild mispronunciation ×3 · Unnatural / robotic voice ×2 · Severe mispronunciation ×1 · Hallucination / extra content ×1

Rater comments

it read out loud another sentence again and again.
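Echo Pause should be made after a hiphen.ハイフンごとに話すべきです。 [EN: It should speak one hyphen-separated part at a time.]
ノイズがある機械的な音声と抑揚英数字を読む時の息を置くタイミングが違う [EN: Noisy, mechanical voice and intonation; the timing of the breaths when reading alphanumerics is off.]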

🇸🇦 Arabic MSA · ar_cs_29

Lost 4/5 vs Gemini 3.1 Flash

Utility outage PA

“سيشهد المشتركون في منطقة الشبكة 7-شرق انقطاعاً في التيار الكهربائي من الساعة 11 صباحاً إلى الساعة 2 ظهراً يوم 25 مارس 2026، وذلك بسبب أعمال صيانة طارئة على المحول الفرعي. ويُنصح السكان بتأجيل استخدام الأجهزة كثيفة الاستهلاك للطاقة وحفظ أي عمل رقمي قبل بدء الانقطاع.”

Sonic-3

Gemini 3.1 Flash

Defect tags

Severe mispronunciation ×3 · Mild mispronunciation ×3 · Hallucination / extra content ×2 · Unnatural / robotic voice ×2 · Missing word ×2 · Noise / distortion ×2

Rater comments

المشتركون wrong accent صباحا mispronounced مارس wrong accent
اخطاء [EN: Errors.]

🇸🇦 Arabic MSA · ar_cs_07

Lost 4/5 vs gpt-4o-mini-tts

Call transfer/handoff

“مرحباً، أه، أهلاً بكم في مكتب الدعم لدينا. أنا، أم، أحوّل مكالمتكم إلى أخصائينا أحمد المنصوري المختص بطلبات إعادة التمويل، حتى يتمكن من خدمتكم بشكل أفضل.”

Sonic-3

gpt-4o-mini-tts

Defect tags

Mild mispronunciation ×4 · Severe mispronunciation ×3 · Unnatural / robotic voice ×3 · Hallucination / extra content ×1 · Missing word ×1 · Noise / distortion ×1

Rater comments

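نطق غير دقيق لكلمة : طلبات : نطق غير دقيق لكل من أه - أم [EN: Inaccurate pronunciation of the word "طلبات"; inaccurate pronunciation of both "أه" and "أم".]
لفظ التنوين في كلمة " أهلا " بشكل خاطيء و التعابير غير بشرية [EN: The tanween in the word "أهلا" is pronounced incorrectly, and the delivery does not sound human.]
اقل جودة [EN: Lower quality.]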

🇯🇵 Japanese · jp_cs_32

Lost 5/5 vs Gemini 3.1 Flash

Commercial license PA

“事業者免許番号BIZ-TKY-2026-W4-882をお持ちの事業主の皆様は、金曜日までに区役所第5-B窓口にて許可証の更新手続きをお済ませください。”

Sonic-3

Gemini 3.1 Flash

Defect tags

Noise / distortion ×3 · Mild mispronunciation ×2 · Irregular pacing ×2 · Severe mispronunciation ×1 · Unnatural / robotic voice ×1 · Missing word ×1

Rater comments

The letters and numbers are read incorrectly.
There is noise background. The pace is not natural when it says the number.
There was a wrong pronunciation for the number,BIZ-TKY-2026-W4-882. Others are generally ok.
BIZ-TKY-2026-W4-882 is spoken with irregular pacing and messed up. At 0:15, "te" is missed.
ハイフンで区切られた箇所は一気に読む [EN: The hyphen-separated parts are read in one go.]

🇻🇳 Vietnamese · vi_cs_19

Lost 4/5 vs Eleven v3

Utilities — record not found

“Ừm, em thực sự xin lỗi chị, nhưng, ờ, em không tìm thấy bất kỳ ghi nhận thanh toán hóa đơn điện nào cho số tài khoản CG-554-3320 trong tháng 1 ạ.”

Sonic-3

Eleven v3

Defect tags

Noise / distortion ×5 · Irregular pacing ×4 · Mild mispronunciation ×2 · Unnatural / robotic voice ×1

Rater comments

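AudioB phía cuối có tạp âm [EN: Audio B has noise toward the end.]
CG-554-3320: should speak one by one word and number at a time Pacing not good should improve Got a weird voice or sound at the end
Tiếng ồn cuối Audio ( Từ giây 13 đến giây 19) Đọc dãy số bị cách quãng [EN: Noise at the end of the audio (from second 13 to second 19). The number sequence is read with gaps.]
Tiếng ồn lớn cuối audio, giây 12 đến hết audio. Ngắt nghỉ ở các con số giữa "CG-554-3320" cũng không bình thường, giọng đều đều, ngữ điệu phẳng, lối đọc cơ học. [EN: Loud noise at the end of the audio, from second 12 to the end. The pauses between the digits in "CG-554-3320" are also abnormal; monotone voice, flat intonation, mechanical reading style.]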

§08 What to do

Recommendations

Five concrete moves the data points at, in priority order. (preview — not final)

01 · Fix pacing first
Irregular pacing is the single largest defect tag (24% overall, 45% in US English). Tightening micro-pause behavior, especially after hyphens, alphanumerics, and clause boundaries, would lift the floor across every language.
02 · Treat US English as a regression target
US English is last of seven models and the biggest contributor to the 120-Elo spread. Allocate dedicated voice-data and prosody work to close the gap with Gemini and ElevenLabs.
03 · Invest in Arabic pronunciation
Arabic complaints are 57% mispronunciation (mild + severe). This is a lexicon problem, not a prosody one: prioritize Arabic G2P, MSA edge cases, and code-mixed Latin spans inside Arabic prompts.
04 · Audit alphanumeric handling per locale
Japanese drops ~5 points of win rate when a prompt contains inline alphanumerics; Arabic gains ~9 points. The handling is clearly locale-conditional and worth a directed eval.
05 · Stress-test long inputs
The trend of win rate against prompt length is mild but consistent. Add long-sentence battles (>200 chars) to the routine eval so regressions there don't hide behind aggregate scores.
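Recommendations 04 and 05 both come down to slicing battles by a property of the prompt. The sketch below shows one way to do that. The regular expression (hyphen-joined runs of ASCII letters and digits, such as CG-554-3320) and the `(prompt, won)` record shape are assumptions made for illustration; the brief does not define how "inline alphanumerics" were detected. Only the 200-character threshold comes from the text.

```python
import re
from collections import defaultdict

# Assumed definition of an inline code: ASCII letter/digit runs joined by hyphens.
INLINE_CODE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)+")
LONG_PROMPT_CHARS = 200  # threshold named in recommendation 05


def has_inline_code(prompt):
    return INLINE_CODE.search(prompt) is not None


def win_rate_by_bucket(battles):
    """battles: iterable of (prompt, won) pairs, a hypothetical record shape.

    Returns {(has_code, is_long): win rate} for every bucket that occurs.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [wins, battles]
    for prompt, won in battles:
        key = (has_inline_code(prompt), len(prompt) > LONG_PROMPT_CHARS)
        totals[key][0] += int(won)
        totals[key][1] += 1
    return {key: wins / n for key, (wins, n) in totals.items()}


# Two prompts quoted in this brief, used only to exercise the detector.
WITH_CODE = "はい、えーと、お客様のサービス予約番号HVAC-TKY-4351、確定いたしました。"
WITHOUT_CODE = "うーん、えーと、ちょっと確認いたしますね…"

print(has_inline_code(WITH_CODE), has_inline_code(WITHOUT_CODE))
```

Running `win_rate_by_bucket` separately per language would show whether the alphanumeric effect really changes sign between Japanese and Arabic, and whether the long-prompt bucket drifts over time.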

§09 Methodology

How this brief was built

Re-derived from raw pairwise votes at report-build time, then cross-checked against the live VoiceArena leaderboard.

All figures come from VoiceArena v2 data captured in May 2026. Elo scores were recomputed from raw pairwise battles using the same algorithm the live dashboard uses, with a Bayesian bootstrap for the 95% confidence intervals. Recomputed Sonic-3 Elos matched the leaderboard exactly (max |Δ| = 0.00 across all six languages).
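As an illustration of the interval method named above, here is a minimal Bayesian bootstrap sketch. It is not the dashboard's code: the Elo fitting algorithm is not described in this brief, so the statistic resampled here is a plain win rate, and the function names, the number of draws, and the example data (587 wins in 1,784 battles, ties ignored, approximating the US English row) are all assumptions.

```python
import random


def bayesian_bootstrap_ci(values, statistic, draws=500, level=0.95, seed=0):
    """Percentile interval for `statistic` under Dirichlet(1, ..., 1) weights."""
    rng = random.Random(seed)
    stats = []
    for _ in range(draws):
        # Normalised i.i.d. Exponential(1) draws are Dirichlet(1, ..., 1) weights.
        raw = [rng.expovariate(1.0) for _ in values]
        total = sum(raw)
        weights = [r / total for r in raw]
        stats.append(statistic(values, weights))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lo = stats[int(alpha * (draws - 1))]
    hi = stats[int(round((1.0 - alpha) * (draws - 1)))]
    return lo, hi


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))


# Illustrative outcomes: 1.0 = Sonic-3 preferred, 0.0 = opponent preferred.
outcomes = [1.0] * 587 + [0.0] * (1784 - 587)
ci_lo, ci_hi = bayesian_bootstrap_ci(outcomes, weighted_mean)
print(f"win rate 95% interval: [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Unlike the classical bootstrap, which resamples battles with replacement, the Bayesian bootstrap keeps every battle and reweights it. To get Elo intervals, the same loop would pass the weights into a weighted Elo fit instead of `weighted_mean`.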

Defect tags are only counted when the rater preferred the opposing model and chose to fill out the optional tag form. Counts reflect how often each tag was placed specifically on Sonic-3.
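The counting rule can be stated as a small filter. The sketch below assumes a hypothetical vote record with `model_a`, `model_b`, `preferred` (`None` for a tie) and an optional `tags` mapping from model name to the tags placed on it; VoiceArena's real schema is not given in this brief, and the three example votes are made up to exercise the rule.

```python
from collections import Counter


def count_defect_tags(votes, model="Sonic-3"):
    """Count tags placed on `model`, only in battles where the rater preferred the opponent."""
    counts = Counter()
    for vote in votes:
        if model not in (vote["model_a"], vote["model_b"]):
            continue  # battle does not involve the model
        if vote["preferred"] in (model, None):
            continue  # the model was preferred, or the battle was a tie
        # The tag form is optional, so the key may be missing entirely.
        counts.update(vote.get("tags", {}).get(model, []))
    return counts


# Made-up votes for illustration; not taken from the dataset.
example_votes = [
    {"model_a": "Sonic-3", "model_b": "Grok TTS", "preferred": "Grok TTS",
     "tags": {"Sonic-3": ["Irregular pacing", "Noise / distortion"]}},
    {"model_a": "Sonic-3", "model_b": "Eleven v3", "preferred": "Eleven v3"},
    {"model_a": "Gemini 3.1 Flash", "model_b": "Sonic-3", "preferred": "Sonic-3",
     "tags": {"Gemini 3.1 Flash": ["Missing word"]}},
]

tag_counts = count_defect_tags(example_votes)
print(tag_counts)
```

Because the form is optional, these counts describe the losses where raters chose to explain themselves, not every loss.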

Audio examples are the original generated waveforms served by VoiceArena's storage bucket — neither file was re-encoded or normalized for this report.
