# Translation Model Comparison

This document records the performance of different Ollama models on pretok translation tasks.

**Test Date:** 2026-01-30
**pretok Version:** 0.1.2

## Tested Models
| Model | Size | Description |
|---|---|---|
| gemma3:latest | 3.3 GB | Google Gemma 3 general-purpose model |
| qwen3:latest | 5.2 GB | Alibaba Qwen 3 thinking model (uses internal reasoning) |
| translategemma:4b | 3.3 GB | Google dedicated translation model |
| gpt-oss:20b | 13 GB | GPT-OSS 20B general-purpose model |
## Test Results

### Basic Translation Tests
| Test Case | Original (Chinese) | gemma3 | qwen3 (with max_tokens) | translategemma | gpt-oss:20b (with max_tokens) |
|---|---|---|---|---|---|
| Code Request | 請幫我寫一個計算費波那契數列的 Python 函數 | ✅ Please help me write a Python function to calculate the Fibonacci sequence. | ✅ Please help me write a Python function to calculate the Fibonacci sequence. | ✅ Please help me write a Python function to calculate the Fibonacci sequence. | ✅ Please help me write a Python function to calculate the Fibonacci sequence. |
| Daily Conversation | 今天天氣很好,適合出門散步 | ✅ Today the weather is very good, suitable for going out for a walk. | ⚠️ Today's weather is great, perfect for going out for a walk. | ✅ The weather is very nice today, perfect for a walk. | ✅ Today the weather is very good, suitable for going out for a walk. |
| Technical Terms | 機器學習是人工智慧的一個重要分支 | ✅ Machine learning is an important branch of artificial intelligence. | ✅ Machine learning is an important branch of artificial intelligence. | ✅ Machine learning is an important branch of artificial intelligence. | ✅ Machine learning is an important branch of artificial intelligence. |
**Note:** qwen3 and gpt-oss:20b require `max_tokens` configuration. See *Configuration for Thinking Models* below.

⚠️ **qwen3 Stability Warning:** qwen3 may occasionally return untranslated text or empty results. Use a higher `max_tokens` (1000+) for better stability.
### Translation Speed (seconds)
| Test Case | gemma3 | qwen3 (with max_tokens) | translategemma | gpt-oss:20b (with max_tokens) |
|---|---|---|---|---|
| Code Request | 3.75s | 14.18s | 3.51s | 14.91s |
| Daily Conversation | 0.41s | 13.53s | 0.36s | 6.43s |
| Technical Terms | 0.32s | 6.15s | 0.32s | 5.10s |
**Note:** qwen3 is slower because it performs internal reasoning (thinking) before producing output; gpt-oss:20b is slower due to its larger model size (20B parameters).
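The wall-clock timings above can be collected with a small harness like the following. This is a minimal sketch, not part of pretok; the callable passed to `timed` stands in for whatever translation entry point is being benchmarked:

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for a real translate() call:
result, elapsed = timed(lambda text: text.upper(), "abc")
```

`time.perf_counter()` is preferred over `time.time()` here because it is monotonic and has higher resolution for short intervals.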
### ChatML Format Test

Testing whether pretok correctly preserves special tokens and translates content when processing ChatML-format prompts.

Input:

```text
<|im_start|>system
你是一個專業的程式助手。
<|im_end|>
<|im_start|>user
請解釋什麼是遞迴函數
<|im_end|>
<|im_start|>assistant
```
| Model | Structure Preserved | Content Translated | Time |
|---|---|---|---|
| gemma3:latest | ✅ | ✅ You are a professional programming assistant. / Please explain what a recursive function is. | 0.56s |
| qwen3:latest (with max_tokens) | ✅ | ✅ You are a professional programming assistant. / Please explain what a recursive function is. | 13.82s |
| translategemma:4b | ✅ | ✅ You are a professional programming assistant. / Please explain what a recursive function is. | 0.57s |
| gpt-oss:20b (with max_tokens) | ✅ | ✅ You are a professional programming assistant. / Please explain what a recursive function is. | 10.39s |
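The "Structure Preserved" column can be checked mechanically by comparing the sequence of ChatML special tokens before and after translation. The helper below is a hedged sketch for illustration, not part of pretok's API:

```python
import re

# Matches the ChatML delimiter tokens used in the test input above.
_CHATML_TOKEN = re.compile(r"<\|im_(?:start|end)\|>")


def chatml_structure_preserved(original: str, translated: str) -> bool:
    """True if both strings contain the same special tokens in the same order."""
    return _CHATML_TOKEN.findall(original) == _CHATML_TOKEN.findall(translated)


src = "<|im_start|>user\n請解釋什麼是遞迴函數\n<|im_end|>"
out = "<|im_start|>user\nPlease explain what a recursive function is.\n<|im_end|>"
# chatml_structure_preserved(src, out) -> True
```

A check like this only verifies token order and count; it does not catch a model that, say, moves content across role boundaries while keeping the delimiters intact.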
## Conclusions and Recommendations

### Recommended Models

1. **translategemma:4b** ⭐ Recommended
   - Designed specifically for translation; best translation quality
   - Fastest (~0.3-3.5s)
   - Most stable and consistent results
   - More natural, fluent translations (e.g., "perfect for a walk" vs. "suitable for going out for a walk")

2. **gemma3:latest** ✅ Usable
   - General-purpose model with good translation quality
   - Fast (~0.3-3.8s)
   - Stable and reliable
   - A solid alternative to translategemma

3. **gpt-oss:20b** ✅ Usable (with `max_tokens`)
   - Large model (20B parameters) with good translation quality
   - Slower due to model size (~5-15s per translation)
   - Stable when configured with `max_tokens=500`
   - Requires explicit `max_tokens` configuration

4. **qwen3:latest** ⚠️ Not Recommended
   - Unstable translation results (may return empty or untranslated text)
   - Slowest (~6-14s) due to its thinking process
   - Requires `max_tokens=1000` or higher for basic functionality
   - Even with proper configuration, results may be inconsistent
## Usage Example

```python
from pretok import Pretok
from pretok.config import LLMTranslatorConfig
from pretok.translation.llm import LLMTranslator

# Recommended configuration - using translategemma
config = LLMTranslatorConfig(
    api_key="ollama",
    model="translategemma:4b",  # Recommended
    base_url="http://localhost:11434/v1",
)

translator = LLMTranslator(config)
pretok = Pretok(target_language="en", translator=translator)
```
## Configuration for Thinking Models (qwen3, DeepSeek-R1, etc.)

Thinking models emit internal reasoning tokens (`<think>...</think>`) before producing output, so the default `max_tokens` calculation (`len(text) * 4`) is insufficient for them. Some larger models, such as gpt-oss:20b, may also require an explicit `max_tokens` configuration.

**Solution:** Set `max_tokens` explicitly or increase the multiplier:

```python
from pretok.config import LLMTranslatorConfig

# Option 1: Set an explicit max_tokens (recommended)
config = LLMTranslatorConfig(
    api_key="ollama",
    model="gpt-oss:20b",  # Stable with max_tokens=500
    base_url="http://localhost:11434/v1",
    max_tokens=500,
)

# Option 2: For qwen3 (requires higher max_tokens, but still unstable)
config = LLMTranslatorConfig(
    api_key="ollama",
    model="qwen3:latest",
    base_url="http://localhost:11434/v1",
    max_tokens=1000,  # Higher value needed for stability
)

# Option 3: Increase the multiplier
config = LLMTranslatorConfig(
    api_key="ollama",
    model="qwen3:latest",
    base_url="http://localhost:11434/v1",
    max_tokens_multiplier=15,  # 15x input length
)
```
| Parameter | Default | Description |
|---|---|---|
| `max_tokens` | `None` | Fixed token limit. Overrides the multiplier if set. |
| `max_tokens_multiplier` | `4` | Multiplier applied as `len(text) * multiplier`. |
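The precedence described in the table can be summarized in a few lines. `resolve_max_tokens` below is a hypothetical helper written for illustration, not a function exported by pretok:

```python
def resolve_max_tokens(text: str, max_tokens=None, max_tokens_multiplier: int = 4) -> int:
    """A fixed max_tokens, when set, overrides the multiplier-based default."""
    if max_tokens is not None:
        return max_tokens
    return len(text) * max_tokens_multiplier


# resolve_max_tokens("你好世界") -> 16 (4 characters * 4)
# resolve_max_tokens("你好世界", max_tokens=500) -> 500
```

This mirrors the behavior described above: with the default multiplier of 4, short inputs to thinking models get far too small a budget, which is why the options set `max_tokens` directly or raise the multiplier.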
## Notes

- Test results may vary depending on hardware configuration and model version.
- Thinking models (qwen3, DeepSeek-R1) require higher `max_tokens` settings but may still produce inconsistent results.
- qwen3 is not recommended for production use due to its inconsistent translation behavior.
- gpt-oss:20b works reliably with `max_tokens=500`.
- Run your own tests before relying on these results in production.