Quickstart
This guide will help you get started with pretok in just a few minutes.
Basic Usage
Creating a Pretok Instance
from pretok import Pretok, create_pretok
# Create with explicit target language
pretok = Pretok(target_language="en")
# Or create with model-based configuration
pretok = create_pretok(model_id="gpt-4") # Uses GPT-4's primary language
Processing Text
# Simple processing
result = pretok.process("Hola, como estas?")
print(f"Output: {result.processed_text}")
print(f"Was modified: {result.was_modified}")
print(f"Detections: {result.detections}")
Working with Language Detection
# Detect language only (no translation)
detection = pretok.detect("Bonjour le monde")
print(f"Language: {detection.language}")
print(f"Confidence: {detection.confidence}")
Working with Prompts
pretok preserves prompt structure during translation:
prompt = """<|im_start|>system
You are a helpful assistant.
<|im_end|>
<|im_start|>user
Ecrivez un poeme sur Python.
<|im_end|>"""
result = pretok.process(prompt)
# Only the content is translated, markers are preserved
Using Custom Translation Backends
OpenAI API
from pretok import Pretok
from pretok.config import LLMTranslatorConfig
from pretok.translation.llm import LLMTranslator
config = LLMTranslatorConfig(
base_url="https://api.openai.com/v1",
model="gpt-4o-mini",
)
translator = LLMTranslator(config)
pretok = Pretok(target_language="en", translator=translator)
OpenRouter
config = LLMTranslatorConfig(
base_url="https://openrouter.ai/api/v1",
model="anthropic/claude-3-haiku",
# api_key will use OPENAI_API_KEY or OPENROUTER_API_KEY env var
)
Local Ollama
config = LLMTranslatorConfig(
base_url="http://localhost:11434/v1",
model="llama3",
api_key="ollama", # Ollama doesn't require a real key
)
Configuration
Using YAML
Create a pretok.yaml file:
version: "1.0"
pipeline:
default_detector: langdetect
cache_enabled: true
translation:
llm:
base_url: "https://api.openai.com/v1"
model: "gpt-4o-mini"
# api_key_env: OPENAI_API_KEY # Optional, defaults to OPENAI_API_KEY
cache:
memory:
max_size: 1000
ttl: 3600
Loading Configuration
from pretok.config import load_config
config = load_config("pretok.yaml")
pretok = Pretok(config=config)
Next Steps
- Configuration Guide - Learn about all configuration options
- Pipeline Guide - Deep dive into pipeline usage
- Translation Backends - Configure translation engines