Language Detection
pretok supports multiple language detection backends.
Available Detectors
LangDetect (Default)
Pure Python implementation, no external dependencies.
from pretok.detection import LangDetectDetector
detector = LangDetectDetector(seed=42)
result = detector.detect("Bonjour le monde")
print(result.language)    # "fr"
print(result.confidence)  # e.g. 0.99
FastText
High accuracy; requires a FastText language-identification model file such as lid.176.bin.
from pretok.detection import FastTextDetector
detector = FastTextDetector(model_path="/path/to/lid.176.bin")
result = detector.detect("Hello world")
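For readers wondering how raw FastText output relates to a detection result, here is a minimal sketch. It assumes only what the upstream fastText Python bindings are known to return from `predict`: a sequence of labels prefixed with `__label__` and a parallel sequence of probabilities. The helper name `parse_fasttext_prediction` is hypothetical and is not part of pretok; how `FastTextDetector` does this internally is not stated in this document.

```python
from typing import Sequence

LABEL_PREFIX = "__label__"  # prefix fastText puts on every predicted label


def parse_fasttext_prediction(
    labels: Sequence[str], probs: Sequence[float]
) -> tuple[str, float, list[tuple[str, float]]]:
    """Turn raw fastText-style output into (language, confidence, alternatives).

    Hypothetical helper for illustration; not pretok's actual implementation.
    """
    if not labels:
        raise ValueError("no labels to parse")
    pairs = []
    for label, prob in zip(labels, probs):
        # Strip the label prefix so only the language code remains.
        code = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
        # Clamp defensively so the confidence stays within 0.0 to 1.0.
        pairs.append((code, max(0.0, min(float(prob), 1.0))))
    language, confidence = pairs[0]
    return language, confidence, pairs[1:]
```

With the real bindings, the inputs would come from something like `model.predict(text, k=3)`; the first label is the top prediction and the rest map naturally onto alternatives.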
Composite Detector
Combine multiple detectors for better accuracy.
from pretok.detection import CompositeDetector
detector = CompositeDetector(
    detectors=["fasttext", "langdetect"],
    strategy="voting",
)
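The document does not spell out how the "voting" strategy resolves disagreements, so the following is only one plausible reading, sketched in plain Python: each detector casts one vote for its top language, ties are broken by summed confidence, and the reported confidence is the mean over the detectors that agreed. The function name `vote` and the tie-breaking rule are assumptions, not pretok's documented behavior.

```python
from collections import defaultdict


def vote(predictions: list[tuple[str, float]]) -> tuple[str, float]:
    """Pick the language with the most votes; break ties by summed confidence.

    `predictions` holds one (language, confidence) pair per detector.
    Illustrative only; the real CompositeDetector may behave differently.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    votes: dict[str, int] = defaultdict(int)
    total_conf: dict[str, float] = defaultdict(float)
    for language, confidence in predictions:
        votes[language] += 1
        total_conf[language] += confidence
    # Compare by vote count first, then by accumulated confidence.
    winner = max(votes, key=lambda lang: (votes[lang], total_conf[lang]))
    # Report the mean confidence of the detectors that chose the winner.
    return winner, total_conf[winner] / votes[winner]
```

Under this scheme, two detectors that moderately agree outvote a single confident one, which is the usual motivation for combining backends.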
Detection Result
from dataclasses import dataclass

@dataclass
class DetectionResult:
    language: str  # ISO 639-1 code
    confidence: float  # 0.0 to 1.0
    alternatives: list[tuple[str, float]]  # other candidate (language, confidence) pairs
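As a sketch of how these fields might be consumed, the snippet below redefines a stand-in `DetectionResult` with the same three fields so it runs on its own, then adds two hypothetical helpers: `resolve_language`, which falls back to a default when confidence is low, and `ranked_candidates`, which merges the top guess with the alternatives. The helper names, the 0.8 threshold, and the "en" fallback are illustrative assumptions, not part of pretok.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:  # stand-in mirroring the fields documented above
    language: str
    confidence: float
    alternatives: list[tuple[str, float]]


def resolve_language(
    result: DetectionResult, threshold: float = 0.8, fallback: str = "en"
) -> str:
    """Return the detected language, or `fallback` when confidence is too low."""
    return result.language if result.confidence >= threshold else fallback


def ranked_candidates(result: DetectionResult) -> list[tuple[str, float]]:
    """List the top guess and all alternatives, highest confidence first."""
    candidates = [(result.language, result.confidence), *result.alternatives]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)
```

A threshold check like this is a common guard for short or mixed-language inputs, where any detector's top guess tends to be less reliable.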