Model Capabilities
Capability profiles declare which languages each model supports, so pretok can tell when input must be translated before it reaches a model.
Capability Registry
Register each model's capabilities with a `CapabilityRegistry`:

```python
from pretok.capability import CapabilityRegistry, ModelCapability

registry = CapabilityRegistry()

# Register a model that only handles English input
registry.register(ModelCapability(
    model_id="llama-2-7b",
    supported_languages=frozenset(["en"]),
    primary_language="en",
))
```
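The sketch below shows what a capability record and a registry keyed by `model_id` might look like. `ModelCapabilitySketch` and `CapabilityRegistrySketch` are hypothetical stand-ins, not pretok's classes. The rule that `primary_language` must be one of `supported_languages`, and the choice to let a re-registration replace the earlier entry, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCapabilitySketch:
    """Hypothetical stand-in for a capability record (not pretok's ModelCapability)."""
    model_id: str
    supported_languages: frozenset
    primary_language: str

    def __post_init__(self) -> None:
        # Assumed invariant: the primary language must be a supported language.
        if self.primary_language not in self.supported_languages:
            raise ValueError(
                f"{self.model_id}: primary_language {self.primary_language!r} "
                "is not in supported_languages"
            )


class CapabilityRegistrySketch:
    """Hypothetical registry mapping model IDs to capability records."""

    def __init__(self) -> None:
        self._models: dict[str, ModelCapabilitySketch] = {}

    def register(self, capability: ModelCapabilitySketch) -> None:
        # Assumption: registering the same model_id again replaces the old entry.
        self._models[capability.model_id] = capability

    def get(self, model_id: str) -> Optional[ModelCapabilitySketch]:
        # Returns None for models that were never registered.
        return self._models.get(model_id)


registry = CapabilityRegistrySketch()
registry.register(ModelCapabilitySketch(
    model_id="llama-2-7b",
    supported_languages=frozenset(["en"]),
    primary_language="en",
))
print(registry.get("llama-2-7b").supported_languages)  # frozenset({'en'})
```

Using a frozen dataclass with a `frozenset` keeps each record immutable, so a profile cannot be modified after registration.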
Built-in Profiles
pretok includes profiles for common models. Load them into a registry:

```python
from pretok.capability import CapabilityRegistry, load_builtin_profiles

registry = CapabilityRegistry()
load_builtin_profiles(registry)
# The registry now includes profiles for models such as GPT-4 and Llama-2
```
Configuration
Define capabilities in your config file:
```yaml
models:
  gpt-4:
    supported_languages: [en, zh, ja, ko, fr, de, es]
    primary_language: en
  llama-2-7b:
    supported_languages: [en]
    primary_language: en
  my-custom-model:
    supported_languages: [en, fr]
    primary_language: en
    fallback_language: en
```
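Once loaded, the config above is a nested mapping of model IDs to fields. The sketch below starts from a hand-written dict with the same structure a YAML loader would return and normalizes each entry. The `parse_models` helper is hypothetical, not part of pretok; the check that `primary_language` is supported, and treating a missing `fallback_language` as `None`, are assumptions.

```python
from typing import Any

# Hypothetical: the parsed form of the YAML config above.
config: dict[str, Any] = {
    "models": {
        "gpt-4": {
            "supported_languages": ["en", "zh", "ja", "ko", "fr", "de", "es"],
            "primary_language": "en",
        },
        "llama-2-7b": {
            "supported_languages": ["en"],
            "primary_language": "en",
        },
        "my-custom-model": {
            "supported_languages": ["en", "fr"],
            "primary_language": "en",
            "fallback_language": "en",
        },
    }
}


def parse_models(cfg: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: normalize each model entry from the config mapping."""
    parsed: dict[str, dict[str, Any]] = {}
    for model_id, entry in cfg["models"].items():
        languages = frozenset(entry["supported_languages"])
        primary = entry["primary_language"]
        # Assumed rule: the primary language must appear in supported_languages.
        if primary not in languages:
            raise ValueError(f"{model_id}: {primary!r} is not a supported language")
        parsed[model_id] = {
            "supported_languages": languages,
            "primary_language": primary,
            # fallback_language is optional; absent entries become None here.
            "fallback_language": entry.get("fallback_language"),
        }
    return parsed


models = parse_models(config)
print(sorted(models["my-custom-model"]["supported_languages"]))  # ['en', 'fr']
```

One YAML pitfall applies to language codes: PyYAML follows YAML 1.1, so an unquoted `no` (the ISO 639-1 code for Norwegian) loads as the boolean `False`. Quote such codes, e.g. `["no", en]`.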
Checking Translation Requirements
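Given a model and the language of the input, `requires_translation` returns a pair: whether translation is needed and the language to translate into. Llama-2-7b supports only English, so French input has to be translated into English:

```python
# Uses a registry populated as shown above
needs_translation, target_lang = registry.requires_translation(
    model_id="llama-2-7b",
    source_lang="fr",
)
print(needs_translation)  # True
print(target_lang)        # en
```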
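The decision itself can be sketched as a set-membership check: translate only when the source language is outside the model's supported set, and then target the model's primary language. This is a hypothetical reimplementation for illustration, not pretok's code. The `CAPABILITIES` table is made up from the config example, returning `None` as the target when no translation is needed is an assumption, and the sketch ignores `fallback_language`, whose exact role is not described here.

```python
from typing import Optional

# Hypothetical capability table: model_id -> (supported languages, primary language).
CAPABILITIES: dict[str, tuple[frozenset, str]] = {
    "gpt-4": (frozenset(["en", "zh", "ja", "ko", "fr", "de", "es"]), "en"),
    "llama-2-7b": (frozenset(["en"]), "en"),
}


def requires_translation(model_id: str, source_lang: str) -> tuple[bool, Optional[str]]:
    """Sketch: translate only when the model lacks the source language."""
    # Raises KeyError for unknown models; a real registry may handle this differently.
    supported, primary = CAPABILITIES[model_id]
    if source_lang in supported:
        # Assumption: no target language is reported when no translation is needed.
        return False, None
    # Otherwise translate into the model's primary language.
    return True, primary


print(requires_translation("llama-2-7b", "fr"))  # (True, 'en')
print(requires_translation("gpt-4", "fr"))       # (False, None)
```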