Current API Models, Clearly Tiered
Free API keys can call auto and hrLLM at 40 requests per hour. PRO unlocks direct access to the rest of the new lineup, while legacy Kiwi models remain visible as deprecated compatibility entries.
EOL: 09.03.2026. They stay visible for lineage and transition planning, but the current public lineup centers on auto, hrLLM, and the new direct PRO models.
Recommended Free Croatian Model
hrLLM is our Croatian-first model. It writes and answers only in grammatically correct Croatian and is being actively tuned because Croatian is still poorly covered by most general-purpose models.
Built specifically for Croatian instead of treating it as a low-priority multilingual edge case.
Keeps tone, inflection, and sentence structure cleaner than general-purpose models on Croatian prompts.
Recommended free model for Croatian-first API and dashboard workflows.
Public API access
Direct model ID: hrllm
Free API keys: 40 requests/hour
Recommended for Croatian-first products, assistants, and writing workflows.
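As a minimal sketch of calling hrLLM over the public API: the model ID `hrllm` and the free-tier limit come from this page, but the endpoint path and the request-body field names below are assumptions (a common OpenAI-compatible chat shape), not something this page documents.

```python
import json

# Assumed endpoint path; verify against the official API docs.
API_URL = "https://api.llm.kiwi/v1/chat/completions"

def build_request(prompt: str, model: str = "hrllm") -> dict:
    """Build a chat-style request body.

    The field names ("model", "messages", "role", "content") follow the
    widely used OpenAI-compatible shape, which is an assumption here.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Napiši kratki pozdrav na hrvatskom.")
print(json.dumps(payload, ensure_ascii=False))
```

Sending this payload with your API key in an `Authorization` header would then be a standard HTTPS POST; the exact auth scheme should be confirmed in the docs.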
Current API Lineup
These are the current public-facing models in the lineup. hrLLM is the recommended free Croatian model, while the other direct models require a PRO plan.
api.llm.kiwi
Croatian-first model for writing and answering in grammatically correct Croatian.
Best used for: Croatian customer support, formal business writing, public-sector communication, and education content.
Based on: hrllm
api.llm.kiwi
Reasoning-heavy Pro model for deeper analysis, technical planning, and multi-step problem solving.
Best used for: Complex reasoning, technical architecture, advanced debugging plans, and step-heavy analytical work.
Based on: DeepSeekR1
api.llm.kiwi
Compact Pro model for quick reasoning, drafting, and lightweight production tasks.
Best used for: Fast general chat, structured drafting, lightweight copilots, and low-latency automations.
Based on: Qwen3-1.4B
api.llm.kiwi
Small Pro model tuned for efficient text work, simple assistants, and lean automation.
Best used for: Short-form generation, compact task agents, headline variants, and simple classification-style prompts.
Based on: SmolLM2-1.7B
api.llm.kiwi
Coding-first Pro model for implementation, refactors, and repository-aware engineering help.
Best used for: Code generation, repository edits, bug fixing, refactors, and engineering assistance workflows.
Based on: starcoder2-7b
Deprecated Models
Deprecated models remain listed for continuity, migration, and provider lineage. They are intentionally greyed out and clearly marked with their EOL date.
Kiwi Code Frontier
High-throughput coding lane tuned for enterprise repositories and API services.
Best used for: Backend implementation, SQL-heavy services, and test-driven code generation.
Based on: Kiwi Codestral lane
Kiwi Code Frontier
Coding-focused model lane for advanced implementation, refactors, and debugging.
Best used for: Large codebase edits, architectural refactors, and deep debugging workflows.
Based on: Kiwi DeepSeek V3 lane
Mistral AI
Stable all-purpose instruction model for consistent team outputs.
Best used for: General business writing, reusable templates, and dependable delivery.
Based on: Mistral 7B Instruct v0.1
Efficient compact model from Google's Gemma family.
Best used for: Quick copy variants, concise outlines, and fast idea expansion.
Based on: Gemma 2B Instruct LoRA
Kiwi Frontier
Fast multimodal lane for lightweight reasoning and visual-text blended prompts.
Best used for: Rapid assistants, concise drafting, and image-aware prompt pipelines.
Based on: Kiwi GLM 4.6V Flash lane
Kiwi Frontier
Turbo reasoning lane for high-context conversation and tool-compatible outputs.
Best used for: Long-context technical Q&A, API assistants, and dynamic copilots.
Based on: Kiwi Llama 3.1 8B Turbo lane
Kiwi Frontier
Multimodal-capable Pro lane tuned for fast reasoning and robust instruction following.
Best used for: Mixed text/image workflows, compact automation agents, and rapid product features.
Based on: Kiwi Ministral 8B lane
Microsoft
Fast lightweight assistant for short tasks and quick checks.
Best used for: Simple prompts, short rewrites, and rapid iteration loops.
Based on: Phi-2
Meta Llama
Balanced model based on Meta Llama family for dependable dialog tasks.
Best used for: General chat, assistant workflows, and robust business Q&A.
Based on: Llama 2 7B Chat LoRA
Kiwi Frontier
Open-model lane for balanced general tasks and flexible experimentation.
Best used for: General workflows, iterative prompts, and broad assistant behavior tuning.
Based on: Kiwi OSS 20B lane
LLM.kiwi Core
Reasoning-first assistant for technical and strategic deliverables.
Best used for: Deep technical guidance, architecture writing, and detailed analysis.
Based on: Kiwi Core Reasoning
LLM.kiwi Core
Balanced general-purpose model for reliable daily production work.
Best used for: Blog content drafts, marketing copy, and structured Q&A.
Based on: Kiwi Core Balanced
Microsoft
Low-latency assistant optimized for speed and practical action.
Best used for: Fast answers, tactical task lists, and lightweight workflow support.
Based on: Phi-2
Meta Llama
High-signal assistant tuned for clarity and practical structure.
Best used for: Clear explainers, comparison writeups, and concise plans.
Based on: Llama 3.1 8B Instruct FP8
Mistral AI
Instruction-heavy profile powered by Mistral family models.
Best used for: Complex instruction following, workflows, and technical drafting.
Based on: Mistral 7B Instruct v0.2 LoRA
Cloudflare Workers AI
Highest depth assistant for long-form reasoning and premium output quality.
Best used for: Executive briefs, long-form strategy, and advanced reasoning tasks.
Based on: Workers AI runtime default (ultra quality profile)
Access and Usage Limits
These are the model-access highlights users need most often. The complete reference stays in the docs.
Free: 40 requests/hour for auto and hrLLM.
PRO: direct access to the new advanced models with higher sustained throughput.
192 requests/minute per IP: cache-friendly endpoint for model discovery and compatibility metadata.
36 requests/minute per signed-in user + IP: hrLLM additionally uses a tighter free-tier hourly model limit.
24 requests/minute per signed-in user: hrLLM additionally uses a tighter free-tier hourly model limit.
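A client that respects the free-tier cap of 40 requests/hour can throttle locally before hitting the API. The cap itself comes from this page; how the server enforces it (fixed vs. sliding window, per-key vs. per-IP) is not documented here, so the sliding-window behavior below is an assumption.

```python
import time
from collections import deque
from typing import Optional

class HourlyLimiter:
    """Client-side sliding-window limiter for the 40 requests/hour free tier.

    Server-side enforcement details are an assumption; this only prevents
    the client from sending more than max_requests in any rolling window.
    """

    def __init__(self, max_requests: int = 40, window_s: float = 3600.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.sent: deque[float] = deque()  # timestamps of recent requests

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and record the request if it fits in the window."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False

limiter = HourlyLimiter()
print(sum(limiter.allow(now=0.0) for _ in range(45)))  # → 40
```

Call `allow()` before each API request; when it returns False, back off until the oldest timestamp ages out rather than retrying immediately.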